Terry Heaton has an RSS feed through Technorati that’s a saved search for his name, the same thing I do with NewsGator Online. It’s usually a smart strategy for keeping track of mentions of your company, products and identity in the blogosphere.
Except when Nude Japanese Nurses sneak into the picture:
Now, it’s possible that Terry’s name really does appear in the porn RSS feed, but it seems unlikely. When the problem came up on a mailing list, Dave Sifry at Technorati simply responded:
“I’ve passed this on to our data quality folks. The spam should be eliminated from your feed shortly.”
That didn’t make sense to either Terry or me, though. It’s hard to imagine Terry’s name appearing within the porn blog’s RSS feed content, yet, as you’ll find out, that’s exactly what’s happening here.
I asked Dave for a clarification and he explained:
“It means that someone mentioned your search term in a spam blog, and we haven’t marked that blog as spam yet.”
Digging into the feed itself with some Unix command-line tools (did I mention how nice it is to have a command line interface in Mac OS X?), I found that the page the feed references does indeed contain both “Terry” and “Heaton”, though not adjacent to each other.
Here’s the first appearance of the pattern “terry”:
and then the pattern “heaton”:
(Here I’m showing two lines of context from the feed on either side of the matching word, which I’ve put into bold just to make it easier to see what’s going on.)
So Dave is right: thanks to the coincidence of two actresses, Patricia Heaton and Terry Farrell, both being mentioned on the spam blog page, Terry Heaton now finds his RSS feed infected with Nude Japanese Nurses. Who would have thought?
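Dave’s explanation suggests that the search treats “Terry Heaton” as two separate words that can appear anywhere on a page, rather than as an exact phrase. I don’t know how Technorati actually implements its search, so the Python sketch below simply assumes those two matching rules to show the difference; the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_words(query: str, text: str) -> bool:
    """True if every query word appears somewhere in the text, in any order."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))


def matches_phrase(query: str, text: str) -> bool:
    """True only if the query words appear consecutively and in order."""
    terms = tokenize(query)
    words = tokenize(text)
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


# A made-up snippet in the spirit of the spam page.
page = "Gallery: Terry Farrell, Patricia Heaton and more"
print(matches_all_words("Terry Heaton", page))  # all-words rule
print(matches_phrase("Terry Heaton", page))     # exact-phrase rule
```

Under these assumptions the spam page satisfies the all-words rule but not the phrase rule. Searching for the name as a quoted phrase, where a search service supports that, would probably have kept this particular page out of Terry’s feed, though it wouldn’t stop a spammer who deliberately includes the full name.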
This problem illustrates the challenge of keeping communication channels free of spam and porn. When we gained the ability to subscribe to RSS feeds built from smart searches, few realized that even a complicated pattern could have the unintended consequence of giving these lowlifes yet another channel of access.
Now you can see why building a reliable RSS search feed is a lot more complicated than it looks.
What surprises me is that porn and spam bloggers aren’t already posting entries stuffed with the 100 most common first names and the 100 most common last names, knowing that many, many people with RSS search feeds will inadvertently match the junk. Terry didn’t match this page because the porn blogger deliberately set out to make it happen; it was just an unfortunate coincidence of names. But tomorrow?
Geek tip: To get the output shown above, I used the “GET” command (part of Perl’s libwww-perl package) to pull the raw source of the page onto my system, then “tr” to change all spaces to line breaks, and “grep -C” to match patterns with two lines of context above and below.
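If you don’t have GET handy, here’s a rough Python re-creation of the same pipeline. It isn’t the exact command I ran: urllib stands in for GET, the case-insensitive matching is my own choice, and the script name grep_page.py is hypothetical. Like grep, it merges context windows that overlap or touch and separates the rest with “--”.

```python
import re
import sys
import urllib.request


def to_lines(text: str) -> list[str]:
    """Mimic tr: turn every space into a line break, then split into lines."""
    return text.replace(" ", "\n").split("\n")


def grep_context(lines: list[str], pattern: str, context: int = 2) -> list[list[str]]:
    """Roughly mimic `grep -i -C context pattern`: return each matching line
    with `context` lines before and after, merging windows that overlap or touch."""
    regex = re.compile(pattern, re.IGNORECASE)
    windows: list[tuple[int, int]] = []
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            if windows and start <= windows[-1][1]:
                # Overlaps or directly follows the previous window: extend it.
                windows[-1] = (windows[-1][0], end)
            else:
                windows.append((start, end))
    return [lines[s:e] for s, e in windows]


def main(argv: list[str]) -> None:
    if len(argv) < 2:
        print("usage: grep_page.py URL PATTERN [PATTERN ...]")
        return
    url, *patterns = argv
    # Fetch the raw page source, standing in for the GET command.
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    lines = to_lines(text)
    for pattern in patterns:
        print(f"== {pattern} ==")
        groups = grep_context(lines, pattern)
        print("\n--\n".join("\n".join(group) for group in groups))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running something like “python grep_page.py URL terry heaton” prints each match with two words of context on either side, since splitting on spaces turns every word into its own line.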