Nude Japanese Nurses snuck into Terry’s RSS feed?

Terry Heaton subscribes to a Technorati RSS feed built from a search for his name, the same thing I do with NewsGator Online. Indeed, it’s usually a smart strategy for keeping track of your company, products and identity in the blogosphere.
Except when Nude Japanese Nurses sneak into the picture:

Technorati search RSS feed: with porn injected?

Now it’s possible that Terry’s name is appearing in the porn RSS feed, but unlikely. Nonetheless, Dave Sifry at Technorati explains…

On a mailing list, Dave simply responded:
“I’ve passed this on to our data quality folks. The spam should be eliminated from your feed shortly.”
That doesn’t make sense, though, to either Terry or me. It’s hard to imagine that Terry’s name appears within the porn blog’s RSS feed content, yet, as you’ll find out, that’s exactly what’s happening here.
I asked Dave for a clarification and he explained:
“It means that someone mentioned your search term in a spam blog, and we haven’t marked that blog as spam yet.”
Digging into the feed itself with some Linux tools (did I mention how nice it is to have a command line interface in Mac OS X?) I found that the page the feed references does indeed contain both “Terry” and “Heaton”, though not adjacent to each other.
First, the first appearance of the pattern “terry”:


and then the pattern “heaton”:


(Here I’m showing two lines of context from the feed on either side of the matching word, which I’ve put into bold just to make it easier to see what’s going on.)
So Dave is right: by the coincidence of there being two porn stars mentioned in the blog spam page, Patricia Heaton and Terry Farrell, Terry Heaton now finds his RSS feed infected with Nude Japanese Nurses. Who would have thought?
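To make the coincidence concrete, here is a minimal Python sketch of the difference between matching every search word anywhere on a page and matching the words as an adjacent phrase. How Technorati actually matches search terms isn't something I know; the function names and the sample text are hypothetical stand-ins, not the real spam page.

```python
import re


def tokens(text):
    """Return lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_all_terms(text, query):
    """True if every query word appears somewhere in the text."""
    words = set(tokens(text))
    return all(term in words for term in tokens(query))


def matches_phrase(text, query):
    """True only if the query words appear adjacent and in order."""
    words, terms = tokens(text), tokens(query)
    n = len(terms)
    if n == 0:
        return False
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


# Hypothetical stand-in for the spam page, not its real content.
spam_page = "Pictures of Patricia Heaton, Terry Farrell and many more"

print(matches_all_terms(spam_page, "Terry Heaton"))
print(matches_phrase(spam_page, "Terry Heaton"))
```

Under these assumptions the all-terms test accepts the stand-in page while the phrase test rejects it, which is the kind of accident that seems to have happened here.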
The problem here illustrates the challenge of keeping communication channels clean and free of the infection of spam and porn. When we gained the ability to subscribe to RSS feeds that were smart searches, few realized that even a complicated pattern could have the unintended consequences of giving these lowlifes yet another channel of access.
Now you can see why an RSS feed tool is a lot more complicated too.
What surprises me is that porn and spam bloggers aren’t just posting entries containing the 100 most common first names and the 100 most common last names, knowing that many people with RSS search feeds will inadvertently match the junk. Terry didn’t match this page because the porn blogger deliberately aimed to have this happen; it was just an unfortunate coincidence of naming. But tomorrow?
Geek tip: To get the output I showed above, I used the “GET” command to pull the raw source of the page onto my system, then “tr” to change all spaces to line breaks and “grep -C” to match patterns with two lines of context above and below.
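If you’d rather not chain shell commands, the same idea can be sketched in a few lines of Python. This is only a rough equivalent of the “tr” and “grep -C” steps: it works on text you have already downloaded, the function name and sample string are made up for illustration, and the case-insensitive matching is my assumption.

```python
def words_with_context(text, pattern, context=2):
    """Split text into one word per entry, then return each word that
    contains `pattern` along with `context` words on either side."""
    words = text.split()  # like turning every space into a line break
    needle = pattern.lower()
    hits = []
    for i, word in enumerate(words):
        if needle in word.lower():
            start = max(0, i - context)
            hits.append(words[start:i + context + 1])
    return hits


# Hypothetical sample text, not the real page source.
sample = "galleries of Patricia Heaton and Terry Farrell plus others"

for hit in words_with_context(sample, "heaton"):
    print(" ".join(hit))
```

Unlike grep, this sketch does not merge overlapping context windows, so two matches close together are reported separately.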

5 comments on “Nude Japanese Nurses snuck into Terry’s RSS feed?”

  1. LOL! You might want to retract the statement about the two “porn stars”. Patricia Heaton = the wife on “Everybody Loves Raymond” and Terry Farrell = “Lt. Commander Jadzia Dax” on Deep Space Nine. 😉

  2. Good discussion Dave.
    Whenever mass quantities of disparate information are aggregated into one spot, the likelihood of unintended assimilations grows – perhaps exponentially (I’m no math genius).
    We build and manage a fair number of mashup feeds based on a variety of content sources, but we’ve learned that (for business purposes) trusted content sources represent the only viable approach to leveraging external content. But even so, there still remains a slight risk of getting bad content.
    We use Google’s API, Google News, and specifically targeted domains to build both internal intelligence and customer-facing content aggregations. Sometimes we use other known and trusted feeds. Our automated services leverage RSS resources and re-deliver the results as secure (or public) HTML pages and also offer RSS feeds depending on the business requirement.
    To further manage the slight risk of attracting unintended hits, we provide our customers with a content review process that allows them to jettison any information of questionable origin or deemed unsuitable. Once an item (or domain) is given the boot, our services monitor all feeds (not just the one where the nefarious item was discovered) and never allow the content in. Lastly, we provide specific configuration commands that can be employed to filter incoming content with great prejudice, lowering the risk of poor quality items long before the feed is constructed.
    In the case of Terry’s situation, the semantic web has proven to be not so semantically astute. 😉 It’s not surprising – most of us understand why this happens, but what we didn’t anticipate was the rapid adoption of “persistent search” – i.e., the ability for anyone to create a search feed that’s always fresh with new discoveries. This is a very powerful idea because it provides a way for content to find you in your news reader (instead of the other way around). However, when search feeds are poorly constructed, you’re likely to get lots of crap. This represents one definition of a low-quality feed.
    How does Terry (or anyone or any business) combat this problem? Ironically, the same approach provides a method – create an intelligence service based on persistent searches that comprehensively watch for assimilations of your business brand with terms you might not like to see.
    There are many ways to do this – my favorite is Blogsite of course, but Yahoo Pipes is also a great place to create and manage your own intelligence dashboard.

  3. One more comment –
    While blogrolls and trackbacks are useful, they too create unacceptable risks for businesses that blog and leverage RSS publicly – after all, these ideas create unintended relationships that also must be monitored.
    An alternative approach is to use direct and related assimilations via search between your posts and external content sources. You’ll see a good example here:
