Why are all the blog search engines so terrible?

When I first started tracking discussions in the blogosphere, I used Google. It had the benefit of already being integrated into my browser, so I didn’t even have to type in a URL, but it wasn’t a great solution because, as yet, Google doesn’t track blog articles per se. They show up eventually, but mixed in with the rest of the Web.
Then I was turned on to Technorati and used their service for a while, particularly liking how I could subscribe to an RSS feed of my search. A cool idea and great use of RSS. But, somehow, it never quite found all the blog discussions that referenced the key phrases or specific domains I was tracking.


Then I moved from NetNewsWire to NewsGator Online and used its “Smart Feed” feature to track the discussion. But it suffers from the same limitation: it doesn’t catch everything to do with my specific feed subjects.
A week ago I started reading about how people were extolling Bloglines as the best place to track blog discussions, so I’ve spent some time experimenting with its tracking capabilities. You know what? It’s not very good either.
I search for my own web site’s domain name, for example (think of it as the poor man’s trackback 🙂), and Bloglines shows me matches, but it shows me too many darn matches.
If my colleague DL Byron were to write about me in his Textura Design blog, for example, I’d see a Bloglines match for his entry at the URL www.texturadesign.com and another one for just texturadesign.com. Lame. Annoying, redundant and lame.
And I haven’t even mentioned the frustration of when my search pattern appears in a weblog article that’s cloned on two or three sites (Shel, Neville, you know what I’m talking about!)
Stephen Baker over at BusinessWeek did a nice writeup of the basic challenges in this space too, if you want some background reading: Finding a blog in a haystack.
This lack of a good search engine for the blogosphere is one reason that I’m eagerly awaiting the entry of Google, MSN Search and Yahoo into the RSS tracking world: they have much richer search patterns and much more savvy algorithms to eliminate redundant and duplicate matches.
So, Technorati, Bloglines, NewsGator et al., what’s up with this?
Surely someone can apply some basic pattern recognition to these tools so that we can have one decent blogosphere tracking utility, ideally one where I could exclude specific mirror sites and highlight other sites?
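To show how basic the pattern recognition I have in mind could be, here is a minimal sketch in Python. It assumes a search result is just a list of URLs. The function names (`canonical_key`, `dedupe_matches`) and the mirror host `mirror.example.com` are made up for illustration and are not part of any of these services. It treats www.texturadesign.com and texturadesign.com as the same site and drops any host I have asked to exclude.

```python
from urllib.parse import urlsplit


def canonical_key(url):
    """Reduce a URL to a comparison key: host without a leading 'www.', plus path."""
    # Assumption: bare domains such as "texturadesign.com" should be treated as http URLs.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Ignore a trailing slash so "/entry" and "/entry/" compare equal.
    path = parts.path.rstrip("/")
    return host, path


def dedupe_matches(urls, excluded_hosts=()):
    """Keep the first URL seen for each canonical key, skipping excluded hosts."""
    excluded = {canonical_key(host)[0] for host in excluded_hosts}
    seen = set()
    kept = []
    for url in urls:
        key = canonical_key(url)
        if key[0] in excluded or key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


matches = [
    "http://www.texturadesign.com/entry",
    "http://texturadesign.com/entry",
    "http://mirror.example.com/entry",  # hypothetical mirror site
]
print(dedupe_matches(matches, excluded_hosts=["mirror.example.com"]))
```

This only catches the www-versus-bare-domain case and an explicit exclusion list. An article cloned on two or three unrelated sites would need some comparison of the content itself, which this sketch does not attempt.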

This article originally appeared on the Blog Business Summit Web site in a slightly different form.

Update: Joe Wikert has some quite cogent thoughts on Blog Search Engines that are definitely worth reading as a supplement to this article.

One comment on “Why are all the blog search engines so terrible?”

  1. Indeed. The real puzzle here is why Google can’t do this themselves. Particularly when they own blogger.com
    But then that means they’ve left a hole in search for others as well as startups like Technorati and Bloglines to fill.
