Who is that knocking at my Weblog’s door? -- analysis from Intuitive Stories

As part of my research for a new book I’m writing, I was digging around in the logs for my Ask Dave Taylor Web site just to see how the most recent Web browsers identify themselves. Much to my surprise, there are literally hundreds of different crawlers hitting the site now, over and above the usual 20-30 popular Web browsers. Many are crawlers I’ve never heard of, from sites (when they’re identified at all) that are equally unfamiliar to me.

Here’s a pile of different robots and crawlers I found in my log file, all visiting within a single 24-hour period:

Amfibibot/0.06 (Amfibi Robot; http://www.amfibi.com)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
BecomeBot/1.23; +http://www.become.com/webmasters.html)
BecomeBot/2.0beta; +http://www.become.com/webmasters.html)
blogsnowbot (+http://www.blogsnow.com/bot.html)
boitho.com-dc/0.xx (http://www.boitho.com/dcbot.html)
Enterprise_Search/1.00.143;MSSQL (http://www.innerprise.net/es-spider.asp)
everyfeed-spider/1.0 (http://www.everyfeed.com)
FAST Enterprise Crawler 6 (Experimental)
HenryTheMiragoRobot (http://www.miragorobot.com/scripts/mrinfo.asp)
HooWWWer/2.0.9 (+http://cosco.hiit.fi/search/hoowwwer/)
Iltrovatore-Setaccio/1.2 (It-bot; http://www.iltrovatore.it/bot.html)
msnbot/0.3 (+http://search.msn.com/msnbot.htm)
NewzCrawler/1.7 (Newz Crawler
NextopiaBOT (+http://www.nextopia.com)distributed crawler client beta
NPBot (http://www.nameprotect.com/botinfo.html)
NusEyeFeedCrawler/0.005 (cs.northwestern.edu);
NutchCVS/0.05 (Nutch; http://www.nutch.org/docs/en/bot.html)
psbot/0.1 (+http://www.picsearch.com/bot.html)
Spider-Sleek/2.0 (+http://search-info.com/linktous.html)
SpurlBot/0.2)
SurveyBot/2.3 (Whois Source)
Trampel-Bot (www.trampelpfad.de)
TutorGigBot/1.5 ( +http://www.tutorgig.info )
Vagabondo/2.0 MT; http://aanmelden.ilse.nl/?aanmeld_mode=webhints)
ZyBorg/1.0 ( http://www.WISEnutbot.com)

Thankfully, many of these are polite enough to include a URL where I can glean more information, but it’s a darn surprise how many there are!
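A tally like the one above can be pulled from a server log with a few lines of Python. What follows is a minimal sketch, not the method actually used for this post. It assumes the log is in the Apache-style combined format, where the user agent is the last double-quoted field. The function name count_user_agents and the sample log lines (including their IP addresses and timestamps) are made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, i.e.
#   host ident user [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$')


def count_user_agents(lines):
    """Count how often each user-agent string appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:  # lines that don't fit the assumed format are skipped
            counts[match.group("agent")] += 1
    return counts


# Hypothetical sample lines; real logs will differ.
sample_log = [
    '192.0.2.1 - - [05/Jan/2005:10:00:00 -0700] "GET / HTTP/1.1" 200 5120 '
    '"-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [05/Jan/2005:10:00:05 -0700] "GET /robots.txt HTTP/1.0" 200 68 '
    '"-" "psbot/0.1 (+http://www.picsearch.com/bot.html)"',
    '192.0.2.1 - - [05/Jan/2005:10:01:00 -0700] "GET /feed HTTP/1.1" 200 2048 '
    '"-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    'this line is not in the assumed format',
]

counts = count_user_agents(sample_log)
for agent, hits in counts.most_common():
    print(hits, agent)
```

To run it against a real file, pass an open file object instead of the sample list, for example count_user_agents(open("access.log")), where the file name is again just a placeholder.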

Playing detective for a bit, I found some interesting sites visiting my server, including BecomeBot, which is “the user-agent for Become’s new web crawler. Become is crawling the web to build a next generation search engine.” and TutorGig, which “lists thousands of courses. These courses include not only online courses, but also more traditional courses that are taught in person on or off campus. Users locate courses by searching on keywords of interest. TutorGig.com has a huge database of over a million tutorial sites categorized by more than 2000 subjects.”

Further, I’m sure that some of the crawlers that hit my site are spam tools. When a crawler identifies itself as larbin_2.6.3 larbin2.6.3@unspecified.mail, libwww-perl/5.76, gazz/5.0, Pluck Soap Client/1.0Program Shareware 1.0.2, HenryTheMiragoRobot, LPW::Simple, or one of my other favorites, Anonymized by Stegos Internet Anonymizer, ya just gotta wonder…
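One rough way to separate the polite crawlers from the mystery ones is to check whether the user-agent string carries a URL at all. The sketch below is only a heuristic: a missing URL doesn't prove a crawler is a spam tool, and a present one doesn't prove it's legitimate. The pattern and the function name info_url are assumptions made for illustration; the agent strings are taken from this post.

```python
import re

# Matches "http://...", "https://..." or a bare "www...." token,
# stopping at whitespace, ';' or ')'.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s;)]+", re.IGNORECASE)


def info_url(agent):
    """Return the first URL-like token in a user-agent string, or None."""
    match = URL_PATTERN.search(agent)
    return match.group(0) if match else None


agents = [
    "psbot/0.1 (+http://www.picsearch.com/bot.html)",
    "Trampel-Bot (www.trampelpfad.de)",
    "libwww-perl/5.76",
    "gazz/5.0",
    "Anonymized by Stegos Internet Anonymizer",
]

# Agents that give no place to look for more information.
unidentified = [agent for agent in agents if info_url(agent) is None]
for agent in unidentified:
    print("no info URL:", agent)
```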

Anyone else being overrun by weird and suspicious bots?

4 comments on “Who is that knocking at my Weblog’s door?”

Stewart Vardaman says:

January 6, 2005 at 9:45 pm

I pull your site’s RSS every morning using Sunrise 0.36. I read most blogs on my Palm Tungsten at work during breaks, lunch, etc.

Jason Dowdell says:

January 24, 2005 at 10:29 am

Re: the become.com spider
They’re launching Feb 10, 2005 and will have 2.2 billion pages in their index… All of which are related to shopping. They’ll be debuting a proprietary algorithm as well.

Dave says:

June 8, 2005 at 10:23 am

I have been getting hit with the obidos-bot. Ever heard of that one?

Dave Taylor says:

June 9, 2005 at 12:25 am

Some Google investigation reveals that the chap who owns this Web site — http://www.onfocus.com/ — is the author of obidos-bot. It also suggests that his ‘bot ignores the robots.txt file and ruleset, frustratingly.
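For comparison, this is roughly what a well-behaved crawler does before fetching a page: it reads the site's robots.txt and asks whether its own user agent is allowed. The sketch uses Python's standard urllib.robotparser module. The robots.txt rules and the example.com URL are hypothetical, and nothing here describes how obidos-bot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out one crawler by name.
robots_txt = [
    "User-agent: obidos-bot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(robots_txt)  # parse rules from memory; no network access needed

page = "http://www.example.com/archives/page.html"  # placeholder URL

# A polite crawler checks this before every request and skips the page
# when the answer is False.
obidos_allowed = parser.can_fetch("obidos-bot", page)
msnbot_allowed = parser.can_fetch("msnbot", page)

print("obidos-bot allowed:", obidos_allowed)
print("msnbot allowed:", msnbot_allowed)
```

Keep in mind that robots.txt is only a request. A crawler that never runs a check like this can still fetch anything the server is willing to serve.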
