Who is that knocking at my Weblog’s door?

As part of my research for a new book I’m writing, I was digging around in the logs for my Ask Dave Taylor Web site just to see how the most recent Web browsers identify themselves. Much to my surprise, there are literally hundreds of different crawlers hitting the site now, over and above the usual 20-30 popular Web browsers. Many are crawlers that I’ve never heard of, from sites (when they’re identified at all) that are equally unfamiliar to me.

Here’s a pile of different robots and crawlers I found in my log file, all visiting within a single 24-hour period:

  • Amfibibot/0.06 (Amfibi Robot; http://www.amfibi.com)
  • Baiduspider+(+http://www.baidu.com/search/spider.htm)
  • BecomeBot/1.23; +http://www.become.com/webmasters.html)
  • BecomeBot/2.0beta; +http://www.become.com/webmasters.html)
  • blogsnowbot (+http://www.blogsnow.com/bot.html)
  • boitho.com-dc/0.xx (http://www.boitho.com/dcbot.html)
  • Enterprise_Search/1.00.143;MSSQL (http://www.innerprise.net/es-spider.asp)
  • everyfeed-spider/1.0 (http://www.everyfeed.com)
  • FAST Enterprise Crawler 6 (Experimental)
  • HenryTheMiragoRobot (http://www.miragorobot.com/scripts/mrinfo.asp)
  • HooWWWer/2.0.9 (+http://cosco.hiit.fi/search/hoowwwer/)
  • Iltrovatore-Setaccio/1.2 (It-bot; http://www.iltrovatore.it/bot.html)
  • msnbot/0.3 (+http://search.msn.com/msnbot.htm)
  • NewzCrawler/1.7 (Newz Crawler
  • NextopiaBOT (+http://www.nextopia.com)distributed crawler client beta
  • NPBot (http://www.nameprotect.com/botinfo.html)
  • NusEyeFeedCrawler/0.005 (cs.northwestern.edu);
  • NutchCVS/0.05 (Nutch; http://www.nutch.org/docs/en/bot.html)
  • psbot/0.1 (+http://www.picsearch.com/bot.html)
  • Spider-Sleek/2.0 (+http://search-info.com/linktous.html)
  • SpurlBot/0.2)
  • SurveyBot/2.3 (Whois Source)
  • Trampel-Bot (www.trampelpfad.de)
  • TutorGigBot/1.5 ( +http://www.tutorgig.info )
  • Vagabondo/2.0 MT; http://aanmelden.ilse.nl/?aanmeld_mode=webhints)
  • ZyBorg/1.0 ( http://www.WISEnutbot.com)

Thankfully, many of these are polite enough to include a URL where I can glean more information, but it’s a darn surprise how many there are!
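For anyone who wants to run the same check on their own logs, here is a minimal sketch of the idea: pull the user-agent out of each log line, count the distinct agents, and pick out the info URL when there is one. It assumes the Apache "combined" log format, which is my guess rather than something stated above, and the sample lines (including the IP addresses and timestamps) are made up for illustration. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the referer and the
# user-agent are the last two double-quoted fields on each line.
LINE_RE = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

# Matches only URLs with an explicit scheme; a bare "www.example.com"
# inside a user-agent string is deliberately not picked up.
URL_RE = re.compile(r'https?://[^\s;)]+')


def user_agent(line):
    """Return the user-agent field of one log line, or None if absent."""
    m = LINE_RE.search(line)
    return m.group("agent") if m else None


def info_url(agent):
    """Return the first http(s) URL in a user-agent string, or None."""
    m = URL_RE.search(agent)
    return m.group(0) if m else None


def tally_agents(lines):
    """Count how many requests each distinct user-agent made."""
    counts = Counter()
    for line in lines:
        agent = user_agent(line)
        if agent:
            counts[agent] += 1
    return counts


# Made-up sample lines; only the user-agent strings come from the list above.
sample = [
    '192.0.2.1 - - [01/Feb/2005:10:00:00 -0700] "GET / HTTP/1.1" 200 512 '
    '"-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [01/Feb/2005:10:00:05 -0700] "GET /feed HTTP/1.1" 200 900 '
    '"-" "SurveyBot/2.3 (Whois Source)"',
    '192.0.2.1 - - [01/Feb/2005:10:00:09 -0700] "GET /a.html HTTP/1.1" 200 77 '
    '"-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
]

counts = tally_agents(sample)
for agent, hits in counts.most_common():
    print(hits, agent, info_url(agent) or "(no URL given)")
```

Reading a real file would just mean passing the open file object to `tally_agents` instead of `sample`.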

Playing detective for a bit, I found some interesting sites visiting my server, including BecomeBot, which is “the user-agent for Become’s new web crawler. Become is crawling the web to build a next generation search engine,” and TutorGig, which “lists thousands of courses. These courses include not only online courses, but also more traditional courses that are taught in person on or off campus. Users locate courses by searching on keywords of interest. TutorGig.com has a huge database of over a million tutorial sites categorized by more than 2000 subjects.”

Further, I’m sure that some of the crawlers that hit my site are spam tools. When a crawler identifies itself as larbin_2.6.3 larbin2.6.3@unspecified.mail, libwww-perl/5.76, gazz/5.0, Pluck Soap Client/1.0Program Shareware 1.0.2, HenryTheMiragoRobot, LWP::Simple, or one of my other favorites, Anonymized by Stegos Internet Anonymizer, ya just gotta wonder…
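One crude way to turn that hunch into something repeatable is a filter like the sketch below. The rules are my own hypothetical heuristics, not a real spam detector: flag any agent that names a generic HTTP library or an anonymizer, or that gives no URL or "www." address where the operator can be looked up. It will produce false positives (an honest bot that lists only a bare domain without "www." gets flagged too), so treat the output as a list of things to look at, not a verdict.

```python
import re

# Hypothetical heuristics for illustration only. The hint list is built
# from the examples in this post and is nowhere near complete.
GENERIC_HINTS = ("libwww-perl", "lwp::simple", "larbin", "anonymiz")

# "Contact info" here just means a URL scheme or a www. hostname.
CONTACT_RE = re.compile(r'https?://|www\.', re.IGNORECASE)


def looks_suspicious(agent):
    """Return True if a user-agent string deserves a second look."""
    lowered = agent.lower()
    if any(hint in lowered for hint in GENERIC_HINTS):
        return True
    # No way to look up who runs the crawler.
    return CONTACT_RE.search(agent) is None


agents = [
    "psbot/0.1 (+http://www.picsearch.com/bot.html)",
    "Trampel-Bot (www.trampelpfad.de)",
    "libwww-perl/5.76",
    "gazz/5.0",
    "Anonymized by Stegos Internet Anonymizer",
]

flagged = [a for a in agents if looks_suspicious(a)]
for agent in flagged:
    print("worth a look:", agent)
```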

Anyone else being overrun by weird and suspicious bots?

4 comments on “Who is that knocking at my Weblog’s door?”

  1. Re: the become.com spider
    They’re launching Feb 10, 2005 and will have 2.2 billion pages in their index… All of which are related to shopping. They’ll be debuting a proprietary algorithm as well.
