Spot the Bot

So I had a really fun day yesterday. I spent it poring over traffic reports, trying to figure out which robots (or "bots") were responsible for hammering our servers over the last few days.

Bots are small software programs that follow links around the Web and suck data back to some central location. Good bots include the ones run by Google and Yahoo!, which collect data to build the indexes for their search engines. I don't mind them, because they send us traffic. Bad bots include mysterious, unnamed sites in China that don't obey the bot standards (such as the robots.txt exclusion rules) and are most likely scraping data for their own use.

I found about 30 IP addresses belonging to suspicious-looking bots that were either consuming large amounts of bandwidth or hitting us many times in short periods.
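For anyone wanting to do the same kind of digging, here is a minimal sketch of how the two warning signs (heavy bandwidth, bursts of requests) could be pulled out of raw access logs. It assumes the logs are in the Apache/NCSA Common Log Format, and the function name `find_suspects` and both thresholds are hypothetical choices for illustration, not the actual reports or limits used here.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Matches the start of a Common Log Format line (assumed format), e.g.
# 203.0.113.7 - - [10/Oct/2007:13:55:36 -0700] "GET / HTTP/1.1" 200 2326
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "[^"]*" \d{3} (?P<bytes>\d+|-)'
)

def find_suspects(lines, max_bytes, max_per_minute):
    """Return IPs whose total bytes or peak requests-per-minute exceed the (hypothetical) thresholds."""
    total_bytes = Counter()
    per_minute = defaultdict(Counter)  # ip -> count of requests in each one-minute bucket
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that aren't in the expected format
        ip = m.group("ip")
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        size = m.group("bytes")
        total_bytes[ip] += 0 if size == "-" else int(size)
        per_minute[ip][ts.replace(second=0)] += 1  # truncate to the minute
    suspects = set()
    for ip, buckets in per_minute.items():
        peak = max(buckets.values())
        if total_bytes[ip] > max_bytes or peak > max_per_minute:
            suspects.add(ip)
    return suspects

sample = [
    '198.51.100.5 - - [10/Oct/2007:13:55:01 -0700] "GET /a HTTP/1.1" 200 500',
    '198.51.100.5 - - [10/Oct/2007:13:55:02 -0700] "GET /b HTTP/1.1" 200 500',
    '198.51.100.5 - - [10/Oct/2007:13:55:03 -0700] "GET /c HTTP/1.1" 200 500',
    '203.0.113.9 - - [10/Oct/2007:13:55:04 -0700] "GET /big.pdf HTTP/1.1" 200 9000000',
    '192.0.2.1 - - [10/Oct/2007:13:56:00 -0700] "GET / HTTP/1.1" 200 1200',
]
print(sorted(find_suspects(sample, max_bytes=5_000_000, max_per_minute=2)))
# ['198.51.100.5', '203.0.113.9']
```

The first address is flagged for a burst of requests within one minute, the second for sheer bandwidth; the third looks like an ordinary visitor. Real thresholds would have to be tuned against normal traffic on the site.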

I then cross-checked these IP addresses against an RIA member list to make sure the blacklist I was building did not include any humans.
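The cross-check itself boils down to a set difference. The sketch below assumes the member list can be reduced to IP addresses or CIDR ranges (the post doesn't say what format the RIA list is in), and `build_blacklist` is a hypothetical helper name. It uses Python's standard `ipaddress` module so that a member's whole address range can be excluded, not just exact matches.

```python
import ipaddress

def build_blacklist(suspect_ips, member_entries):
    """Drop any suspect IP that matches a known member address or network."""
    # member_entries may hold single addresses ("192.0.2.10") or CIDR ranges ("198.51.100.0/24");
    # a single address becomes a one-address network (/32 for IPv4).
    member_nets = [ipaddress.ip_network(e, strict=False) for e in member_entries]
    blacklist = set()
    for ip in suspect_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in member_nets):
            blacklist.add(ip)  # no member matches, so keep it on the blacklist
    return blacklist

suspects = {"198.51.100.5", "203.0.113.9", "192.0.2.10"}
members = ["192.0.2.10", "198.51.100.0/24"]
print(sorted(build_blacklist(suspects, members)))  # ['203.0.113.9']
```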

The good news is that I found a couple of culprits: bots hitting us so hard and so fast that they could potentially cripple the site. Those bots have now been banned, and I think performance is improving.
