6/28/08

Debrief

Late yesterday afternoon, our best web server failed. The traffic load was too much for the remaining two web servers to handle, so performance tanked. Our host was unable to troubleshoot the issue, and we were told that they would take another shot at it Saturday (today) morning.

So, at this point, we knew two things: our remaining two servers could not handle our traffic load, and there was no guarantee that we would be able to get our big server back online today.

So yesterday evening, Automatt and I decided to try to migrate the service to what is called a "cloud computing" hosting provider. This is a new type of host that gives you virtual machines and the ability to quickly add capacity as your traffic grows. It's all self-serve, so we thought we might be able to get back up and running late Friday night.

So we configured five virtual web servers and a database server, migrated our database to the new host (which took a few hours, as it's pretty big), and pointed the "RateItAll.com" domain to the new IP address. By the time everything was ready to go, Automatt and I had worked through the night and it was after 6AM PST. Unfortunately, when you update DNS information, the change does not propagate across the Web immediately. In our case, some folks were not able to access the site until about 1:30PM today.

Once we had RateItAll up and running on the new service provider and everybody around the world was able to reach the site, it quickly became apparent that our new virtual host did not have the computing power to handle RateItAll's load. Many more virtual servers would have been needed, and it was unclear whether, even then, they would be able to handle the dynamic nature of RateItAll.

Around the same time, our original host got back to us and let us know that they had managed to repair our big web server that had gone down late on Friday.

So early this afternoon, we decided to go back to our original hosting configuration, which we were sure could handle RateItAll's traffic. Some of you may still be seeing an IP address instead of the RateItAll.com domain in your browser's address bar; this should switch back to RateItAll.com shortly.

Once we pointed the domain name back to our original IP address, for whatever reason, bots from Google and Yahoo started hammering us at a much higher intensity than normal. This has hurt performance today, making the site very slow. We are now working to slow down these crawlers to get the site speed back up.

It's been a brutal weekend so far, with very little sleep and a lot of stress. The good news is that we learned some valuable lessons. Mainly, cloud computing (at least the provider we tried) is still too early in the game to be a viable host for a high-traffic site like RateItAll. We also learned that although we have multiple servers, we are very much dependent on a single server to maintain our quality of service; this will change this week.

Thanks for your patience as we go through this.

3 comments:

  1. Larry, it sounds complicated. I don't know much about those things, but I appreciate your efforts, as I'm sure everyone does. Thanks for all the hard work. Rateitall is my absolute favorite site!

    Randy

  2. Ditto... I don't understand the nerd speak too well, but thanks for all of the effort!
