One of the ways we check the health of our applications is with stats collected from HAProxy. We use them to see how many requests are queued for execution on Mongrel instances. The graph of that queue is one indication of how our applications perform. When we launched the new version of the site three weeks ago, the graph for a single vertical (ReadingSocial) on a typical Tuesday looked like this:
So, between porting all verticals to Myspace, Orkut, and Bebo and enhancing the functionality, we spent some time on optimization. In addition to analyzing slow-query logs with mysqlsla, Aaron wrapped all external API calls (and we make a lot of them: to Amazon, Facebook, Myspace, etc.) in slow-call monitoring, so we could see where the current external bottleneck was and fix them one by one. Three weeks later the graph became much more peaceful:
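
For readers curious what wrapping external API calls in slow-call monitoring might look like, here is a minimal Ruby sketch. It is not Aaron's actual code: the `SlowCallMonitor` module, the `track` method, and the one-second threshold are all made up for illustration. The idea is simply to time each call and log a warning whenever it takes longer than the threshold.

```ruby
require "logger"

# Hypothetical helper, not the original implementation: the module name,
# method name, and default threshold are assumptions for illustration.
module SlowCallMonitor
  SLOW_THRESHOLD_SECONDS = 1.0 # assumed cutoff for a "slow" call

  class << self
    attr_writer :logger

    # Defaults to logging on stderr; swap in the application's own logger.
    def logger
      @logger ||= Logger.new($stderr)
    end

    # Runs the given block, measures how long it took, and logs a warning
    # if the duration reached the threshold. Returns the block's result.
    # The timing is recorded even when the block raises.
    def track(label, threshold: SLOW_THRESHOLD_SECONDS)
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      yield
    ensure
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      if elapsed >= threshold
        logger.warn(format("slow external call: %s took %.3fs", label, elapsed))
      end
    end
  end
end
```

A call site would then look something like `SlowCallMonitor.track("amazon.item_lookup") { client.item_lookup(asin) }`, where `client` stands in for whatever API wrapper the application already uses. Grepping the log for "slow external call" and grouping by label would show which external service is the current bottleneck.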
