Zas and I have been working hard to improve the capacity and stability of the site. In the last week, we’ve identified and fixed at least 3 problems with the search servers, and we’ve added a timeout that aborts any query that takes longer than 3 seconds. We think the main cause of trouble was that queries piled up behind a slow query that ran too long; the servers never recovered from the backlog and eventually crashed.
We won’t go as far as saying that the search servers are fixed — every time we have a smidgen of hope that things are improving, they crash again. Seemingly out of spite! So, the search servers are better. 😉
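To illustrate the idea behind the query timeout, here is a minimal sketch in Python. It is not the code running on the search servers; the names `run_with_timeout` and `fake_query` are made up for this example, and only the 3-second limit comes from the change described above. The point is that a slow query gets cancelled instead of holding up everything queued behind it.

```python
import asyncio

# The 3-second limit mentioned above; everything else here is hypothetical.
QUERY_TIMEOUT_SECONDS = 3.0


async def run_with_timeout(query_coro, timeout=QUERY_TIMEOUT_SECONDS):
    """Await a query and cancel it if it runs longer than `timeout` seconds.

    Returns (True, result) on success, or (False, None) if the query
    was cancelled because it exceeded the timeout.
    """
    try:
        result = await asyncio.wait_for(query_coro, timeout=timeout)
        return True, result
    except asyncio.TimeoutError:
        # wait_for has already cancelled the underlying coroutine.
        return False, None


async def fake_query(duration, answer):
    """Stand-in for a real search query: sleeps, then returns an answer."""
    await asyncio.sleep(duration)
    return answer
```

A fast query returns its result as usual, while one that exceeds the limit is cancelled and reported as failed, so later queries are not stuck waiting behind it.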
Zas has also made a number of changes to the gateways and how we rate limit our incoming traffic. The rate limiting is now being done in a smarter way that reduces the overall traffic on our web servers. Well done!
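For readers curious what rate limiting at a gateway can look like, here is a small token bucket sketch in Python. This is only one common approach and an assumption on my part for illustration; it is not a description of what the gateways actually do. The class name `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: allows short bursts up to `capacity`
    requests and a sustained average of `rate` requests per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of stored tokens
        self.clock = clock          # injectable clock, handy for testing
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A gateway would typically keep one such bucket per client and reject or delay a request whenever `allow()` returns `False`, which keeps excess traffic away from the web servers.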
We’ve also increased our bandwidth budget by 4 Mbit per second, which makes the site feel considerably more responsive.
Let me put these improvements into numbers: about a week ago we were struggling to keep up with 250 requests per second and the site felt very sluggish. Now we can handle 500 requests per second and the site feels considerably faster. For large chunks of the day we are managing to handle all the traffic we should handle. And the search servers haven’t crashed in 4 days!
We hope that this will give us a solid base from which to release the schema upgrade tomorrow. Once that is complete, we will start work on moving to the new hosting company.
Thanks for being patient with us!
WOW, it sounds very good! 🙂