Sunday, January 27, 2013

The thrashing of the squids.

RW now has a more elaborate setup than the previous single server:
  • apache1, which has the MySQL server, the Lucene search server and, of course, Apache itself (with libphp5) - a 4GB Ubuntu 12.04 box
  • squid1, a 1GB Ubuntu 12.04 box running Squid (a reverse proxy server) and that's it
  • squid2, ditto
  • a load balancer, the thing at the IP for rationalwiki.org, which just sends requests to the squids as fast as it can.
The squids just serve up plain HTML really fast. Only logged-in requests, completely new requests and obscure diffs even get passed to apache1, where MediaWiki spends much time contemplating the request and eventually serves it up in its creaking majesty. So the less of that it has to do, the better.
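As a rough illustration of that split, here is a minimal Python sketch of the decision a squid makes for each request. It is a simplification for explanation only: the function name, the `logged_in` flag and the `cached_pages` set are made up here, and real Squid decides from cookies, headers and its own cache state rather than from anything this tidy.

```python
def needs_backend(url, logged_in, cached_pages):
    """Return True if the request has to be passed on to apache1.

    Hypothetical helper: models the rule described in the post,
    not Squid's actual logic.
    """
    if logged_in:
        # Logged-in users get personalised pages, so cached HTML is not reused.
        return True
    # Anonymous requests are answered from cache when the page is already there.
    # Completely new pages and obscure diffs are not cached yet, so they miss
    # and go to the backend.
    return url not in cached_pages


# Example cache contents (assumed for illustration).
cached_pages = {"/wiki/Main_Page"}
```

With that example cache, an anonymous request for `/wiki/Main_Page` stays on the squid, while the same request from a logged-in user, or an anonymous request for an uncached diff URL, goes through to apache1.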

This has generally held up really nicely - we didn't even notice the effects of our last Reddit onslaught!

Lately, though, it's been flaky - Squid has been exhausting memory on its servers badly enough to invoke Linux's Out Of Memory killer. Apparently this is pretty much always due to misconfiguration, so we're trying to work out the magic numbers. Please bear with us.

In the meantime, if you get a 503 error or a blank page or other really weird flakiness from the site, please email David at dgerard@gmail.com with the time and date of the error. This can be useful occasionally!

Edit: Kludge in place on squid1/2 to check the Squid process once a minute and restart if necessary. This should make things more reliable while we work out how to do it right.
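For the curious, a check-and-restart kludge of this kind can be sketched in a few lines of Python. This is not the actual script on squid1/2, just a minimal version under assumptions: that the process and service are both called `squid3` (the Ubuntu 12.04 package name), that `pgrep` and `service` are available, and that the script runs as root. The `run` parameter is only there so the logic can be exercised without touching a real server.

```python
import os
import subprocess


def _quiet_call(cmd):
    """Run a command, discard its output, and return its exit status."""
    with open(os.devnull, "w") as devnull:
        return subprocess.call(cmd, stdout=devnull, stderr=devnull)


def squid_is_running(process_name="squid3", run=_quiet_call):
    """Return True if pgrep finds a process with exactly this name."""
    # pgrep -x exits 0 when at least one process matches, non-zero otherwise.
    return run(["pgrep", "-x", process_name]) == 0


def check_and_restart(process_name="squid3", service_name="squid3",
                      run=_quiet_call):
    """Restart the service if the process is gone.

    Returns True if a restart was attempted, False if Squid was already up.
    """
    if squid_is_running(process_name, run):
        return False
    run(["service", service_name, "restart"])
    return True


# A script run from cron would end with a call to check_and_restart().
```

Run once a minute from root's crontab (a `* * * * *` schedule), this only papers over the crash: it does nothing about the memory exhaustion itself, which is why it is a stopgap until the Squid config is sorted out.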

6 comments:

  1. I've sometimes seen edits not show up without hard refreshing (or at least waiting, I don't know which). Is that part of this?

  2. Might be, though MediaWiki should actually tell Squid to clear an edited page.

  3. Getting constant 503s right now.

  4. Yeah, the squids both fell over. I'm going to need to put a checker on that checks every minute, until we work out the correct squid config. Gah.
