Sunday, July 28, 2013

Apache zombie process problems.

The Apache server is being a dick. It keeps dying with unkillable zombie processes; these don't serve data, but they do keep their hold on port 80. The only workaround is to reboot the server. This is, of course, ridiculous. I've put a ticket in with Linode. Anyone got ideas? It's just the standard Ubuntu apache2 package.

Update: Apache was causing kernel oopses. (Things that make you go "wtf.") Linode suggest using kernel 3.9-linode instead of 2.6-linode (the default for Ubuntu 10.04, which our box runs on). We'll see how that goes next reboot. Might even fix the white-screening blog too.

Friday, July 12, 2013

Wiki slow, we're being hammered.

The wiki has been really slow all day. The Apache server and the Squids appear quite happy with life - but the load balancer is reporting huge traffic spikes. More information as I have it.

(The blogs are unaffected, as they're served directly from the Apache server.)

Update: Here are the load balancer's connections per second. (The gaps are when logging is killed to increase throughput or something.) Note the bit where we go from a few tens of queries a second up to 800 a second.

Monday, May 27, 2013

Site is doing its yoyo impression again.

I'm not sure what's going on (and I'm currently without broadband, so I don't have reliable internet time to diagnose the problem), but the Apache server is occasionally going into conniptions and needing a reboot. (It's in one as I speak.) We are aware of the problem ... fwiw.

Monday, April 29, 2013

State of the blog: recovering.

And the blog returns from its holiday at the seaside. The blog is now happily resident on the main RationalWiki server. Just waiting for the DNS to resolve everywhere. I also need to fix the front page images (missing because of the way I did the import).
 
The next holiday at the seaside will probably be in a week or two, when I’m busy moving house. But I’m sure our contributors will write in great volume to take up the slack. Just as soon as I get around to recreating their accounts on this instance.

We also have an insanely dull front page, just waiting for a burgeoning blog network full of interesting people to spontaneously manifest from the quantum ether.

Wednesday, April 24, 2013

State of the blog: addled.

The site the RationalWiki Blog is presently living on is fine for what it is, but what it is includes not being up to our needs, hence the blog's party-stopping yoyo impression for much of today. We're finally getting around to doing the sensible thing and setting it up on the main RationalWiki server (which just got its memory doubled, so there's actually room for stuff). Please excuse hiccups in the meantime.

Saturday, April 20, 2013

Site slowness.

The site is frequently running ridiculously slowly. At present I don't really know why. Sometimes it's obvious (load average 30, Apache/PHP compiling, lots of slow queries running), sometimes it isn't (as when it was slow this afternoon). The system still has plenty of memory free. The squids don't show an unusual rate of hits on the site. Further updates at RationalWiki:Technical support. At least we can all be puzzled in a fully informed manner.

Tuesday, April 9, 2013

Short downtime for upgrade some time during 10-11 April 2013.

RationalWiki gets an upgrade! Per the details here, Linode has doubled everyone's RAM for free. Trent will be putting the various machines into upgrade queues today. He will stagger the upgrades starting with the Squids to limit downtime. However, when the backend is upgraded the site will not be available.

(We are very happy customers of Linode and can heartily recommend their service. This is a referral link that gives RWF a few pennies.)

Saturday, February 16, 2013

Search index updated, will update weekly.

Search on RationalWiki uses Lucene. The Lucene index hadn't actually updated since July 2012: incremental updating didn't work, because Trent had to disable the OAIRepository extension, which was breaking other stuff.

A complete reindex (we're currently around 86,000 pages total) takes 20-30 minutes, so David's set it to run one of those weekly. This is not ideal, but is a vast improvement.

Fixing it properly will require fixing OAIRepository, and possibly even writing documentation that doesn't have a big red warning saying it's completely wrong. Anyone feeling inspired?

Friday, February 8, 2013

Editing through open proxies has been blocked.

We've been suffering an attack from a vandalbot written specially for us which appears to hop from open proxy to open proxy. So I've had to disable editing through an open proxy, using the following lines in LocalSettings.php:

$wgEnableDnsBlacklist = true;
$wgDnsBlacklistUrls = array( 'xbl.spamhaus.org', 'opm.tornevall.org' );
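
For the curious, here is a rough sketch of what a DNS blacklist lookup does under the hood. This is not MediaWiki's code, just the general DNSBL convention: reverse the octets of the IPv4 address, stick the blacklist zone on the end, and see whether the resulting name resolves. The function names and the injectable `resolve` parameter are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the hostname a DNSBL lookup resolves: reversed IPv4 octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zones, resolve=socket.gethostbyname):
    """Return True if any zone answers for the reversed address.

    `resolve` is a hypothetical hook so the lookup can be faked in tests.
    """
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
            return True  # any answer means the address is listed in this zone
        except socket.gaierror:
            continue  # no such name (or lookup failed): treat as not listed here
    return False
```

So an edit from 192.0.2.1 (a documentation-only address) would be checked by looking up `1.2.0.192.xbl.spamhaus.org`, then the same thing under the second zone.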


Ordinary not-logged-in IP-number editors should be able to edit and remonstrate with us over our ignorance in the usual manner, per the invitation to do so on the front page (and have been doing so).

Note that I think the open proxy block also affects logged-in users. If you are a regular RW editor and this is a serious problem for you, please email David.

This only affects English RationalWiki, not Russian RationalWiki.

Editing through an ordinary non-anonymous proxy (your work or ISP) is fine.

Sunday, January 27, 2013

The thrashing of the squids.

RW now has a more elaborate setup than the previous single server:
  • apache1, which has the MySQL server, the Lucene search server and, of course, the Apache (with libphp5) - a 4GB Ubuntu 12.04 box
  • squid1, a 1GB Ubuntu 12.04 box running Squid (a reverse proxy server) and that's it
  • squid2, ditto
  • a load balancer, the thing at the IP for rationalwiki.org, which just sends requests to the squids as fast as it can.
The squids just serve up plain HTML really fast. Only logged-in requests, completely new requests and obscure diffs even get passed to apache1, where MediaWiki spends much time contemplating the request and eventually serves it up in its creaking majesty. So the less of that it has to do, the better.
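
As a toy illustration of that division of labour (and nothing like Squid's real logic, which also honours cache headers, expiry and purges), a reverse proxy boils down to something like the following. The class name, the `backend` callable and the `backend_hits` counter are all invented for the sketch.

```python
class ReverseProxySketch:
    """Toy model: anonymous repeat requests come from cache, the rest hit the backend."""

    def __init__(self, backend):
        self.backend = backend  # callable taking a URL and returning HTML (stands in for apache1)
        self.cache = {}         # URL -> cached HTML
        self.backend_hits = 0   # how often the slow backend had to do some work

    def handle(self, url, logged_in=False):
        if logged_in:
            # Logged-in pages are personalised, so they always go to the backend.
            self.backend_hits += 1
            return self.backend(url)
        if url not in self.cache:
            # Completely new request: fetch once, then remember it.
            self.backend_hits += 1
            self.cache[url] = self.backend(url)
        return self.cache[url]
```

Serve the same page to a thousand anonymous readers and `backend_hits` goes up by one, which is the whole point.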

This has generally held up really nicely - we didn't even notice the effects of our last Reddit onslaught!

Lately, though, it's been flaky - Squid has been exhausting memory on its servers badly enough to invoke Linux's Out Of Memory killer. Apparently this is pretty much always due to misconfiguration, so we're trying to work out the magical numbers. Please bear with us.

In the meantime, if you get a 503 error or a blank page or other really weird flakiness from the site, please email David at dgerard@gmail.com with the time and date of the error. This can be useful occasionally!

Edit: Kludge in place on squid1/2 to check the Squid process once a minute and restart if necessary. This should make things more reliable while we work out how to do it right.
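
A kludge of that sort might look roughly like this. It is a sketch, not the actual script on the squids: the process name and the restart command are assumptions you would pass in for your own setup, and the `run` parameter exists only so the thing can be tested without a real Squid.

```python
import subprocess


def is_running(process_name, run=subprocess.run):
    """True if a process with exactly this name exists.

    pgrep -x exits 0 when something matches and non-zero when nothing does.
    """
    result = run(["pgrep", "-x", process_name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def check_and_restart(process_name, restart_cmd, run=subprocess.run):
    """Issue the restart command if the process is missing.

    Returns True if a restart was attempted, False if all was well.
    """
    if is_running(process_name, run):
        return False
    run(restart_cmd)  # e.g. a hypothetical ["service", "squid3", "restart"]
    return True
```

Something like cron would then call `check_and_restart` once a minute with the right process name and restart command for the box.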