Sunday, September 13, 2009

Server crash post-mortem

Time for the official post-mortem of what happened as far as the server crash goes. The official cause of the crash shall be listed as a failing power supply unit.

About a month ago the power supply for the server went completely dead. In order to get the server back up and running as quickly as possible I swapped in a spare unit I had from an older computer. It did the job beautifully. Seeing as how everything appeared to be working fine and there were no substantial problems I didn't replace it with a new unit.

About 4 days before I left for my trip back home the server shut itself down. When I booted it back up I had some problems with the MySQL server, so I chalked the shutdown up to that. Then I left. We all know what happened next. When I got back to the server I found that it was in the same disabled state as after the first crash. I got it back up and decided to watch and see what it would do. A day later the same thing happened.

So I ran some tests on the power supply and found it was providing irregular power on the 12V rail. My guess is that this was probably leading to a temperature-triggered shutdown. Regardless, 2 days ago I purchased a high-end power supply unit and swapped it in. Everything seems to be running fine now.

Thanks to the donations everyone at RW has given or promised to give, I have gone ahead and upgraded some of the networking hardware that was worrying me as well. I am also working on getting some hardware that allows remote management of the server even when it is unresponsive, and that resets the server automatically if it becomes unresponsive.
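For the curious, the auto-reset idea could look roughly like the sketch below. It is only an illustration under assumptions, not the actual setup: the `is_reachable` helper, the `Watchdog` class, and the three-failure threshold are made-up names and values, and the reset itself (power-cycling through whatever remote-management hardware ends up installed) is deliberately left out.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


class Watchdog:
    """Hypothetical helper: asks for a reset after N consecutive failed probes."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures  # assumed threshold, tune to taste
        self.failures = 0

    def record(self, ok):
        """Record one probe result; return True when a reset should be triggered."""
        if ok:
            # Any successful probe clears the failure streak.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            # Start counting afresh after requesting a reset.
            self.failures = 0
            return True
        return False
```

In use, a small loop on a separate machine would call `is_reachable` every minute or so, feed the result to `record`, and trigger the reset only when `record` returns True. Requiring several failures in a row keeps one dropped connection from rebooting a healthy server.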

So the whole thing is my fault for not swapping in a new power unit after the old one failed and instead relying on a spare one. Feel free to block me for some pi unit of time for my failing.

As a final note, if this was truly an act of God, as more than one person posited, it was pretty convoluted and weak. A swarm of locusts munching on my power cords would have been far more effective, both in maintaining downtime and in general "shock and awe".

And a few RationalWiki prods for the road:

No true Scotsman
Common descent

16 comments:

  1. First!

    Thanks so much for all your hard work Trent. You rock. Gonna set up a new PayPal account and send you some cash as soon as my first pay cheque of the semester arrives...

    T of P.

  2. Thank you, you are a gentleman and a scholar

    -Passerby25

  3. I like the new "backup is running" feature which explains why my edits have spinning wheels.

    Nice work.

  4. Do we have a problem this Sunday?

  5. I think we do, but we'll have to wait until Trent comes back from church.

  6. onoz teh vandal site is down again lol!!!!!! just kidding.

    Trent, hurry up and get back from church so you can layyy your hands on teh computer and HEAL it in the NAAAME of Jesus!

    - s. skwrl

  7. On the plus side this blog tells us: "RationalWiki is currently down" - but if this also means that action is being taken is not clear.

    Time to whip the hamster I feel.

  8. The server is unreachable, looks like a hardware problem again.

  9. I sent a mail to Trent a while ago.

  10. Ratwiki shut down again so Trent can justify asking for even more money. It will be interesting to see how long the ratidiots play along.

  11. Rationalwiki is down, another victory for conservativism over the evils of liburuls!

  12. Heavens no, Jason H, it was an ACT OF GOD. Surely you should know that by now - God really wants CP to succeed. Plus, you forget - atheists are terrible at charity, so Trent won't get much from us.

  13. The evil, liberal, athiests triumph!

    RW is back up.

  14. I couldn't access the site earlier, but the blog widget said 'RationalWiki is up'......

    I wish I was a right-wing bible-basher; their sites never fail!

  15. This text is appearing instead of the "correct" text on all articles and talk pages:

    A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

    (SQL query hidden)

    from within function "Revision::loadText". MySQL returned error "144: Table './rationa1_wiki/rw_text' is marked as crashed and last (automatic?) repair failed (localhost)".

    TofP

  16. Me too (and the monitor thing's showing 4 minutes up)

    A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:

    (SQL query hidden)

    from within function "MessageCache::loadFromDB". MySQL returned error "144: Table './rationa1_wiki/rw_text' is marked as crashed and last (automatic?) repair failed (localhost)".
