Wednesday, December 9, 2009

Colbert Nation

The server is under tremendous load tonight, most likely because we hold top spots in the relevant Google searches after Andrew Schlafly's appearance on The Colbert Report.

I will do what I can behind the scenes to maximize our ability to weather the storm, but it is likely to be rough for a day or so. I expect the initial rush to last only a few hours.

Thursday, November 26, 2009

Technical issues...

If you have tried to access RW in the last hour or so you may have noticed we are having some technical issues.

Working on it while I type, and the good news is that the server was able to recover from a crash to a state where remote access was available. This is important because it means our fault protection appears to be working, allowing remote recovery from problems.

Hoping to have everything back and working within the hour.

Update: Site recovery went fine, but I am rerunning the backup since it got jacked up, so we are looking at another 20 minutes or so until the site is fully accessible again.

UPDATE: Okay, the site should be back up and running as normal. It's well past my bedtime... and I have to teach tomorrow morning. Oh well. Leave a message on my talk page or on here if you have problems with the site, or you can contact Nx, as he should be able to handle anything. He is still here, right?

Sunday, November 1, 2009

October stats

October was the most active month ever for RW. There were multiple events that helped spur us on to nearly 100,000 unique IP address visits. We also appear to have recovered from our slump due to the RW outage in August/September.



Below is the daily traffic record. We had three major events that helped propel our traffic. I will talk about those below.



Tuesday, October 6, 2009

Traffic spike

Due to the coverage of The Conservative Bible Project, RW is seeing a predictable jump in traffic. It's still running hot, and in order to keep serving up our brand of the good news I am holding off on the backup for tonight.

Don't break anything.

Saturday, October 3, 2009

An act of "nature"

Sorry folks, based on the blinking clock above my computer it appears the power went out to my house. All the various fault protection equipment kicked in as appropriate but I can't sustain the server for longer than about half an hour with no power. Based on site monitoring it appears that it was inaccessible for only a few minutes at most.

Friday, September 25, 2009

Fault protection take 2

I have received the additional hardware I needed to get the fault protection working...I think. I will be setting that up and running some tests on it today. It shouldn't affect RationalWiki at all, as I can do my testing further down the network chain. If all goes well I will need to restart the server and that's about it. I will drop an intercom on RW when/if I do that.

Update: Okay, fault protection seems to be working well. I have implemented it live on the server now. I will continue to monitor everything and adjust as needed. But all appears well for now.

Wednesday, September 23, 2009

So just how bad has it been?

The recent downtime issues annoy me at least as much as (probably more than) every user of RationalWiki. As always, I strive to offer the best service I can in terms of performance, availability, and protection against catastrophe. But I am not perfect, and I think it is important to keep in mind that this is a hobby, and my "real life" as a graduate student is as overworked and underpaid as most clichés depict it.

All that said and done, just how bad has it really been? After the great crash in August I started working with a service to help monitor RationalWiki's uptime and server performance. With close to three weeks' worth of data we can paint a picture of how bad things have been.



If you take a look at this analysis you can see that RW has actually been up 95 percent of the time. That is not bad, all things considered. Keep two points in mind: first, a good chunk of our downtime is clustered together, so most of it was caused by one or two disasters that led to prolonged outages; second, our nightly backups cause timeout errors for about 20-30 minutes. If you remove the few major disasters, our uptime averages just over 99 percent a day, and without the backups you are looking at 100 percent coverage most days.
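For the curious, here is roughly how a daily uptime percentage falls out of monitoring data, and why the backup window matters. This is a minimal Python sketch, not the monitoring service's actual method: it assumes the service records one up/down check every 5 minutes, and the simulated day with a 4:00-4:30 am backup window is a made-up example.

```python
from datetime import datetime, time, timedelta

def uptime_percent(checks):
    """Percentage of (timestamp, is_up) checks that found the site up."""
    if not checks:
        return 0.0
    up = sum(1 for _, is_up in checks if is_up)
    return 100.0 * up / len(checks)

def uptime_excluding_window(checks, start, end):
    """Uptime ignoring checks whose time of day falls in [start, end),
    e.g. a scheduled nightly backup window."""
    kept = [(ts, ok) for ts, ok in checks if not (start <= ts.time() < end)]
    return uptime_percent(kept)

def simulate_day(day, interval_min=5, down_start=time(4, 0), down_end=time(4, 30)):
    """Hypothetical day: the site is up except during the backup window."""
    checks = []
    for i in range(24 * 60 // interval_min):
        ts = day + timedelta(minutes=interval_min * i)
        checks.append((ts, not (down_start <= ts.time() < down_end)))
    return checks

if __name__ == "__main__":
    checks = simulate_day(datetime(2009, 9, 1))
    print(f"Raw uptime: {uptime_percent(checks):.2f}%")
    print(f"Excluding backup: {uptime_excluding_window(checks, time(4, 0), time(4, 30)):.2f}%")
```

In this made-up day, a 30-minute window in which every check fails costs a little over two percentage points (97.92 percent raw versus 100 percent with the window excluded), which is why it is worth separating scheduled backup time from real outages when reading the numbers.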

The key, then, is disaster recovery: being able to quickly handle issues that cause long, protracted downtime. Most of these are easily handled if I am awake and within walking distance of the server. Problems like today's cable outage are very rare. So we are left with one major issue: server cop-outs that prevent remote login and shut down the site, and that occur when I am either asleep or traveling.

I am actively working on a solution that I think will greatly increase the server's ability to auto-recover from failure, and expand the options for remote administration in the event of catastrophic failure when I am not present (as happened in August).

A lot of this is trial and error and learning as I go. I have never done a project like this before. All we can do is learn from our mistakes and move forward with the goal of doing the best we can. That said, what we do have is pretty good, I think.

And we are back....

What appears to have happened is my next door neighbors got a cable hookup, and God put it into the mind of the cable technician to accidentally disconnect my cable while doing the hook up.

Where God failed in his plan was in preventing another technician from showing up and fixing the problem. An even larger failure was that he allowed the technician to show up hours early! How often does that happen? I think we know who to thank for that!

On a more serious note, in my comment on the last post I discussed the monthly cost of running RW. The point was not to ask for more money; the cost stems from the dedicated IP address and extra bandwidth that RW requires. It is an extra $50 a month above what I would pay anyway. I get about $20 a month in donations from people who give every month, so my out-of-pocket expense is $30 a month. I can totally handle that. The occasional need for new hardware that I can't afford is usually met almost immediately by a small donation drive. Every now and then I get an extra donation that I keep on the back burner for emergencies.

The RW accounts have about $80 sitting in them right now for emergency purchases. I like to keep that amount around $100-$150 to make sure I can get almost anything we need.

All of this is to say that at the moment the financial cost of RW is not really a burden on anyone. That was the point of moving to a privately hosted site.

If we were to try to move to a commercially hosted site, the financial burden would increase severalfold for me and for those people who are able and willing to donate. There is also a substantial risk that if we fail to raise enough cash we could lose the site.

So we put up with this less-than-perfect uptime because it means that RW is under no immediate risk of permanent shutdown and is not going to bankrupt anyone in the process.

When it rains...

So God's will has struck again: the site is down, but this time it has nothing to do with the server. My internet connection seems to have crapped out. Because I pay $200 a month for it, though, one of the perks is "emergency service," so I have a guy coming in sometime between 5 pm and 8 pm EST to take a look at it and hopefully get everything back up and running.

Tuesday, September 22, 2009

No sleep for the server admin

So I was up till 4 am trying to get the fault protection working. It appeared to be working beautifully, so I went to bed, only to be awoken 3 hours later with it going nuts. I have removed it from the system for now. I think I need another piece of hardware to get everything working together the way I want, so I am putting it all on hold. I will order the new hardware today, so it could be upwards of a week before I go at this again.

My goal is to have it all in place before my trip to Chicago.

And now I think I am getting a cold. I blame the stress and lack of sleep, damn it.

Okay I lied

As is probably obvious, I lied in the last update: an idea for a solution to my problem came to mind.

So I think I have everything set up. I am going to avoid going into specifics for security reasons, but we now have much greater remote administration abilities that no longer depend on the server being online. I have also set up a range of automated monitoring software and utilities that will help both with tracking site uptime and with running automated tasks that should let the server recover from all but the most serious crashes on its own.
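Since the specifics are deliberately left out above, here is only a generic illustration of the watchdog idea, not the actual RW setup: poll the site, and run a recovery action only after several consecutive failed checks. It is a minimal Python sketch; the three-failure threshold, the 60-second interval, and the recovery callback are all hypothetical.

```python
import time
import urllib.request

def site_is_up(url, timeout=10):
    """Return True if the URL answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False

class Watchdog:
    """Run `recover` after `threshold` consecutive failed checks (hypothetical policy)."""

    def __init__(self, check, recover, threshold=3):
        self.check = check          # callable returning True when the site is healthy
        self.recover = recover      # callable that attempts a fix, e.g. restarting a service
        self.threshold = threshold
        self.failures = 0
        self.recoveries = 0

    def tick(self):
        if self.check():
            self.failures = 0       # any success resets the streak
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.recover()
            self.recoveries += 1
            self.failures = 0       # give the recovery a fresh streak before acting again

def run_forever(watchdog, interval=60):
    """Check once per `interval` seconds, forever."""
    while True:
        watchdog.tick()
        time.sleep(interval)
```

Requiring several failures in a row keeps a single slow page load (say, during the nightly backup) from triggering a restart. In use, something like run_forever(Watchdog(lambda: site_is_up(SITE_URL), restart_services)) would be started at boot, where SITE_URL and restart_services are placeholders for whatever applies on a given machine.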

With that I am going to bed.

Monday, September 21, 2009

Extended maintenance

The site is going down this afternoon for extended maintenance: running some tests, changing some options, and installing some new hardware, all designed to help deal with some of the recent outages. I am running the backup first, then I will get started.

Update: Screw it, I am done for tonight. Got about half of what I wanted figured out. Luckily the remaining half lets me keep RW up most of the time I am working on it. I will have to come back and keep working on this, probably tomorrow, which means don't freak out if there is intermittent downtime for a minute or so every now and then over the next day or so.

Repairing a table in the database

Things will be locked up for a few minutes while the repair is run. I am aware of the situation and working on it, and hope to have things back up shortly.

UPDATE: The repair is done and the site is back online. Let me know if there are further issues.

Sunday, September 13, 2009

Server crash post-mortem

Time for the official post-mortem of what happened as far as the server crash goes. The official cause of the crash shall be listed as a failing power supply unit.

About a month ago the power supply for the server went completely dead. In order to get the server back up and running as quickly as possible I swapped in a spare unit I had from an older computer. It did the job beautifully. Seeing as how everything appeared to be working fine and there were no substantial problems I didn't replace it with a new unit.

About 4 days before I left for my trip back home the server shut itself down. When I booted it back up I had some problems with the MySQL server, so I chalked the problem up to that. Then I left. We all know what happened next. When I got back to the server I found that it was in the same disabled state as after the first crash. I got it back up and decided to watch and see what it would do. A day later the same thing happened.

So I ran some tests on the power supply and found it was providing irregular power on the 12 V rail; my guess is that this was leading to a temperature-triggered shutdown. Regardless, 2 days ago I purchased a high-end power supply unit and swapped it in. Everything seems to be running fine now.

Thanks to the donations everyone at RW has given or promised to give, I have gone ahead and upgraded some of the networking hardware that was worrying me as well. I am also working on getting hardware that allows remote management of the server even when it is unresponsive, and that automatically resets the server if it becomes unresponsive.

So the whole thing is my fault for not swapping in a new power unit after the old one failed and instead relying on a spare one. Feel free to block me for some pi unit of time for my failing.

As a final note, if this is truly an act of God, as more than one person posited, it is pretty convoluted and weak. A swarm of locusts munching on my power cords would have been far more effective, both in maintaining downtime and for the general "shock and awe" of it all.

And a few RationalWiki prods for the road:

No true Scotsman
Common descent

Saturday, September 12, 2009

The expanding face of RationalWiki

A while ago I posted a small picture of the RationalWiki server. It seemed to amuse people. Well, since then the RW complex has expanded to take up more and more space. I decided to upload a new picture so everyone can see the new face of RationalWiki:


Woot! New network switch just arrived.

I am replacing the weird little neon hunk of plastic that I think is a network switch (but the Mandarin confuses me) with a solid Linksys switch I ordered from Newegg. Thanks to everyone who helped by donating!

I think the switch was by far the "weakest" point in the network setup, and most likely to fail next. So this is a good upgrade.

A switch should take less than a minute to install so I am not bothering with an RW intercom message. But posting this here just in case I blow something up and the site stays down longer than expected.

After this, we just need to get some automatic/remote server monitoring hardware and we will be set!

Friday, September 11, 2009

New status widget and update on google

New widget

So for fun I have set up a little widget on the blog to show the status of the RationalWiki servers.

If the server is down it just says "server down." If the server is up and working, it displays the number of hours and minutes that the server has been up "straight," meaning with no reboot or power down. I am also displaying the 15-minute running average of the CPU load so people can see how busy the server has been recently.
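Here is a minimal Python sketch of the server side of a status widget like this, assuming a Linux server. It is not the actual widget code: uptime comes from /proc/uptime, and the "15-minute running average" is assumed to be the kernel's 15-minute load average from os.getloadavg(), which is an exponentially weighted average of runnable tasks rather than a CPU percentage. The "Server down" branch just stands in for the case where no status can be read.

```python
import os

def read_uptime_seconds(path="/proc/uptime"):
    """Seconds since boot, from the first field of Linux's /proc/uptime."""
    with open(path) as f:
        return float(f.read().split()[0])

def format_uptime(seconds):
    """Render uptime as hours and minutes, e.g. 3725 s -> '1h 02m'."""
    hours, minutes = divmod(int(seconds // 60), 60)
    return f"{hours}h {minutes:02d}m"

def status_text(uptime_seconds, load_15):
    """Widget text: 'Server down' if no uptime is available, else uptime plus load."""
    if uptime_seconds is None:
        return "Server down"
    return f"Up {format_uptime(uptime_seconds)}, 15-min load {load_15:.2f}"

if __name__ == "__main__":
    try:
        uptime = read_uptime_seconds()
        load_15 = os.getloadavg()[2]  # (1-, 5-, 15-minute) load averages
    except OSError:  # not Linux, or the values are unavailable
        uptime, load_15 = None, 0.0
    print(status_text(uptime, load_15))
```

Because /proc/uptime resets on every boot, "up straight" falls out naturally: any reboot or power loss starts the counter from zero again.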

Google update

So based on my searching we are back on top for Andrew Schlafly and Poe's Law, which were two of our bigger hitters for search-engine-referred traffic. Not everything is reindexed yet, but it looks like we should recover all right from our downtime.

Hardware replacement for real this time

Okay, so now that our magic new backup system appears to be working, I can actually do what I meant to do yesterday. So the site is down because I am replacing hardware.


Obligatory Google prod of the day:

Denyse O'Leary in honor of the first person to openly admit to considering a defamation lawsuit against us.
Esther Hicks just because.

UPDATE: It looks like everything went exactly as planned, smooth upgrade, site back online. I will continue to monitor the situation to make sure nothing weird happens.

Thursday, September 10, 2009

Hardware replacement, downtime

I think I have discovered the cause of the random shutdowns that took the site offline. As is the case with everything in life, the solution to the problem is going to cost money and time.

I will be heading out to purchase some new hardware this afternoon and its installation will cause some downtime. If all goes as planned it will be less than an hour. I will give a 20-30 minute warning on RW before I take things down.

Google prodding still needed btw:

Conservapedia
Gish Gallop

UPDATE: Okay I lied. I am trying to get a few things done at once, and the completion time on task 1 is taking longer than I anticipated so the hardware replacement has been delayed till late tonight or tomorrow.

Monday, September 7, 2009

Poking Google

We got dropped by the search engines so I am going to poke a few of the important pages, and hope it prods re-indexing of the site:

Poe's Law
Andrew Schlafly
Conservapedia

If you have a blog or other dynamically updated website please consider poking these pages, or any of your favorite RationalWiki pages.

Sunday, September 6, 2009

All systems a go

Well, RationalWiki should be firing on all cylinders.

Sorry for the downtime, but there was naught I could do. I will be doing some thinking about how to expand remote administration of the site over the next few days. This probably means a mini-fundraiser will be in order, but hopefully we can get something set up that will prevent this kind of disaster in the future.

Getting things back online now

Not a hundred percent yet.

Main thing to do is some troubleshooting and then apparently importing pages that have been made on other wikis.

Database is locked for now.

Stay tuned for updates.

Update: Everything is working. Nx is going to import pages from other wikis and then open up the database for the grand reunion. I am going to get food and caffeine. We have some "rebuilding" to do, though, since Google dropped us. I will post on RW later tonight to poke people for help getting us back to awesome again.

Sunday, August 23, 2009

When will we be back?

I have poked at the server every way I know how. It will have to wait till I get back.

I am arriving in Toronto at 3:45 pm EST on Sunday, September 6th. It is American Airlines flight 4068 if you want to track it to know when it lands.

Add an hour to get through customs and get my baggage. I then have to catch a bus back to Hamilton. I should arrive back at McMaster around 6 pm EST. Add 30 minutes to get back to my apartment. If it is a simple solution the server should be back on by 7 pm EST September 6th. If there is something more going on I will post here by 7pm to let everyone know.

It sucks that it went down. I am disappointed, but it's not the end of the world. Everything will be back and working well within 14 days from now.

I hope it is not too terrible for anyone, and it might provide a nice break for a few people.

If this downtime has shown RW to be of particular importance to you it might be worth it to consider tossing some financial support our way. I will use that to invest in infrastructure to allow better remote maintenance options in the future.

Wednesday, August 19, 2009

Pretty much worst-case scenario

As is obvious to most by now the site is inaccessible, however, the bot server appears to be working. This means it is not just an internet problem, but a potential issue with the server having shut down. If this is the case I doubt there is anything I can do for now.

Sunday, August 16, 2009

Server down

We are experiencing some difficulties with MySQL connections at the moment, which is causing the downtime.

I am trying to track down the problem as I type this.

*EDIT*

Site should be back up. Post here if you are unable to access it.

Friday, July 24, 2009

Intermittent downtime

I am working on backup issues, and there may be some intermittent downtime.

Saturday, July 11, 2009

Backup

I am running the backup.

I am close to figuring out what was causing the problems with the automated script, and so should have a scheduled backup ready to go soon.

I was thinking around 4 am EST; let me know somewhere if you disagree.

Tuesday, July 7, 2009

The loop continues

Backup again, then off to bed.

Here is my random/fun fact for tonight: Google drives about 5 times as much traffic to our site as Yahoo does. Windows Live drives about a third as much as Yahoo. Other search engines barely register.

Monday, July 6, 2009

Backup

This is getting repetitive right?

That's not a bad thing, since the only other reason I would have to post here is that something worse has happened. I am redoing the automated backup scripts because they were causing some problems, and until that is done the backup is run manually before I go to bed. That is why it is not at a set time...yet. I will get it all automated soon enough, and then there will be a set "known" time.
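For anyone wondering what an automated nightly backup can look like, here is a minimal Python sketch, not the actual RW script: it pipes mysqldump into a gzip file with a timestamped name and keeps only the newest few dumps. The backup directory, database name, seven-dump retention, and the --run safety switch are all hypothetical, and credentials are assumed to live in a MySQL option file such as ~/.my.cnf.

```python
import gzip
import shutil
import subprocess
import sys
from datetime import datetime
from pathlib import Path

BACKUP_DIR = Path("/var/backups/wiki")  # hypothetical location
DB_NAME = "wikidb"                      # hypothetical database name
KEEP = 7                                # hypothetical: keep a week of nightly dumps

def dump_command(db_name):
    # --single-transaction gives a consistent snapshot of InnoDB tables without
    # locking them for the whole dump (MyISAM tables get no such guarantee).
    # Credentials are assumed to come from ~/.my.cnf, not the command line.
    return ["mysqldump", "--single-transaction", db_name]

def backup_path(backup_dir, db_name, now):
    # Timestamped names sort chronologically, which prune() relies on.
    return Path(backup_dir) / f"{db_name}-{now:%Y%m%d-%H%M}.sql.gz"

def prune(backup_dir, db_name, keep):
    """Delete all but the newest `keep` dumps (keep must be at least 1)."""
    dumps = sorted(Path(backup_dir).glob(f"{db_name}-*.sql.gz"))
    for old in dumps[:-keep]:
        old.unlink()

def run_backup(backup_dir, db_name, keep=KEEP):
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_path(backup_dir, db_name, datetime.now())
    # Stream mysqldump's output straight into a gzip-compressed file.
    with gzip.open(target, "wb") as out:
        proc = subprocess.Popen(dump_command(db_name), stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        target.unlink()  # don't keep a truncated dump around
        raise RuntimeError(f"mysqldump exited with status {returncode}")
    prune(backup_dir, db_name, keep)
    return target

# Only touch the database when explicitly asked to (hypothetical --run switch).
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    print("Wrote", run_backup(BACKUP_DIR, DB_NAME))
```

To get a set "known" time, such as the 4 am EST slot suggested in the July 11 post, a crontab entry along the lines of 0 4 * * * python3 nightly_backup.py --run would do it, assuming the server clock is on Eastern time and the script is saved under that (hypothetical) name.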

Fun fact for the day: Wednesday and Thursday are by far the most popular days for RationalWiki. Saturday is the least. Almost twice as much traffic flows through on a Wednesday compared to Saturday.

Sunday, July 5, 2009

Rinse Repeat, backup alert

Nightly backup is running once again.

Fun fact of the day: Less than 25 percent of RationalWiki visitors used Internet Explorer to view our site in June. Almost 10 percent of our traffic comes from Linux operating systems. Not your average web surfer!

Saturday, July 4, 2009

Guess what? Backup time!

Nightly backup is running once again.

It will make the site inaccessible for probably 10-15 minutes, and then a little slow for another 10-15 minutes. Everything should be back to normal in about half an hour.

Fun fact of the day: The text of our little wiki is now over 10 gigabytes in size. That's a lot of "talk talk talk talk" people!

Friday, July 3, 2009

Backup time again

Nightly backup is running once again.

It will make the site inaccessible for probably 10-15 minutes, and then a little slow for another 10-15 minutes. Everything should be back to normal in about half an hour.

Fun fact of the day: A Google search for "esther hicks scam" or "esther hicks fraud" now returns RationalWiki's Esther Hicks article as a top result. It is bringing in about 50-100 hits a day. Esther has a habit of attacking her attackers; maybe something interesting will come from this?

Thursday, July 2, 2009

Backup time

Backup script is running, said script will make the site inaccessible for 5-10 minutes, assuming nothing goes wrong.

Edit:

Backup done, everything should be back to normal till tomorrow. Night guys!

Wednesday, July 1, 2009

Welcome, and all is well

Just adding filler, everything is working nicely.

Here are a few interesting notes that people may or may not be aware of:

*RationalWiki runs off a cable modem connection with about 1000 kbps upload speed. Our dedicated IP address costs me an extra $30 a month, which is our only real "per month" charge anymore.

*RationalWiki now has two computers dedicated to it. The first is the main server: an Intel Pentium Dual Core 2.2 GHz processor, 2 gigabytes of RAM, and 300 gigabytes of storage. The second is an Intel Pentium 4 2 GHz processor with 1 gigabyte of RAM and 100 gigabytes of storage. The second machine is dedicated to hosting bots that run on RW. At the moment it is running wpbot, stubbot, and capturebot. If anyone has a bot they would like run on our "bot" server, just let me know and I can check it out.

*Thanks to recent fundraising efforts we finally have power backup. That is a great thing, since for most of June the fuses to my room were frequently blown. Luckily, you guys were spared most of it thanks to our battery backup.

*Our on-site backup system is protected from fire and theft by being locked in a small fireproof case bolted to my floor, connected via a USB cable run through a hole drilled in the back of the case.

*External peripherals, such as monitors and keyboards, were all salvaged from equipment being tossed out of my lab. This includes the second "bot" machine that was just recently set up.

And there we go; that should cover the obligatory "content" requirement for the opening of a blog. Bookmark this site and refer back to it during periods when the site is inaccessible. If I am aware of the problem and working on it, I will post here.