I wouldn’t have wanted to be an operations guy at Turbine this past weekend. From about 1PM (US/Eastern) to about 8PM, all of the customer-facing bits at Turbine were inaccessible as far as I could tell.
I noticed the outage when I came back to my desk from a short break and my LOTRO screen informed me that I had been disconnected. I tried to connect again but, as you can guess, it was a futile attempt. Being an ops guy, I got curious about the extent of the outage. I went to all of the Turbine sites that I know of: lotro.com, ddo.com, and the Asheron’s Call site. All were unavailable; every one of them timed out while I was trying to connect. I couldn’t ping any of them either, but a firewall could simply be dropping pings, so that doesn’t say much. After a few minutes, I couldn’t even do a host lookup on the sites (web browsers have to do this to find a site). This meant that even their DNS was down.
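For the curious, the checks above (can I look up the host, and can I connect to it) can be sketched in a few lines of Python. This is only a rough illustration of what I was doing by hand, not anything Turbine runs. The function name `check_site`, the result strings, and the choice of port 80 are my own assumptions; the `resolve` and `connect` parameters exist only so the logic can be exercised without touching the network.

```python
import socket


def check_site(host, port=80, timeout=5.0,
               resolve=socket.getaddrinfo,
               connect=socket.create_connection):
    """Classify a host the way I did by hand during the outage.

    Returns one of: "dns_failure", "connect_timeout",
    "connect_error", or "reachable". These labels are illustrative.
    """
    # Step 1: host lookup. If this fails, DNS itself is unreachable
    # or has no record for the name.
    try:
        resolve(host, port)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: TCP connect. A timeout suggests the server or the
    # network path is down; other errors (e.g. refused) mean the
    # machine answered but nothing is listening.
    try:
        conn = connect((host, port), timeout=timeout)
    except socket.timeout:
        return "connect_timeout"
    except OSError:
        return "connect_error"

    conn.close()
    return "reachable"
```

Calling `check_site("lotro.com")` early in the outage would have come back as a connect timeout, and a few minutes later as a DNS failure. Ping is left out on purpose, since a dropped ping says little either way.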
I went to a nice lunch with my wife for a couple of hours, and the whole of Turbine was still down when I came back. A couple of hours later, signs of life appeared. The maintenance page was up. One of the sites was throwing MySQL errors. At first these were data errors, not connection errors. Later, the site threw connection errors, so I assume they brought down MySQL for repair.
At first, I thought this was a network outage. Maybe their ISP had an issue. Maybe their core router blew up. I’m actually leaning toward some sort of power failure because of the database issues I saw; data errors like those are what I would expect after servers lose power abruptly. Usually, a data center has redundant power, so a failure there would be remarkable.
All of this was very interesting to speculate on when I checked in on the sites, but I’m sure several poor operations guys were under a ton of stress on a Sunday… probably their busiest day of the week… with the highest revenue… mostly lost.