Tobold's Blog
Wednesday, September 20, 2006
WoW uptime

If you go out today and rent space on a server, the provider will most likely give you a guarantee of 99.9% uptime, which allows just under 9 hours of downtime per year. At the very least, a server hosting a non-critical business application is considered reliable at 99.5% uptime. In comparison, World of Warcraft has at least 4 hours of scheduled downtime per *week*, and 8 hours when a patch is being applied. So counting only the scheduled downtime, WoW's reliability is already down to about 97%. Now add the unscheduled downtime to that score, and the reliability drops to somewhere around 95%, which isn't very good by any standard.
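The arithmetic behind these percentages can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and it assumes a 365-day year and a 168-hour week.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year
HOURS_PER_WEEK = 7 * 24    # 168


def yearly_downtime_hours(uptime_percent):
    """Hours of downtime per year allowed by a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def uptime_percent_from_weekly_downtime(hours_down_per_week):
    """Uptime percentage implied by a fixed number of down hours per week."""
    return 100 * (1 - hours_down_per_week / HOURS_PER_WEEK)


print(round(yearly_downtime_hours(99.9), 2))               # 8.76 hours per year
print(round(yearly_downtime_hours(99.5), 1))               # 43.8 hours per year
print(round(uptime_percent_from_weekly_downtime(4), 1))    # 97.6 percent
print(round(uptime_percent_from_weekly_downtime(8), 1))    # 95.2 percent
```

Four hours every week works out to roughly 97.6% uptime, and a week with an 8-hour patch window on its own comes to roughly 95.2%.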

I never even understood what Blizzard needs 4 hours of scheduled maintenance for. When was the last time you performed scheduled maintenance on your PC? What *happens* during scheduled maintenance? Can't be a backup, because the backup has to be done continuously, otherwise unscheduled outages would lead to up to one week of rollback, and that never happens. Are they reinstalling the server software from a slow Bittorrent server every week? Or does maintenance just take 5 minutes, but they only have 1 technician, and he does it one server at a time?

I'm certainly not one to cry for a refund; I don't want 50 cents per month refunded, no thank you. But I sure think that as MMORPGs move more into the mainstream, the level of uptime has to be increased to be in line with other online applications. And World of Warcraft still has a long way to go until it reaches that point.
Apparently you don't know much about huge, complex DB applications....

Every action you take (sending in-game mail, picking up an item, killing a mob, etc.) leaves a trail in the DB. A once-a-week maintenance procedure is required to rebuild indexes, archive less-needed data, etc. W/o it, performance would drop over time to an unacceptable level....

Think of it as the defragmentation procedure you should run once in a while on your PC. It takes time and makes the PC totally unusable while it lasts.
I'm sorry Tobold, but that was a very stupid post. Don't talk about things you don't know anything about. Server maintenance is very hard, especially at WoW's scale. I administer servers myself and know how much work this is.
I know databases are complicated beasts, but obviously other service providers manage to handle it. I am sure the databases of not only banks, but also companies like Amazon or Expedia are hugely more complicated than those of WoW. But if they manage over 99% uptime, why can't WoW?
In contrast to the second poster I do not think that this is "a very stupid post".

This is Tobold's Blog and not the "single source of truth"(tm). If he is wondering why Blizzard needs a 4 hour maintenance period then you can leave a comment telling him why Blizzard does.

I think your second statement, Tobold, is very true. If MMORPGs move into the mainstream they have to worry about uptime. WoW can afford unscheduled downtime because there is no real competition. People are very annoyed but will continue playing because they like the game. If a real competitor to WoW appears that players like just as much, then people will eventually move to the game with less downtime.
The reason is that the WOW software was either designed to need that kind of downtime or sort of evolved into this situation because it was unforeseen by the designers.

Apparently Blizzard polished the front-end, but built it on a relatively messy back-end that needs weekly cleanup. It's all about trade-offs: down-time vs. development time, back-end performance vs. down-time, nice front-end features vs. effort on the back-end to support them...

Regarding amazon or banks: They have a transaction based business, lost transaction = lost revenue. How many players quit paying their monthly fee because of 4h of downtime ?

Actually I don't necessarily assume amazon's database/application is more complex than a MMORPG's. They do not have the real-time requirements that a MMORPG has, the main databases remain fairly static and the front-end being web-based means scaling is a lot easier.
To the anonymous posters - there's no need to be rude. What you just did was shout the equivalent of "WTF? N00b! L2p!".

Does that impress you in WoW? No? Then imagine how unimpressive it looks here. And you don't have the excuse of being 14, by the sounds of it.

On the database complexity - whilst WoW doesn't have as many customers as Amazon or a major financial institution, what it does have is a hell of a lot more transactions. If I'm dealing with my bank, I'll maybe make 20-30 database calls total. If I play WoW for an hour, I'm probably making the system do that many calls a minute.

Worse than that, I'm altering the data itself a lot more. I maybe do that once or twice on Amazon or my bank's website. On WoW, every time I make a trade, bid on something, move an item in my bags, loot, disenchant, get mail, send mail, there's a good chance I'm sending an UPDATE or INSERT request. The density of those requests must be massive.
All the (reasonable) points on this post are correct, but what's been ignored so far is that WoW isn't just a database, it's an application. I think most of the maintenance falls on the application side of things. There are all sorts of intricacies involved in a custom client-server app, especially one that's as open-ended as WoW, that make it a challenge to promise any kind of high uptime. I imagine because they can't promise the full uptime (as Nick noted, diminishing returns) they simply go ahead and force the downtime to happen once a week; that way they only have to concern themselves with keeping the servers happy for another week, which gives them a bit more maintenance comfort.
Amazon has a lot more money riding on their uptime, and they have no "offpeak" hours.

Sure, you CAN do it, but those last few hours of uptime are the most expensive. It requires a whole new class of equipment, redundancy, etc.

Blizz does have offpeak hours, and it's not cost effective to get those last few hours of uptime.

As you say, you are not looking for your money back. It will only become cost effective for Blizz to get those last few hours if people DO start demanding their money back. That's the point where companies start spending money to increase uptime, until people stop demanding their money back.

it's all about the money.
yunk is 100% correct.

the downtime doesn't cost blizzard a cent so why pay to get rid of it?
When you see "99.9% uptime" quoted for services read the fine print. That generally does NOT include scheduled maintenance. Just unplanned downtime. As a result, WoW is probably not far off from that.
Also, consider the differences between 'system uptime', 'link uptime', and 'application uptime'. If all are 99% minimum, you'll suffer a minimum availability of 97%.
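To see where the 97% comes from, here is a minimal sketch. It assumes the three layers fail independently and that the service is only usable when all of them are up, so their availabilities simply multiply; the function name is hypothetical.

```python
from math import prod


def combined_availability(*layers):
    """Availability of a chain of layers that must ALL be up at once.

    Assumes the layers fail independently, so the individual
    availabilities (as fractions between 0 and 1) multiply.
    """
    return prod(layers)


# System, link, and application uptime at 99% each.
combined = combined_availability(0.99, 0.99, 0.99)
print(f"{combined:.4f}")  # 0.9703, i.e. about 97%
```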
Most server/uplink providers offer a 99.9% availability, excluding scheduled maintenance.
I do agree that 4 hours a week seems like a lot. It seems like a lot to me, too ;) But the increase in cost - thus increase in your subscription fee - would (this is me guessing, obviously) not be worth those 4 extra hours.
Blizzard is balancing cost against performance. It's definitely possible to have 99.99% uptime on the game servers - but would those 3.something extra hours a week be worth $20 to you?
Banks also do nightly batch processing, while a proxy handles the transaction flows during that time, updated in the following day's batch.
Another reason for the downtime window is to allow for problems to crop up and be dealt with without having to continually update the players as to when services will be restored - and to have time to back out changes if stability can't be achieved. The 'riskier' the update, the larger the window provided. Players may not like it, but they are reasonably assured that when the provider says the servers will be back up, they will be able to log in.
Given the sheer numbers and scale behind WoW, unanticipated I might add, I'm surprised the downtime isn't more than it is.
Man, nick is so right - it's all about cost structure, not technical feasibility - as is so much in our world of computers. Often, the last 5% of reliability, availability, performance, or scalability takes 95% of the cost.

For example, I happen to know that Blizzard's game world is run on HP BL30p and BL35p servers (the 35's are Opteron based and what are largely being upgraded to.) This is highly reliable hardware - on par with the best you can get in the industry standard server space. But they also have a particular cost - more than consumer grade Dell servers but much cheaper than the Tandem NonStop servers used by most ATM networks. It was clear that Blizzard chose these servers because of a cost-benefit analysis (that I'm sure included the fact that developing for x86 Linux is a lot easier than developing for the NonStop OS.)

But to extend this to Tobold's question: for online gaming, when is the cost justified? There's clearly a decision already made that 4 hours per week of downtime is acceptable for this game at this time. How will that change, and what will cause it to change? Would 8 hours per week be unacceptable? 12? 24? What will cause us to find 4 hours unacceptable?

Frankly, it's probably going to be driven by more of the 'instant on/always available' world we find ourselves in. Today most people are completely unwilling to accept a 4 hours per week outage in telephone service (in fact the US federal government regulates this because of it). And while I doubt many online here would accept a 4 hours per week outage in our internet service, there are plenty of people today who would find that a non-issue (I'm thinking of my parents). Eventually we'll expect our ISPs to keep things running to the same level as the phone company does today (people moving to VOIP telephones will force that), and that will ultimately extend to our online games as well.
Everyone appears to be trying to compare WoW's database and maintenance to a bank's. How about comparing WoW to other MMORPGs?

My memory is fading a bit, but I will try to remember some past games. UO, as I recall, had a daily reset, but it was just a 5 minute server reboot. EQ did not need a daily or even weekly reset, as I recall. It was taken down fairly often for patches, but I think sometimes it would run a month straight. I recall even some of the big patches taking less than 4 hours for a given server. SB for a long time needed daily restarts, and even that was not enough. Still, a restart was usually about 20 minutes, and I don’t think it is needed daily now that they have finally gotten the memory leaks and such cleaned up on the server side. I don’t recall right now the scheduled downtime requirements of AC, AO, EQ2, Eve, etc. I don’t think any required 4+ hours every single week.

So to rephrase Tobold’s question: what happens during WoW scheduled maintenance that the other massively multiplayer online role-playing games don’t need to do every single week for 4 hours?
Good question, but to be fair one should mention that SWG had *daily* scheduled maintenance of 1 hour. Then again SWG has always been a bug infested mess, so that isn't really surprising.
AC had a longish (iirc) downtime once a month to update the story, software, and to add new content. I don't remember how often the servers were scheduled to be shutdown for AO, EQ, and EQ2, but they certainly weren't four hours each week. I'm sure I'd remember that.

I lasted for two weeks on SWG, very disappointing game. I think it went down once when I was playing, and I didn't really care. ;-)
Nifty blog :D

EQ did have downtime each week for patching. I don't recall how many hours each server was down, as I have tried to block the trauma of playing EQ from my mind entirely. I do remember that beyond the normal weekly patches, the server I played on for 2 years was never very stable and crashed a lot, usually resulting in hours of additional downtime nearly weekly.

As far as why WoW has downtime - I too work in IT and yes, it is ghastly expensive to get that last bit of redundancy in place to avoid having to do so much downtime for maintenance. Vivendi probably has the profit margins to do it, but it's a game, not a financial company, or a hospital or some other kind of critical space - a game, so why not save the bucks and take it down one day a week.

That said, I am not sure where you get 4 hours from. Maybe I should transfer to your server, LOL. I play on Alleria and our server is down every Tuesday for at least 6 hours, often as much as 8 or 9.
I still want to know what happens in the 4-8 hours of weekly maintenance. Is there some kind of fsck-like hairball operation that fixes errant data?

The discussion in this thread is reasonable and discusses why they should or should not aim for very high levels of uptime. If the server crashes once in a while, I don't find it too hard to swallow.

I do find it very odd that the system needs to be taken offline for 8 hours a week. I'm just kind of baffled as to what could take so long that needs to be done.
Well, it is annoying at times. The unscheduled downtimes I will take any day over a buggy game like Matrix online or SWG. They are trying the rolling restarts to clear data.

In the end, yes you would probably change your cable or internet provider if they were down 5% of the time. But they have a financial reason to put more money and effort into keeping their systems up. Competition. If your cable internet was down for 4 hours early Tuesday morning and when you looked at alternatives, the only other option was a 14.4 dial up connection, something tells me the only thing you would do is whine about it in a blog and keep using it.

WoW is dominating the MMO market. They have to balance the effect of how many people will switch to a different MMO, vs. how much the cost would be to implement quicker downtimes. My guess is that right now, it isn't worth it to them.

In the end they choose the time that the servers are least populated, and if you are complaining about a 4 hour window in a 168 hour week that you know ahead of time you can't play the game, I think you might be scheduling a little too much WoW in your life.
To the clowns quoting the uptime of Amazon: you are actually mistaken.
Amazon employs a Publisher/Subscriber model in which their website uses read-only databases, whereby only small subsets of information change (quantities/volumes), and those changes are handled as scheduled events.
WoW, in every context, is heavy on both reads and writes in terms of database I/O. As such, a read-only model doesn't work very well. Hence the required downtime.
I would be willing to pay 50% more to be able to play whenever I want to and not worry that the servers are down, etc. I am not sure how profitable Blizzard is but perhaps if they are quite profitable they should be doing more to provide a higher quality of service to their clients. Competition is of course the big equalizer in all of this, if there is enough demand for a more stable/higher available system someone will fill the void eventually.
Questioning their profitability is truly a moot point, considering they claim to have over 11 million subscribers (last I checked). Based on that alone, let's assume that each person pays the $15/mo subscription cost (yes, I know, not likely, as it's cheaper to pay for more months in advance). That comes to $165 million each month at 11 million players, and I'm fairly certain that most of that clears their overhead, as they're not a huge operation (although you might think they are, based on the sheer number of players). Taking that into consideration, I think they can justify the increased cost of keeping the servers up that extra ~5% to pamper their player base. I think this might be part of the downfall of WoW, when a true competitor emerges that has learned from the complaints of WoW players that keeping the servers up is a must in order to be competitive. That said, blue posters on the U.S. forums have stated that they are looking into eliminating the weekly maintenance. That doesn't mean they're going to, just that they're looking into their options. I don't foresee this happening until some competition emerges and keeps its servers up 99.99% of the time, though.
Guys... WoW's database does need maintenance; it is on par for complexity with Second Life, which uses the same type of database as OpenSim. I run an OpenSim region on one of the grids, and know that just having someone move from one region to another takes nearly 20 database actions. Even if WoW uses half that to move around the world in game, that is still a lot of traffic to the database. Now, Second Life does NOT take a day every week and several unexpected restarts in between to keep things running, so why can't WoW keep the servers running? I see a need for a bit harder work.

As for the $0.50 (USD) that someone said the day lost to downtime costs the user, let's do the math: that works out to $5,720,000 of pure profit for them, since they provide no service to the user during that time, if you count that WoW has 11.5 million users at last count and figure that half of the patch Tuesdays I have seen in my playing turned into all-day events.
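A rough sanity check of this kind of figure can be sketched as follows. The $15 monthly fee comes from an earlier comment, and the 30-day month and even spread of the fee over every hour are assumptions made here for illustration, as are all the names in the snippet.

```python
SUBSCRIBERS = 11_500_000   # subscriber count quoted in the comment
MONTHLY_FEE_USD = 15.00    # assumption: flat $15/month for every subscriber
DAYS_PER_MONTH = 30        # assumption: 30-day month


def value_of_lost_time(hours_lost, subscribers=SUBSCRIBERS):
    """Subscription money (USD) covering hours in which no service was provided.

    Assumes the monthly fee is spread evenly over every hour of the month.
    """
    fee_per_hour = MONTHLY_FEE_USD / (DAYS_PER_MONTH * 24)
    return fee_per_hour * hours_lost * subscribers


print(f"{value_of_lost_time(24):,.0f}")  # one fully lost day: 5,750,000
print(f"{value_of_lost_time(4):,.0f}")   # a 4-hour window: 958,333
```

Under these assumptions one fully lost day comes to about $5.75 million across all subscribers, in the same ballpark as the figure quoted above, while a regular 4-hour window is a little under $1 million.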
I like the downtime once a week, and I think it gives the people who are "Addicted" to the game a day of rest for their minds. Even so, they prob. go online, research boss fights, find new achievements to obtain, etc. I think WoW should be down ALL day on maintenance day every week. But that's just my .02