Tobold's Blog
Wednesday, September 20, 2006
WoW uptime

If you go out today and rent space on a server, the provider will most likely give you a guarantee of 99.9% uptime, which allows just under 9 hours of downtime per year. Even a server hosting a non-critical business application is only considered reliable at 99.5% uptime or better. In comparison, World of Warcraft has at least 4 hours of scheduled downtime per *week*, 8 hours when a patch is being applied. So counting only the scheduled downtime, WoW's reliability is already down to about 97%. Now add the unscheduled downtime to that score, and the reliability drops to somewhere around 95%, which isn't very good by any standard.
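The arithmetic behind those percentages can be checked with a short sketch. The function names are made up for illustration, and it assumes a 365-day year and a 168-hour week.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year
HOURS_PER_WEEK = 7 * 24    # 168


def allowed_downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year that a given uptime guarantee permits."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def uptime_percent_from_weekly_downtime(hours_down_per_week: float) -> float:
    """Uptime percentage implied by a fixed number of hours down each week."""
    return 100 * (1 - hours_down_per_week / HOURS_PER_WEEK)


# Hosting guarantees mentioned in the post
for guarantee in (99.9, 99.5):
    hours = allowed_downtime_hours_per_year(guarantee)
    print(f"{guarantee}% uptime allows {hours:.2f} hours of downtime per year")

# Weekly maintenance windows mentioned in the post (normal week, patch week)
for window in (4, 8):
    uptime = uptime_percent_from_weekly_downtime(window)
    print(f"{window} hours down per week is {uptime:.1f}% uptime")
```

On these assumptions a 4-hour weekly window alone costs a bit over two percentage points, and an 8-hour patch week lands close to the 95% figure.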

I never even understood what Blizzard needs 4 hours of scheduled maintenance for. When was the last time you performed scheduled maintenance on your PC? What *happens* during scheduled maintenance? It can't be a backup, because the backup has to be done continuously, otherwise unscheduled outages would lead to up to one week of rollback, and that never happens. Are they reinstalling the server software from a slow BitTorrent server every week? Or does maintenance just take 5 minutes, but they only have 1 technician, and he does it one server at a time?

I'm certainly not one to cry for a refund. I don't want 50 cents per month refunded, no thank you. But I do think that as MMORPGs move more into the mainstream, the level of uptime has to increase to be in line with other online applications. And World of Warcraft still has a long way to go until it reaches that point.
Apparently you don't know much about huge, complex DB applications....

Every action you take (sending in-game mail, picking up an item, killing a mob etc.) leaves a trail in the DB. A once-a-week maintenance procedure is required to rebuild indexes, archive less-needed data etc. W/o it, performance would drop over time to an unacceptable level....

Think of it as the defragmentation procedure you should run once in a while on your PC. It takes time and makes the PC totally unusable while it lasts.
I'm sorry Tobold, but that was a very stupid post. Don't talk about things you don't know anything about. Server maintenance is very hard, especially at WoW's scale - I administer servers myself and know how much work this is.
I think you're both wrong, or at least looking at this from the wrong angle. Firstly, to the above poster, it's more than possible to maintain high levels of uptime with DB-dependent applications. I work in I.T. for a massive financial organisation (around 5th biggest in the world at the last count). Now financial organisations have IT challenges that dwarf WoW's, and have stronger incentives for maintaining extremely high levels of uptime - key systems in my sphere must have at LEAST 99.9% availability, some aim for the four 9's - 99.99% or greater.

These systems are more complex than WoW by orders of magnitude, involving many mainframe plexes, countless DB instances of every flavour, glued together by a sticky mess of WebSphere-style middleware and software and hardware of every imaginable type. The customer base is in the region of hundreds of millions (Worldpay/Bibit), with billions of dollars processed daily.

Now the reason I quote these numbers is not to impress or denigrate the infrastructure/support staff Blizzard have put in place - far from it. The key point is risk/return. The difference between 95% availability and 99% is actually very large - it's perhaps twice as expensive to incorporate the necessary levels of redundancy for just those extra few percentage points.

Now, in the case of my employer, paying out for those extra few points of availability is justified because:
- impact of an outage is so huge (reputational damage and money literally 'lost').
- The cost is a drop in the ocean for profits of tens of billions a year.

But in the case of Blizzard, 4 hours of downtime a week at an offpeak time (in contrast, the bank has no offpeak as it's transcontinental) is absolutely manageable, and frankly just not worth spending the money to improve. It's a cost effectiveness judgement, and knowing the costs intimately and having a fair idea of what their infrastructure might be, I think they're probably right.
Just to clarify, anonymous poster 2 got his post in as I was typing mine - the "both" I referred to was Tobold and the OP.

Use names you people! :)
I know databases are complicated beasts, but obviously other service providers manage to handle it. I am sure the databases of not only banks, but also companies like Amazon or Expedia are hugely more complicated than those of WoW. But if they manage over 99% uptime, why can't WoW?
In contrast to the second poster I do not think that this is "a very stupid post".

This is Tobold's Blog and not the "single source of truth"(tm). If he is wondering why Blizzard needs a 4 hour maintenance period then you can leave a comment telling him why Blizzard does.

I think your second statement, Tobold, is very true. If MMORPGs move into the mainstream they have to worry about uptime. WoW can afford unscheduled downtime because there is no real competition. People are very annoyed but will continue playing because they like the game. If a real competitor to WoW ever appears which players like as much, then people will eventually move to the game with less downtime.
It's definitely not a question of technical competency or 'impossibility' - it's absolutely possible to go for very high levels of uptime with such an infrastructure.

The key point is that the last few percentage points of availability are MASSIVELY expensive to obtain, the reason being quite simple - most IT hardware and enterprise software is quite reliable as standard - even if WoW has no redundancy built into its infrastructure it would still be up 80-90% of the time. But in order to get 100% availability (or as close as is feasible), you've got to start doubling, tripling, or quadrupling up on EVERYTHING - databases will have to be on disk arrays that would be replicated real-time, multiple network interfaces, server clusters. All this must be hot-swappable. You get the picture...
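A textbook way to see why doubling and tripling up buys extra availability is to treat the replicas as failing independently. The sketch below makes exactly that assumption, which real systems rarely satisfy since failures tend to be correlated. The function name is hypothetical, and the 90% starting point is just the upper end of the rough estimate above.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas when one survivor is enough.

    `single` is the availability of one replica as a fraction (0.90 for 90%).
    Independence between replicas is an assumption, not a given.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1 - (1 - single) ** copies


for copies in range(1, 5):
    availability = redundant_availability(0.90, copies)
    print(f"{copies} cop{'y' if copies == 1 else 'ies'}: {availability:.4%}")
```

Under this simplified model each additional replica adds roughly one more "nine", while the hardware bill grows with every copy, which is the cost curve being described.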

Blizzard have made a judgement as to the cost of achieving higher availability, versus customer discontent at outage. To them (and I expect to anyone who looks at the actual numbers) it's just not worth shelling out for 99% uptime when the customer impact on the majority of the playerbase is minimal - and it IS minimal for most subscribers.
The reason is that the WOW software was either designed to need that kind of downtime or sort of evolved into this situation because it was unforeseen by the designers.

Apparently Blizzard polished the front-end but built it on a relatively messy back-end that needs weekly cleanup. It's all about trade-offs: down-time vs. development time, back-end performance vs. down-time, nice front-end features vs. effort on the back-end to support them...

Regarding Amazon or banks: They have a transaction-based business, lost transaction = lost revenue. How many players quit paying their monthly fee because of 4h of downtime?

Actually I don't necessarily assume Amazon's database/application is more complex than an MMORPG's. They do not have the real-time requirements that an MMORPG has, the main databases remain fairly static, and the front-end being web-based means scaling is a lot easier.
To the anonymous posters - there's no need to be rude. What you just did was shout the equivalent of "WTF? N00b! L2p!".

Does that impress you in WoW? No? Then imagine how unimpressive it looks here. And you don't have the excuse of being 14, by the sounds of it.

On the database complexity - whilst WoW doesn't have as many customers as Amazon or a major financial institution, what it does have is a hell of a lot more transactions. If I'm dealing with my bank, I'll maybe make 20-30 database calls total. If I play WoW for an hour, I'm probably making the system do that many calls a minute.

Worse than that, I'm altering the stored data a lot more. I maybe do that once or twice on Amazon or my bank's website. On WoW, every time I make a trade, bid on something, move an item in my bags, loot, disenchant, get mail, send mail, there's a good chance I'm sending an UPDATE or INSERT request. The density of those requests must be massive.
Excuse my continued posts, I'm at work and it's a slow day :)

I agree with the above post that you're probably making far more database transactions per user than in equivalent e-commerce applications. But the fact is, WoW does not compete with the sheer volume of users of certain financial applications, which run into hundreds of millions of users a day (e.g. Worldpay).

In any case, I'm saying (for the 3rd time) that the impediments to high availability for WoW are financial, NOT technical. People certainly do casually underestimate the technical challenges, but the fact remains that Blizzard could provide superior availability if they felt it was worth the cost. But they don't - and knowing the likely costs I suspect they're right.
All the (reasonable) points on this post are correct, but what's been ignored so far is that WoW isn't just a database, it's an application. I think most of the maintenance falls on the application side of things. There are all sorts of intricacies involved in a custom client-server app, especially one that's as open-ended as WoW, that make it a challenge to promise any kind of high uptime. I imagine because they can't promise the full uptime (as Nick noted, diminishing returns) they simply go ahead and force the downtime to happen once a week; that way they only have to concern themselves with keeping the servers happy for another week, which gives them a bit more maintenance comfort.
Amazon has a lot more money riding on their uptime, and they have no "offpeak" hours.

Sure, you CAN do it, but those last few hours of uptime are the most expensive. It requires a whole new class of equipment, redundancy, etc.

Blizz does have offpeak hours, and it's not cost effective to get those last few hours of uptime.

As you say, you are not looking for your money back. It will only become cost effective for Blizz to get those last few hours if people DO start demanding their money back. That's the point where companies start spending money to increase uptime, until people stop demanding their money back.

It's all about the money.
yunk is 100% correct.

the downtime doesn't cost blizzard a cent so why pay to get rid of it?
The other thing we're not really considering is the inverse relationship between change and stability - that is, the more changes you introduce to a system, the less stable it becomes.

Extremely high availability (99.9%+) is possible only when you have a strict change management regime in place. This is fundamentally unsuited to an ever-developing application like WoW, and is another reason why a built-in maintenance window is actually vital.
When you see "99.9% uptime" quoted for services, read the fine print. That generally does NOT include scheduled maintenance, just unplanned downtime. Measured that way, WoW is probably not far off from that.
Also, consider the differences between 'system uptime', 'link uptime', and 'application uptime'. If all are 99% minimum, you'll suffer a minimum availability of 97%.
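The 97% figure comes from multiplying the three availabilities together. A minimal sketch, assuming the layers fail independently and that the service needs all of them at once (the function name is illustrative):

```python
import math


def serial_availability(*layers: float) -> float:
    """Combined availability when every layer must be up at the same time.

    Each argument is a fraction (0.99 for 99%). Independent failures are
    assumed, so the combined figure is simply the product.
    """
    return math.prod(layers)


# System, link and application at 99% each, as in the comment above
combined = serial_availability(0.99, 0.99, 0.99)
print(f"combined availability: {combined:.2%}")
```

Every extra layer in the chain drags the total down a little further, even when each layer looks fine on its own.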
Most server/uplink providers offer a 99.9% availability, excluding scheduled maintenance.
I do agree that 4 hours a week seems like a lot. It seems like a lot to me, too ;) But the increase in cost - thus increase in your subscription fee - would (this is me guessing, obviously) not be worth those 4 extra hours.
Blizzard is balancing cost against performance. It's definitely possible to have 99.99% uptime on the game servers - but would those 3.something extra hours a week be worth $20 to you?
Banks also do nightly batch processing, while a proxy handles the transaction flows during that time, updated in the following day's batch.
Another reason for the downtime window is to allow for problems to crop up and be dealt with without having to continually update the players as to when services will be restored - and to have time to back out changes if stability can't be achieved. The 'riskier' the update, the larger the window provided. Players may not like it, but they are reasonably assured that when the provider says the servers will be back up they will be able to log in.
Given the sheer numbers and scale behind WoW, unanticipated I might add, I'm surprised the downtime isn't more than it is.
Man, Nick is so right - it's all about cost structure, not technical feasibility - as is so much in our world of computers. Often, the last 5% of reliability, availability, performance, or scalability takes 95% of the cost.

For example, I happen to know that Blizzard's game world is run on HP BL30p and BL35p servers (the 35s are Opteron-based and are what they are largely upgrading to). This is highly reliable hardware - on par with the best you can get in the industry standard server space. But these servers also come at a particular cost - more than consumer grade Dell servers but much cheaper than the Tandem NonStop servers used by most ATM networks. It was clear that Blizzard chose these servers because of a cost-benefit analysis (that I'm sure included the fact that developing for x86 Linux is a lot easier than developing for the NonStop OS).

But to extend this to Tobold's question - for online gaming, when is the cost justified? There's clearly a decision already made that 4 hours per week of downtime is acceptable for this game at this time. How will that change, and what will cause it to change? Would 8 hours per week be unacceptable? 12? 24? What will cause us to find 4 hours unacceptable?

Frankly it's probably going to be driven by more of the 'instant on/always available' world we find ourselves in. Today most people are completely unwilling to accept a 4 hour per week outage in telephone service (in fact the US federal government regulates this because of it). And while I doubt many of us online here would accept a 4 hour per week outage in our internet service, there are plenty of people today that would find that a non-issue (I'm thinking of my parents). Eventually we'll expect our ISPs to keep things running at the same level as the phone company does today (people moving to VOIP telephones will force that), and that will ultimately extend to our online games as well.
I agree with Nick. Those last few points of reliability percentage are very, very expensive and hard to justify in a business environment where it really doesn't matter. How many people quit WoW because it's "only" up 95% of the time?

WoW is also cheap compared to other solutions. It costs me thousands each month to maintain stacks of servers in a data center where my uptime guarantee is five 9's, but only a few hundred for colo space without a good SLA.
Nick, WoW is a transcontinental application, it's just that the players from outside America playing on American servers are not a large percentage of the population.

"Offpeak" maintenance, to them, is 1/7th of a casual player's opportunity to play.

Blizzard seem to have problems managing simple database applications such as the official WoW forums, which are regularly down for unscheduled maintenance. They've currently been unavailable for over 24 hours.

While the frontend client and game are high quality products, you can't help but pick up on a "lump it or quit" attitude towards various problems customers experience when trying to play the game.

This is only possible while there is no alternative, and it's allowed Blizzard to virtually mint money.

If alternatives to WoW appear, will customers stay loyal next time there is a 36 hour scheduled outage, or a casual gamer reaches the level cap, and looks doubtfully at "raid or die" end-game?
"Nick, WoW is a transcontinental application, it's just that the players from outside America playing on American servers are not a large percentage of the population."

Yes, you are right of course, but the general point stands. Unless you're an Australian forced (until recently) to play on US servers.

I do believe that the website is tied into the account management and character management systems, so it's not simply a case of them being unable to manage a website. Although I do agree, the US forums are in a right state, you can barely get on them these days...

That said, I do think there's a tendency for people like us (by that I mean people who not only play WoW, but like to write about and discuss it as well) to overplay the effect such downtime has on the playerbase. I think you'll find most people don't care too much - the player demographic is decidedly skewed towards the casual. As such, I don't think WoW's stability issues have even come close to the point where it really becomes much of an issue - as you imply, stability is considered by players, but only after fun and cost.
Everyone appears to be trying to compare WoW's database and maintenance to a bank's. How about comparing WoW to other MMORPGs?

My memory is fading a bit but I will try to remember some past games. UO, as I recall, had a daily reset but it was just a 5 minute server reboot. EQ did not need a daily or even weekly reset as I recall. It was taken down fairly often for patches but I think sometimes it would run a month straight. I recall even some of the big patches taking less than 4 hours for a given server. SB for a long time needed daily restarts and even that was not enough. Even then it was usually only about 20 minutes, and I don’t think it is needed daily now that they have finally gotten memory leaks and such cleaned up on the server side. I don’t recall right now the scheduled downtime requirements of AC, AO, EQ2, Eve etc. I don’t think any required 4+ hours every single week.

So, to rephrase Tobold’s question: what happens during WoW scheduled maintenance that the other massive multiplayer online role playing games don’t need to do every single week for 4 hours?
Good question, but to be fair one should mention that SWG had *daily* scheduled maintenance of 1 hour. Then again SWG has always been a bug infested mess, so that isn't really surprising.
AC had a longish (iirc) downtime once a month to update the story, software, and to add new content. I don't remember how often the servers were scheduled to be shutdown for AO, EQ, and EQ2, but they certainly weren't four hours each week. I'm sure I'd remember that.

I lasted for two weeks on SWG, very disappointing game. I think it went down once when I was playing, and I didn't really care. ;-)
Nifty blog :D

EQ did have downtime each week for patching. I don't recall how many hours each server was down, as I have tried to block the trauma of playing EQ from my mind entirely. I do remember that beyond the normal weekly patches, the server I played on for 2 years was never very stable and crashed a lot, usually resulting in hours of additional downtime nearly weekly.

As far as why WoW has downtime - I too work in IT and yes, it is ghastly expensive to get that last bit of redundancy in place to avoid having to do so much downtime for maintenance. Vivendi probably has the profit margins to do it, but it's a game, not a financial company, or a hospital or some other kind of critical space - a game, so why not save the bucks and take it down one day a week.

That said, I am not sure where you get 4 hours from - maybe I should transfer to your server, LOL. I play on Alleria and our server is down every Tuesday for at least 6 hours, often as much as 8 or 9.
I still want to know what happens in the 4-8 hours of weekly maintenance. Is there some kind of fsck-like hairball operation that fixes errant data?

The discussion in this thread is reasonable and discusses why they should or should not aim for very high levels of uptime. If the server crashes once in a while, I don't find it too hard to swallow.

I do find it very odd that the system needs to be taken offline for 8 hours a week. I'm just kind of baffled as to what could take so long that needs to be done.
Well, it is annoying at times. The unscheduled downtimes I will take any day over a buggy game like Matrix online or SWG. They are trying the rolling restarts to clear data.

In the end, yes you would probably change your cable or internet provider if they were down 5% of the time. But they have a financial reason to put more money and effort into keeping their systems up. Competition. If your cable internet was down for 4 hours early Tuesday morning and when you looked at alternatives, the only other option was a 14.4 dial up connection, something tells me the only thing you would do is whine about it in a blog and keep using it.

WoW is dominating the MMO market. They have to balance the effect of how many people will switch to a different MMO, vs. how much the cost would be to implement quicker downtimes. My guess is that right now, it isn't worth it to them.

In the end they choose the time that the servers are least populated, and if you are complaining about a 4 hour window in a 168 hour week that you know ahead of time you can't play the game, I think you might be scheduling a little too much WoW in your life.
To the clowns quoting the uptime of Amazon: you are actually mistaken.
Amazon employs a Publisher/Subscriber model in which the website uses read-only databases, and only small subsets of information change (quantities/volumes), which are handled as scheduled events.
WoW, in every context, is heavy on both reads and writes in terms of database I/O. As such a read-only model doesn't work very well. Hence the required downtime.
I would be willing to pay 50% more to be able to play whenever I want to and not worry that the servers are down, etc. I am not sure how profitable Blizzard is, but if they are quite profitable perhaps they should be doing more to provide a higher quality of service to their clients. Competition is of course the big equalizer in all of this: if there is enough demand for a more stable, more highly available system, someone will fill the void eventually.
Questioning their profitability is truly a moot point. Consider that they claim to have over (last I checked) 11 million subscribers. Based on that fact alone, let's assume that each person pays the $15/mo subscription cost (yes, I know, not likely as it's cheaper to pay for more months in advance). That comes out to $165 million each month at 11 million players, and I'm fairly certain that most of that clears their overhead, as they're not a huge operation (although you might think they are based on the sheer number of players). Taking that into consideration, I think they can justify the increased cost of keeping the servers up that extra ~5% to pamper their player-base. I think this might be a part of the downfall of WoW, when a true competitor emerges that has learned from the complaints of the WoW players that keeping the servers up is a must in order to be competitive. Though, it has been stated by blue posters on the U.S. forums that they are looking into eliminating the weekly maintenance. That doesn't mean they're going to, just that they're looking into their options. I don't foresee this happening until some competition emerges and keeps their servers up 99.99% of the time, though.
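For what it's worth, the back-of-the-envelope numbers are easy to reproduce. This sketch assumes every subscriber pays the full $15 monthly fee, which the comment itself admits is unlikely, and the function names are made up for illustration.

```python
def monthly_revenue_usd(subscribers: int, fee_usd: float) -> float:
    """Upper-bound monthly revenue if every subscriber pays the full fee."""
    return subscribers * fee_usd


def prorated_downtime_value_usd(fee_usd: float, downtime_fraction: float) -> float:
    """Share of one subscriber's monthly fee that corresponds to downtime."""
    return fee_usd * downtime_fraction


revenue = monthly_revenue_usd(11_000_000, 15.0)
print(f"monthly revenue: ${revenue:,.0f}")

# Roughly 5% downtime, the figure discussed in this thread
per_player = prorated_downtime_value_usd(15.0, 0.05)
print(f"downtime share of one monthly fee: ${per_player:.2f}")
```

At about 5% downtime the prorated share of a $15 fee is well under a dollar per player per month, the same order of magnitude as the 50 cents mentioned in the original post.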
Guys.. WoW's database does need maintenance. It is on par for complexity with Second Life, which uses the same type of database as OpenSim. I run an OpenSim region on one of the grids, and know that just to have someone move from one region to another takes nearly 20 database actions. Even if WoW uses half that to move around the world in game, that is still a lot of traffic to the database... Now... Second Life does NOT take a day every week and several unexpected restarts in between to keep things running, so why can't WoW keep the servers running? I see a need for a bit harder work.

As for the $0.50 (USD) that someone said the downtime costs each user, let's do the math... that money is pure profit for them, since they provide no service to the user during that time. It works out to $5,720,000 if you count that WoW has 11.5 million users at last count and figure that half of the patch Tuesdays I have seen in my playing turn into all-day events.
I like the downtime once a week, and I think it gives the people who are "addicted" to the game a day of rest for their minds. Even then they probably go online, research boss fights, find new achievements to obtain, etc. I think WoW should be down ALL day on maintenance day every week. But that's just my .02