Sunday, January 19, 2014

The Cold Equations of Soul-Crushing Lag

Yesterday, there was a bit of a commotion in HED-GP. Thousands of pilots attempted to engage in the largest battle in EVE history, and despite the best efforts of CCP, they were devoured by the monster that is Soul-Crushing Lag.

The dreaded Lag Monster has returned to New Eden more and more often in recent months, and it will continue to do so with increased frequency in the months and years to come. This post is my attempt to explain why I believe that even heroic technical achievements like Time Dilation will only put Lag into remission; to cure it requires some fundamental changes to the core game design of EVE.

As befits players of a game that has been described as Spreadsheets in Space, let us begin with some mathematics, to explain why lag can't be conquered by technical means. For most of you, especially those who are computer programmers, this will be basic stuff, but bear with me. All the numbers given below were chosen simply to illustrate the problem.

The basic problem is this: "doubling the number of people in a fight more than doubles the amount of work the server needs to do."

To give a horribly simplified example (forgive me, anyone who's taken a 200-level CS course), consider a fight with 100 people; for each person in the fight, the server needs to update their position, execute their commands, and so on. That part of the computational load is roughly linear, so adding another 100 ships would roughly double the amount of work. But then the server has to tell everyone how the universe has changed, and everyone can potentially have a different view of the universe (consider: cloaked vs. uncloaked; information about the ships you have locked; watchlists; etc.). So instead of 100 ships each getting info about 100 ships, you now have 200 ships each getting info about 200 ships; that's 4 times as much information, because it grows with the square of the number of ships. If you have 500 ships, that's 25 times as much; 1600 ships require 256 times as much. All that data has to be organized and sent to the clients. Things get very bad, very fast; then they get even worse, even faster.
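The arithmetic in this example can be checked with a toy model. This is a minimal sketch, assuming (as the paragraph does) that the dominant cost is every ship receiving updates about every ship, so the broadcast work is proportional to N×N; the linear per-ship part is deliberately left out, and `BASELINE_SHIPS`, `broadcast_work` and `relative_work` are hypothetical names for illustration, not anything from CCP's server code.

```python
# Toy model of the "everyone gets info about everyone" cost described above.
# Hypothetical: ignores the linear per-ship work and any real server
# optimizations; the numbers are purely illustrative.

BASELINE_SHIPS = 100  # the 100-ship fight used as the reference point


def broadcast_work(ships: int) -> int:
    """Number of per-ship updates sent: each ship gets info about every ship."""
    return ships * ships


def relative_work(ships: int) -> float:
    """Broadcast work relative to the 100-ship baseline fight."""
    return broadcast_work(ships) / broadcast_work(BASELINE_SHIPS)


if __name__ == "__main__":
    for fleet in (100, 200, 500, 1600):
        print(f"{fleet:>5} ships -> {relative_work(fleet):g}x the work")
```

Run as a script, it prints 1x, 4x, 25x and 256x for fleets of 100, 200, 500 and 1600 ships, matching the figures in the paragraph.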

Now in the real world of extremely clever programmers, of which CCP has more than their fair share, there are a lot of things you can do to make things (to use the technical term) "less horrible". But what you can't do is get linear scaling; doubling the number of people in a fight will always more than double the amount of work the server needs to do.

So what can be done to address this problem? Well, there are several basic approaches, and CCP can use any or all of them; they are not either-or choices:
  • Use faster computers (aka "If brute force isn't working, you aren't using enough"). Unfortunately, because of design decisions made back in the early days of EVE development (decisions that I think were made for very good reasons, btw), computers are not getting faster in a way that benefits EVE as much as everyone would like. Time Dilation is actually an example of this approach -- when TiDi hits 10%, the server has 10 times as much time to process each tick of the game clock, so it's like it's running 10 times faster.
  • Make the code more efficient, so that it scales better. In our simple example above, 2x the number of ships meant 4x the work; if that gets reduced to 3x the work, then handling 1600 ships (four doublings of 100) is only 3^4 = 81 times as hard instead of 4^4 = 256 times, and that turns out to roughly double the number of people that the server can handle before the lag monster appears. A lot of work -- extremely difficult work -- is being done in this area, and IMHO CCP needs to put even more resources into it (and not just because of lag, either).
  • Change the game so that fight sizes are naturally limited to sizes the servers can handle. By "naturally limited" I mean make changes such that effective fleet commanders will have sound tactical and strategic reasons to limit their fleet sizes and/or divide into sub-fleets with different objectives.
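The efficiency numbers above follow from treating each doubling of the fleet as multiplying the work by a fixed factor (4x in the original example, 3x after optimization). A minimal sketch under that assumption: with a per-doubling factor f, work grows like (N/100)^log2(f), and inverting that gives the largest fleet a fixed server budget can handle. The function names and the 256x budget (the load of a 1600-ship fight in the 4x model) are illustrative assumptions, not measured figures.

```python
import math

BASELINE_SHIPS = 100  # reference fight size from the example (assumption)


def relative_work(ships: float, factor_per_doubling: float) -> float:
    """Work relative to the baseline when each doubling multiplies work by the factor.

    A factor of 4 means work ~ N^2; a factor of 3 means work ~ N^log2(3), about N^1.58.
    """
    exponent = math.log2(factor_per_doubling)
    return (ships / BASELINE_SHIPS) ** exponent


def max_ships(budget: float, factor_per_doubling: float) -> float:
    """Largest fleet whose relative work still fits within the given budget."""
    exponent = math.log2(factor_per_doubling)
    return BASELINE_SHIPS * budget ** (1 / exponent)


if __name__ == "__main__":
    # Hypothetical budget: what the server handles for 1600 ships under the 4x model.
    budget = relative_work(1600, 4)  # 256
    print(f"1600 ships, 4x per doubling: {relative_work(1600, 4):.0f}x work")
    print(f"1600 ships, 3x per doubling: {relative_work(1600, 3):.0f}x work")
    print(f"Ships that fit in the same budget at 3x: {max_ships(budget, 3):.0f}")
```

Under these assumptions it reports 256x and 81x for 1600 ships, and a budget-limited fleet of roughly 3,300 ships at 3x per doubling, which is the "roughly double" in the bullet above.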
Let's examine these possibilities in more detail.

Time Dilation was introduced about 3 years ago, and has been in-game for just over 2 years. As explained above, it basically speeds up a server by a factor of 10. Yet in only 2 years, lag is back. Why? Well, to quote the original TiDi devblog, "Here's how I envision this working for a large engagement (say, 1600 or so)". Yesterday, there were over 3400 ships in HED-GP. Worse still, the meta has changed, and the types of ships now in use likely increased the load even further.
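The toy example from earlier in the post also shows why a 10x speed-up is consumed so quickly. This sketch assumes purely quadratic cost (ignoring the linear part and the heavier ship types mentioned above): a 10x increase in available work buys only about sqrt(10), roughly 3.2x, more ships, while a 3400-ship fight costs about (3400/1600)^2 times the devblog's 1600-ship scenario. `fleet_gain` and `load_ratio` are hypothetical helper names.

```python
import math


def fleet_gain(capacity_multiplier: float) -> float:
    """Extra ships a capacity boost buys when work grows with the square of fleet size."""
    return math.sqrt(capacity_multiplier)


def load_ratio(new_ships: int, old_ships: int) -> float:
    """How much more work the larger fight needs under the quadratic toy model."""
    return (new_ships / old_ships) ** 2


if __name__ == "__main__":
    print(f"10x TiDi headroom -> {fleet_gain(10):.2f}x as many ships")
    print(f"3400 vs 1600 ships -> {load_ratio(3400, 1600):.2f}x the load")
```

It prints about 3.16x and 4.52x: in this quadratic sketch, ten-fold headroom translates into only about three times as many ships, and a 3400-ship fight needs roughly four and a half times the work of the 1600-ship scenario quoted from the devblog.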

TiDi sped things up by 10x, and EVE players chewed through that in 2 years. While I expect that making the code more efficient will yield great benefits, particularly in chopping the peaks off the lag spikes caused by very "expensive" events (such as bridging), I don't think it's going to bring a further 10x improvement. If Team Gridlock proves me wrong, then I will be very impressed and will nominate them for the Galactic Institute's Prize for Extreme Cleverness, but even so, that only buys another 2 years or so before EVE players start whining about lag again.

Four years ago, when I first ran for CSM, one of the planks of my manifesto concerned Lag. Here is what I wrote at the time:
While there are clearly many cute hacks that can (and will) reduce lag, the blunt fact of the matter is that such fixes are at best temporary fixes, because as soon as you defeat the lag-monster for N-player battles, the current design of the game encourages bringing extra people to the fight -- which means you have N+500-player battles, and lag returns. In other words, "Fleets expand to fill the lag available". As the EVE population grows, the problem will only get worse.
And it's even worse than what I said back then, because since the introduction of TiDi, the size of "large engagements" has increased much faster than the number of EVE subscribers.

Anyway, the tl;dr is that technical fixes, while wonderful and needed, just address the symptoms; they don't tackle the disease. The disease itself is simple: in EVE, as in Soviet Russia, quantity has a quality all its own.

So what game design changes could be made to address this? Figuring that out is why CCP devs get paid the big bucks. But until they do, lag will never go away.


11 comments:

  1. So many people out there think 'fixing lag' is an easy thing, and that the people at CCP haven't already taken all the easy steps.

  2. Fundamentally there needs to be an in-game reason not to bring more than a few hundred ships. Cost of transport could be a major factor. For a ship that costs billions to manufacture, I don't think costing a couple tens of millions to move it across the galaxy is a bad thing.

    I think adding AoE attacks to more things will increase lag in the short term, but in the long term it would help encourage bringing fewer ships to an engagement. Part of the problem is that, with the current state of logistics, overcoming repping requires high combined alpha. That means you need to bring a lot of ships to blap things off the field. Making it easier to blap logistics-supported targets off the field means people won't feel as much need to bring so many. It will have additional side effects, but there need to be some factors bringing the total number of ships down.

    1. Titans were balanced when they first came out with the idea that economics would limit their use/numbers. If supercap proliferation has shown us anything, it's that using ISK as a limiting/balancing factor is horrible game design.

      In terms of AOE, I kind of agree. Not sure we need new kinds of AOE weapons though, but what about adding more ships to the game that can launch bombs? Not ones that could warp cloaked, mind you; just regular ships, a new BS type perhaps... Just give them a long module reactivation delay so it's not totally OP.

      Perhaps add a couple bomb launcher slots to titans and see what happens. Then retask defender missiles to target bombs, and see what kind of meta develops.

      Introducing a remote rep stacking penalty might also work. Buffing remote cap transfer in some way to encourage local tanking might offset some of the rage that logistics fans would inevitably spew. I think it's needed though, because every MMO ever made has had the same problem of Main Tank + Healer = WIN. The more heals you can pile onto one target, the more ridiculous the DPS/alpha has to be to realistically take it out.

  3. If the technical fixes and software patches have gone as far as they can go, the next step needs to be hardware. I admit I'm not totally familiar with CCP's hardware setup for EVE, but I have to imagine that they really need to be looking at a scalable server environment for fights like this - servers whose resources come online as the number of pilots in a system increases, and those resources can be reallocated dynamically depending on where things are happening.

    The reality, and a parallel to what you had to say in your article, is that lag will always increase as the number of players in the game increases. I doubt CCP wants to see the number of players decrease, so they need to start getting ahead of the curve.

    I'm also curious to hear any suggestions you might have... that would be an interesting read.

  4. Competing Objectives

    Corelin used this phrase to describe a way to fix the in-game meta instead of the out-of-game code/machinery, and I like it. The basic idea is to create a new normal for sovereignty battles where, to put it simply, fights and other ship-based activities between opposing forces need to happen in multiple locations at the same time. Corelin gave the example of having the system's TCU, IHUB, and Station all run on the same timers, rather than in sequence, so they all have to be fought over at the same time.

    Personally, I'd take it further, and in directions that let you simulate the real-world factors of both terrain and supply/support chains. There have been some good suggestions along these lines, but the best of them basically boil down to making sovereignty contests more like the current Faction Warfare mechanics. Instead of a single timer revolving around a single point in a single system, spread those timers out to a number of locations, with the majority of those sites inside complexes with different size restrictions. And I really do mean spread them out: to create the illusion of 'defending and attacking supply and support lines,' you might even consider scattering those complexes all around the target system's constellation. Regardless, the idea is that attackers (and defenders) ignore them at their peril, as a properly defended complex might set the entire offensive back by resetting vulnerability timers or removing vulnerability altogether.

    The size restrictions on those complexes are worth talking about as well. Size restrictions mean that some critical parts of the sov fight would be frigate, destroyer, cruiser, or 'all sub-caps' fights, set up so that those complexes need to be handled at the same time as each other and at the same time as the 'main' battle.

    The upshot of this is that the meta of the game would need to change to effectively split up fleets, spread them out, and call for different tactics (and theoretically make multiboxing to get more ships on the field far less effective, since multiboxing a frigate, a cruiser, and a carrier in three different fights/fleets/comms/systems isn't going to appeal to many people).

    Obviously the whole sov process would have to change, but it feels like the best direction for (a) more manageable per-system, per-fight load and (b) more interesting/dynamic Sov gameplay.

    1. That idea, although it sounds good in principle, would actually cause more lag issues than it solves. The more complex the sov system, the more fighting it takes to hold, and therefore the more pilots involved in taking or saving said sov. Changing it to system-wide timer objectives means more pilots are needed on each objective at the same time to attack or defend; changing it to constellation-wide or faction-based objectives means you would need even more pilots to control such wide space, and no doubt multiple nodes to handle the ensuing battle mechanics. There is no simple answer to lag, but with the investment players are placing in their EVE carriers, it is currently the only real issue that players want solved.

  5. We already kinda know some of the possible game design changes that would address this but the issue is getting the player base to swallow what are clear nerfs.

    We can make it more difficult to move across several regions, particularly by nerfing jumps. That way, people 12 regions away wouldn't be able to dogpile into a huge brawl.

    Or we can change or remove reinforcement timers. The problem with that though is people might wake up one morning and find that PL ground their region during the night.

    1. You can't change game mechanics to stop big fleets; change them and people will overcome it, and things will carry on almost as before. Besides, you build features into a game and call it a sandbox, but the sandbox is more like a beach now, growing ever bigger. Restricting actions will pull people away from the game.

    2. Game mechanics simulate reality. For example war games came from strategy games like chess and lead soldiers intended to teach officers how to move troops in battle.

      In reality when there's a fight the entire armed forces of the combatants don't dogpile into a single field.

  6. Instead of fixing lag, CCP should focus on developing real clones and spaceships. That way, reality can handle all of the processing problems.

  7. http://foo-eve.blogspot.com/2014/01/my-paper-napkin-solution-to-soul.html

    TLDR:

    When incoming ships may overload the grid, create multiple clones of the grid; each with a 'beacon' and a weakened objective. Original target dies when a sufficient majority of the objective clone grid targets are destroyed.
