Building multiplayer HTML5 games – Part 1: A History
Along with the recent performance upgrades in the major browsers’ implementations of the HTML5 canvas, one other big piece of the puzzle is falling into place, one that will let HTML5 games look and feel just as much like a “real game” as one developed for Xbox Live Arcade or Steam, particularly where multiplayer is concerned. That piece is the WebSocket, which finally gives web applications genuine real-time, two-way communication with game servers and other devices. But before we get to a detailed discussion of WebSockets, let us look at the history of networking as it relates to games and see how we got to where we are today.
In the beginning, there were two game clients who wanted to talk to each other. My personal first experience with this was a little first-person shooter called Doom. Each client ran its own copy of the game, and in multiplayer mode each copy sent messages back and forth over the network so you could see where the other players’ little guys were running around and could try to blow them up before you got blown up. You know, the usual.
When Blizzard Entertainment first set up Battle.net to allow Diablo players to find each other over the Internet and play together, they followed a similarly simple networking model. Each client ran a copy of the whole game and sent sync-up messages to the other players so you could keep track of what your friends were doing. As long as everyone followed the rules, this was fine. However, it quickly became clear that since each client was the final authority on the game running on that client, it was very hard to catch cheaters who hacked their local copy of the game to give their character an unfair advantage.
Enter the client/server model. In this system, the game is designed from the ground up not to trust the client. Since clients can be hacked, servers now run the central game processing, and clients have only very limited control over their environment. The messages a client may send to the game world are severely restricted as well. Take, for example, a game where a player is supposed to move at most three squares a turn. Before, a client might simply broadcast to the other players in the game, “My player is now in location 200, 200!” and the rest of the clients accepted the message, so a hacked client that did not enforce the three-square-max rule could easily bypass it. In a client/server environment, the server verifies every message sent in by a client. A hacked client that tries to bypass the three-square-max restriction gets caught by the server when it claims its player is in a location the player could not possibly reach in one turn. At the very least, the server can reject the illegal message and never pass it on to the other clients playing the game.
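To make the three-square-max example concrete, here is a minimal sketch of what the server-side check might look like. Everything in it is hypothetical and for illustration only: the `GameServer` class, the `MoveMessage` shape, and the choice to measure distance as horizontal plus vertical squares are assumptions, not taken from any real game.

```typescript
// Hypothetical message and state shapes, invented for this sketch.
interface Position { x: number; y: number; }
interface MoveMessage { playerId: string; to: Position; }

// The rule from the example: at most three squares per turn.
const MAX_SQUARES_PER_TURN = 3;

class GameServer {
  // The server, not the client, holds the authoritative positions.
  private positions = new Map<string, Position>();

  addPlayer(playerId: string, start: Position): void {
    this.positions.set(playerId, { x: start.x, y: start.y });
  }

  // Returns true and applies the move if it is legal; otherwise
  // rejects it and leaves the authoritative state untouched.
  handleMove(msg: MoveMessage): boolean {
    const current = this.positions.get(msg.playerId);
    if (current === undefined) return false; // unknown player
    if (!Number.isInteger(msg.to.x) || !Number.isInteger(msg.to.y)) {
      return false; // malformed coordinates
    }
    // Assumption: distance is counted in squares moved horizontally
    // plus squares moved vertically.
    const distance =
      Math.abs(msg.to.x - current.x) + Math.abs(msg.to.y - current.y);
    if (distance > MAX_SQUARES_PER_TURN) return false; // impossible move
    this.positions.set(msg.playerId, { x: msg.to.x, y: msg.to.y });
    return true;
  }

  positionOf(playerId: string): Position | undefined {
    return this.positions.get(playerId);
  }
}

const server = new GameServer();
server.addPlayer("p1", { x: 0, y: 0 });
console.log(server.handleMove({ playerId: "p1", to: { x: 2, y: 1 } }));     // true
console.log(server.handleMove({ playerId: "p1", to: { x: 200, y: 200 } })); // false
```

The important design point is that the client only ever asks to move; the server decides whether the move happened, and only accepted moves would be passed on to the other players.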
Battle.net took on just such an enforcer role for Diablo 2. Players could choose to play Battle.net-only characters, which other players could trust to have never been hacked, or “local” characters, for which Battle.net did not play the watchdog role and which therefore may or may not have gained an unfair advantage through hacking. Even the Blizzard behemoth World of Warcraft uses a similar model at its core, though hundreds of clients are allowed to connect to a World of Warcraft server at once. (Several different servers actually make up a World of Warcraft game world, which is why players see a loading screen when they move from continent to continent: the loading screen masks the hand-off of the character between game servers.)
On the HTML side of the world, client/server communication was around long before it became clear that client/server networking was required for trustworthy multiplayer gaming. The basic form of communication between a web page and its web server has always been something called “HTTP”, which is designed around the core idea that the web browser will ask for a single resource (a web page, an image file, a video stream, etc.), then hang up the connection after the resource is delivered. In the course of loading this blog post, your web browser issued several HTTP requests: one for the post itself, one for each image embedded in the post, and some extra requests invisible to the reader.
As it turns out, this is a terrible system for games, even though it has the same client/server architecture the multiplayer game world migrated to. It already has a server in place, ready to act as the game enforcer that protects players from cheaters using hacked clients, but there is a fundamental problem in the “hang up after every request” part of the HTTP design. For multiplayer games, you don’t want to hang up. Ever. You want to keep that line open so that you can be in a constant state of communication with the server: keeping it up to date with what your client is doing and getting back constant updates on what the other clients are doing. In HTTP-land, that means a whole lot of wasted bandwidth as clients constantly set up new connections to send new updates to the server and check for updates from other players. It also means web servers have to set up and tear down a lot of connections that may carry no data at all if nothing has happened in the game since the last time the client checked.
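A rough way to see the waste is to count connections. The sketch below is a deliberately simplified model, not a measurement: it assumes the client polls exactly once per tick, that every poll opens a fresh connection (the “hang up after every request” behavior described above), and that a persistent connection only carries a message when something actually happened. The function names are made up for this example.

```typescript
// gameEvents[i] is true when something happened in the game during
// tick i. Assumption: a polling client asks the server once per tick.
function countPolling(gameEvents: boolean[]): { connections: number; emptyResponses: number } {
  const updates = gameEvents.filter((happened) => happened).length;
  return {
    connections: gameEvents.length,                // one new connection per poll
    emptyResponses: gameEvents.length - updates,   // polls that found nothing new
  };
}

// With a connection that stays open, the server only sends data when
// there is something to report.
function countPersistent(gameEvents: boolean[]): { connections: number; messages: number } {
  return {
    connections: 1,
    messages: gameEvents.filter((happened) => happened).length,
  };
}

// Ten ticks in which the other players only did something twice.
const ticks = [false, false, true, false, false, false, false, true, false, false];
console.log(countPolling(ticks));    // { connections: 10, emptyResponses: 8 }
console.log(countPersistent(ticks)); // { connections: 1, messages: 2 }
```

In this toy run, eight of the ten polling connections are set up and torn down only to learn that nothing has changed.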
TL;DR: HTTP was never intended for this kind of communication.
Which leads us, at last, to WebSockets: a new protocol that allows web pages to open a connection to a server and keep that connection open, both for sending and receiving data, for long periods of time. In my next post, we will explore WebSockets in detail and see what options HTML5 developers have today to leverage this new technology.