How Socket.Select() saved our TCP server

Recently at my job we released the multiplayer update for our game First Strike. That was my first major experience with multiplayer and networking in general, but in the end it played out very nicely. We were able to network our core gameplay pretty quickly thanks to the architectural decisions we made earlier. The most challenging and time-consuming part was the lobby: a service responsible for matchmaking players and gathering them in rooms. The lobby is the topic of this article.

In a later stage of development we decided to run load tests on the lobby to see if it performs well under heavy load. Making this happen was another interesting experience, which I might write another article about someday. The idea was to have multiple “bots” logging into the lobby and simulating activities: making rooms, entering the matchmaking queue and so on. We planned to spawn as many of those bots as we could and see what happens to the lobby.

Did the lobby handle it well? Oh boy, it failed miserably.

With roughly 400 bots connected to the lobby, its performance dropped severely and we noticed significant lag. Logging in to the virtual machine, we saw 100% CPU load. Something was clearly wrong. After a bit of investigation we discovered the problem: it boiled down to how we handled TCP connections. Here is a snippet of code that illustrates the core of the issue:

public void OnPlayerConnected(Socket playerSocket)
{
	var playerThread = new Thread(() => ServePlayerConnection(playerSocket));
	
	//....
}

private void ServePlayerConnection(Socket playerSocket)
{
	var buffer = new byte[512];
	while(true)
	{
		int receivedBytes = playerSocket.Receive(buffer);

		//...
	}
}

If you’re already laughing, well, this was our first TCP server. We were young and naive back then. If you don’t see a problem in this piece of code, that’s fine. Let me explain.

The problem here is this line: var playerThread = new Thread(() => ServePlayerConnection(playerSocket));. For each player that connects to the lobby we make a separate thread to serve their connection. At the time, it looked like an obvious solution to us: how would we handle multiple sockets at the same time without using multiple threads? Socket.Receive() is a blocking call after all.

What is wrong with this is that even a multi-core processor has a pretty limited capacity for actually doing work in parallel. Simply speaking, you can have as many threads running in parallel as you have processor cores.[1] Once you make more threads than you have cores, they no longer run truly in parallel. Since the CPU doesn’t have enough cores to run them all simultaneously, the operating system has to schedule them. Each thread receives a slice of time in which it can do its job on a CPU core. Once the time is over, the scheduler suspends the thread, saves its state (the contents of the registers), and resumes another thread, after loading that thread’s saved state back into the registers if it was suspended previously. This is called a “context switch” and it’s a pretty expensive operation. The more threads there are, the less time each one gets to run on a limited number of cores and the more context switches have to be performed. Eventually, this added cost outweighs the benefit we get from having multiple threads and the application starts to lag severely.

Okay, if we can’t use a thread for each socket, how do we handle them then? One solution would be to leverage the async socket API like Socket.ReceiveAsync(). This should work, but in my opinion async code in C# tends to get messy, especially when there are many objects to keep track of. After some research we discovered another way of handling multiple sockets. Meet Socket.Select().
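For comparison, here is a minimal sketch of what the async alternative might look like. This is not the code we shipped: the class name, the method name and the returned byte count are made up for illustration, and error handling is left out.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class AsyncReceiveSketch
{
    // Hypothetical helper: serves one connection without dedicating a thread to it.
    // Returns the total number of bytes received, purely so the sketch is easy to check.
    public static async Task<long> ServePlayerConnectionAsync(Socket playerSocket)
    {
        var buffer = new byte[512];
        long totalBytes = 0;

        while (true)
        {
            // No thread is blocked while we wait here; the method resumes once data arrives.
            int receivedBytes = await playerSocket.ReceiveAsync(
                new ArraySegment<byte>(buffer), SocketFlags.None);

            if (receivedBytes == 0)
            {
                break; // 0 bytes means the remote side closed the connection
            }

            totalBytes += receivedBytes;
            // ... handle the first receivedBytes bytes of buffer here ...
        }

        return totalBytes;
    }
}
```

Each connection becomes one pending task instead of one thread, which is where the bookkeeping I mentioned comes from: every player has their own task to track, cancel and observe for exceptions.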

The idea of Socket.Select() is to wait until one of the specified sockets can be interacted with: for reading, writing, or for error.

In the UNIX world there is a system call select() which works in the same way, but with arbitrary file descriptors, not just sockets. And since in UNIX “everything is a file”, that makes it a much more flexible tool for dealing with IO. However, on Windows it’s implemented only for sockets, so I assume this is the reason why .NET doesn’t provide a more generic UNIX’y version.

It is defined like this:

public static void Select (IList checkRead, IList checkWrite, IList checkError, int microSeconds);

The way to use it is the following:

  1. Create a list of sockets you need to read from.
  2. Invoke Socket.Select() with our list as the first argument. The second and third arguments can be null. The last one is the timeout in microseconds; pass -1 to wait indefinitely.
  3. At this point the thread we are running on will suspend until at least one of the sockets from the list has data to read.
  4. After the method returns, our list is altered so that it contains only those sockets which are ready for reading: they have received some data, or the remote side has closed the connection.
  5. For each of these sockets, it is guaranteed that the next Socket.Receive() will return without blocking.
  6. Once you have read the information from these sockets, repeat the sequence for as long as there is at least one socket connected.

Here is the snippet which illustrates the algorithm:

var readSockets = new List<Socket>();
var receiveBuffer = new byte[512];

while (true)
{
	readSockets.Clear();
	foreach (Socket socket in _allSockets)
	{
		readSockets.Add(socket);
	}

	//Socket.Select throws an ArgumentNullException if all of its lists are null or empty
	if (readSockets.Count == 0) 
	{
		Thread.Sleep(1000); 
		continue;
	}

	Socket.Select(readSockets, null, null, -1);

	foreach(Socket socket in readSockets)
	{
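		//Does not block: Select reported this socket as readable.
		//bytesRead == 0 means the remote side closed the connection.
		int bytesRead = socket.Receive(receiveBuffer);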

		//...
	}
}
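If you want to see the list filtering from step 4 for yourself, here is a small self-contained sketch over the loopback interface. It is an illustration rather than lobby code: the class and method names are made up, and it uses a one-second timeout instead of -1 so that it cannot hang.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

public static class SelectDemo
{
    // Connects two clients to a local listener, sends data from only one of them,
    // and returns how many server-side sockets Select leaves in the read list.
    public static int CountReadable()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS pick a free port
        listener.Start();
        var endpoint = (IPEndPoint)listener.LocalEndpoint;

        using var clientA = new TcpClient();
        using var clientB = new TcpClient();
        clientA.Connect(endpoint);
        using Socket serverA = listener.AcceptSocket();
        clientB.Connect(endpoint);
        using Socket serverB = listener.AcceptSocket();

        // Only client A sends anything.
        clientA.Client.Send(new byte[] { 1, 2, 3 });

        var readSockets = new List<Socket> { serverA, serverB };
        Socket.Select(readSockets, null, null, 1_000_000); // timeout is in microseconds: 1 second

        // Select has removed serverB from the list because it has nothing to read.
        var buffer = new byte[512];
        foreach (Socket socket in readSockets)
        {
            int bytesRead = socket.Receive(buffer); // returns immediately
            Console.WriteLine($"{bytesRead} bytes ready on {socket.RemoteEndPoint}");
        }

        listener.Stop();
        return readSockets.Count;
    }
}
```

Calling SelectDemo.CountReadable() should return 1: both sockets went into the list, but only the one with pending data came back out.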

At this point we have code that is able to efficiently handle as many sockets as needed all on a single thread! As an added benefit, if the current load is low and not much is going on, our thread will just sleep and not consume CPU resources at all. To further improve on this, we can actually spawn more threads (as many as we have CPU cores) and have each one of them run this little Socket.Select() routine in parallel.
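Here is one way the multi-threaded variant could be sketched. It rests on an assumption of mine: each thread owns its own fixed subset of the sockets, so no socket is ever watched by two threads. The names (SelectWorkers, Partition, StartWorkers, SelectLoop) are hypothetical, and a real lobby would also need to hand newly connected players to a worker and catch SocketException around Receive().

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;
using System.Threading;

public static class SelectWorkers
{
    // Splits the sockets round-robin into workerCount groups.
    public static List<List<Socket>> Partition(IReadOnlyList<Socket> sockets, int workerCount)
    {
        var groups = new List<List<Socket>>();
        for (int i = 0; i < workerCount; i++)
        {
            groups.Add(new List<Socket>());
        }
        for (int i = 0; i < sockets.Count; i++)
        {
            groups[i % workerCount].Add(sockets[i]);
        }
        return groups;
    }

    // Starts one Select loop per logical core, each over its own group of sockets.
    public static List<Thread> StartWorkers(IReadOnlyList<Socket> sockets)
    {
        var threads = new List<Thread>();
        foreach (List<Socket> group in Partition(sockets, Environment.ProcessorCount))
        {
            var thread = new Thread(() => SelectLoop(group)) { IsBackground = true };
            thread.Start();
            threads.Add(thread);
        }
        return threads;
    }

    private static void SelectLoop(List<Socket> ownedSockets)
    {
        var readSockets = new List<Socket>();
        var buffer = new byte[512];

        // The loop ends once this worker has no sockets left (or had none to begin with).
        while (ownedSockets.Count > 0)
        {
            readSockets.Clear();
            readSockets.AddRange(ownedSockets);
            Socket.Select(readSockets, null, null, -1);

            foreach (Socket socket in readSockets)
            {
                int bytesRead = socket.Receive(buffer);
                if (bytesRead == 0)
                {
                    // The remote side closed the connection: stop watching this socket.
                    ownedSockets.Remove(socket);
                    socket.Dispose();
                    continue;
                }
                // ... handle the first bytesRead bytes of buffer here ...
            }
        }
    }
}
```

Round-robin is the simplest split I could think of. It ignores how chatty each player is, so one worker can end up with more traffic than another.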

Having applied all of this, we ran our benchmarks again. In the same 400-bot scenario, the processor load was a little bit above 1%! This is a huge success! Later we revisited the virtual machine a few more times with actual players connected to it and it was chilling at 1-2% of CPU load.

Indeed, Socket.Select() saved our TCP server.


  1. To be more precise, as many as you have logical cores. Here is a nice article explaining the subject: https://techgearoid.com/articles/difference-between-physical-cores-and-logical-processors/