TCP echo server, part 4


It's multithreading time

Looks like we're finally fine within a single thread. Now it's time to scale our server across multiple threads.

To achieve that we could run io_context::run on multiple threads. Doing so means we need to add some synchronization to our session class. It may seem that we just need to add a strand to the session and wrap the completion handlers in it. However, that won't work.

async_read and async_write are composed operations: each one is an internal algorithm driving a chain of asynchronous operations on the socket. Wrapping the async_read and async_write completion handlers in a strand synchronizes only those final handlers, not the underlying code that implements these free functions, so the intermediate socket completion handlers stay unsynchronized. Both functions work in parallel on the same circular_buffer, and once we go multithreaded they will access that buffer from different threads. To synchronize them properly we would need to wrap the underlying socket operation handlers in our strand, and we can't do that because, well, we can't modify the Boost.Asio code.

We could overcome this in one of the following ways:

  • Implement async_read and async_write ourselves, so that we can wrap the socket handlers in our strand;
  • Modify circular_buffer, replacing std::size_t with std::atomic<std::size_t> where necessary. We would also have to review the whole class to check that it's really ready for multithreading.
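
To give a feeling for what the second option involves, here is a minimal sketch of a ring buffer with atomic indices. It is not the circular_buffer from this series: the name spsc_ring, its byte-at-a-time interface and the one-producer/one-consumer restriction are all assumptions made for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer byte ring.
// One slot is always left empty, so it holds at most Capacity - 1 bytes.
template <std::size_t Capacity>
class spsc_ring
{
    static_assert(Capacity > 1, "need at least one usable slot");

public:
    // Must be called from the producer thread only.
    bool push(char value)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if(next == head_.load(std::memory_order_acquire))
        {
            return false; // full
        }
        data_[tail] = value;
        // Publish the byte: the consumer sees it once it observes the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Must be called from the consumer thread only.
    bool pop(char& out)
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if(head == tail_.load(std::memory_order_acquire))
        {
            return false; // empty
        }
        out = data_[head];
        // Release the slot back to the producer.
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<char, Capacity> data_{};
    std::atomic<std::size_t> head_{0}; // next slot to read
    std::atomic<std::size_t> tail_{0}; // next slot to write
};
```

Even this toy version has to get the memory ordering right and only supports exactly one reader and one writer, which hints at how much review a real buffer would need.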

A better way

However, there is a better solution. Remember the io_context_group class from our Multithreaded execution, part 2 lesson? The best synchronization is no need for synchronization. Our TCP echo server doesn't have shared data: all sessions are isolated from each other. They do neither heavy calculations nor memory allocations. They are supposed to have a relatively short lifetime, so they won't heavily occupy some CPU cores while the others stay idle. It looks like our TCP echo server fits the “N pairs of 1 io_context + 1 thread” multithreading model perfectly. Each session will work within a single thread, so no synchronization is required at all!
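
As a reminder of what that model looks like, here is a rough sketch of such a group. The names context_group, query and counting_context, as well as the round-robin policy, are assumptions for illustration; the io_context_group class from the earlier lesson may be shaped differently. The class is a template so it can be tried without Boost; with Boost.Asio the Context would be boost::asio::io_context.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for io_context_group.
// Context is any default-constructible type with a blocking run() member.
template <typename Context>
class context_group
{
public:
    explicit context_group(std::size_t size)
        : contexts_(size)
    {
        assert(size > 0);
    }

    // Hands out contexts in round-robin order. Meant to be called from a
    // single thread only, for example the one that sets up the acceptor.
    Context& query()
    {
        Context& context = contexts_[next_];
        next_ = (next_ + 1) % contexts_.size();
        return context;
    }

    // Runs every context on its own thread and blocks until all of them return.
    void run()
    {
        std::vector<std::thread> threads;
        threads.reserve(contexts_.size());
        for(Context& context : contexts_)
        {
            threads.emplace_back([&context] { context.run(); });
        }
        for(std::thread& thread : threads)
        {
            thread.join();
        }
    }

private:
    std::vector<Context> contexts_;
    std::size_t next_ = 0;
};

// Hypothetical dummy context, only for trying the class without Boost.Asio.
struct counting_context
{
    int runs = 0;
    void run() { ++runs; }
};
```

A session created on the context returned by query() has all of its handlers invoked on that context's single thread, which is why no strand is needed. Keep in mind that io_context::run returns as soon as the context runs out of work, so each context needs a pending operation or a work guard before run() is called.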

So, all we need is to replace io_context with io_context_group and slightly modify some code. Now our main function looks like this:

int main(int argc, char* argv[])
{
    if(argc != 3)
    {
        std::cout << "Usage: server [Port] [Threads]\n";
        return 0;
    }

    io_context_group io_group(boost::lexical_cast<std::size_t>(argv[2]));
    server srv(io_group, boost::lexical_cast<std::uint16_t>(argv[1]));

    // Start accepting and run the group here, as in the previous parts.

    return 0;
}

Well, this lesson was a short one. Soon we will implement a TCP echo client, which will help us measure the performance of every TCP echo server from the previous lessons.

Full source code for this lesson:


A TCP echo server looks like a trivial task at first glance. We don't even need an application-level protocol, and we don't need to look into the data being transmitted at all: we just send it back to the client as-is. And yet it's surprising how many questions you have to take into consideration.


Despite the work we've done, there are still things to improve. Testing the server manually with telnet is one thing, but hundreds of thousands of users in a real-life N-gigabit environment is something entirely different. A bit later we'll give it a good stress test and see what can be improved further.

