TCP echo server, part 3


Let's start by discussing the disadvantages of the server implementation from the previous lesson. If it's been a while since you read it, I advise you to re-read it and refresh your memory.

A very simple explanation of that implementation is this: the server reads data all the time and writes it back to the client inside the read completion handler, but only if a previous write operation isn't currently in progress.

The problem

So, what's wrong with the current implementation? Consider the following chain of events:

  1. The client sends 10,000 bytes of data to the server;
  2. The server receives 10,000 bytes of data and schedules async_write to send it back to the client;
  3. The client starts sending 1,000,000 bytes of data to the server;
  4. The client receives 10,000 bytes of data; by this time 50,000 bytes of data from the previous step have been delivered to the server;
  5. The server could have scheduled async_write for those 50,000 bytes of data right away inside the on_write handler. However, the server won't do that until it receives the remaining 950,000 bytes of the second packet.

Take a look at the waterfall diagram. Treat the horizontal axis as the time axis:

You can see a big gap between the first and second server replies. In a better world the server could've started the second reply right away because it already has the first 50,000 bytes of the next shout. And the diagram could've looked like this:

In a real-life environment it's very unlikely that a 1 MB packet could slip through within a single asynchronous task. I've made up the 1 MB example just to illustrate the idea. In real life you'll most likely see many small gaps instead of a single big one. So the issue would still be there, just not as obvious.

Things to improve

So, the next improvement is to eliminate the gaps mentioned above. Here's how we'll do this:

If some data has been received, the server should start sending it to the client from both handlers, on_read and on_write. And it should do this even if it's still receiving data from the client, i.e. its async_read task is still in progress.

Another reason to do so is to eliminate any requirements on how the client should transmit data so that it won't get stuck. We took a look at this issue at the end of the previous lesson.

To achieve both of these we should implement some kind of pending queue. The read operation should push data into the queue, while the write operation should pop it from the queue and send it to the client. That way the server will be ready to read and write at any suitable moment.
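To make the idea concrete, a copy-based pending queue could look roughly like this. It's only a sketch: the pending_queue name and its push, front and pop members are made up for illustration and aren't part of the lesson's source code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical copy-based pending queue (illustration only)
class pending_queue
{
public:
    // Read side: copies the received bytes into the queue
    void push(char const* data, std::size_t size)
    {
        chunks.emplace_back(data, data + size);
    }

    bool empty() const
    {
        return chunks.empty();
    }

    // Write side: the next chunk to send to the client
    std::vector<char> const& front() const
    {
        assert(!chunks.empty());
        return chunks.front();
    }

    // Write side: drop the chunk once it has been sent
    void pop()
    {
        assert(!chunks.empty());
        chunks.pop_front();
    }

private:
    std::deque<std::vector<char>> chunks;
};
```

Every push here allocates a new chunk and copies the received bytes into it.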

However, we can do even better. Pushing data into the queue means copying the data. Let's get rid of that as well. We won't copy the data anywhere. We will send it from the same location where it was stored when read.

"But how could we do that?" you may ask. "We can't touch the buffer while the asynchronous operation is in progress!" And you'd be right, we can't. That's why we're going to invent our own buffer 😈

The solution

We will implement a buffer according to the Dynamic buffer requirements, version 1.

Essentially, we need a buffer whose prepare and data member functions return results that aren't invalidated by calls to commit and consume. So the underlying buffer should never reallocate, resize or move. We're going to achieve that with a circular (or ring) buffer.

A circular buffer is a flat buffer with two additional values: head and tail. The value of head is the index in the buffer where the data begins, and tail is the index where it ends. To push data into the buffer you copy it in starting from the tail index. When you reach the end of the buffer, you continue from its beginning. To pop data out of the buffer you just add some value to its head index. Both head and tail should be wrapped before you use them as an index into the buffer. To wrap their values you use the modulo operation.
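Before we get to the real thing, here is a minimal byte-at-a-time sketch of that indexing scheme. The toy_ring class and its push and pop members are hypothetical and only illustrate the head, tail and modulo arithmetic. They are not the circular_buffer we're about to write.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy ring buffer (illustration only): head and tail only ever grow,
// and are wrapped with modulo right before they're used as indices
template <std::size_t Capacity>
class toy_ring
{
public:
    std::size_t size() const
    {
        return tail - head;
    }

    // Push one byte at the wrapped tail index
    void push(char c)
    {
        assert(size() < Capacity);
        buffer[tail % Capacity] = c;
        ++tail;
    }

    // Pop one byte from the wrapped head index
    char pop()
    {
        assert(size() > 0);
        char const c = buffer[head % Capacity];
        ++head;
        return c;
    }

private:
    std::array<char, Capacity> buffer{};
    std::size_t head = 0;
    std::size_t tail = 0;
};
```

With a capacity of 4, the fifth byte ever pushed lands at index 0 again, provided something has been popped by then.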

template <std::size_t Capacity>
class circular_buffer

Nice start, isn't it? First, data members:

std::array<char, Capacity> buffer;
std::size_t head = 0;
std::size_t tail = 0;

So, it's just std::array and two additional values. So simple. Next, to comply with the requirements, we should define the following member functions: size, max_size, capacity, data, prepare, commit and consume. Let's start with the most obvious:

constexpr std::size_t max_size() const
{
    return Capacity;
}

constexpr std::size_t capacity() const
{
    return max_size();
}

The maximum buffer size is the Capacity template parameter, and the capacity of the buffer is its maximum size. I hope it doesn't sound confusing.

Next part:

std::size_t size() const
{
    return tail - head;
}

void commit(std::size_t n)
{
    tail += n;
}

void consume(std::size_t n)
{
    head += n;
}

Still pretty simple, huh? Before we continue, we need to define the const_buffers_type and mutable_buffers_type buffer sequence types for our circular_buffer. Here's how we do that:

using const_buffers_type = boost::container::static_vector<io::const_buffer, 2>;
using mutable_buffers_type = boost::container::static_vector<io::mutable_buffer, 2>;

Why static_vector, and why is its capacity 2? Because we're going to face two different cases: in the first case the requested buffer view doesn't wrap around the end of the underlying buffer, and in the second case it does. Just like that:

Circular buffer

So, the prepare and data functions should return either a flat or a looped memory range. The flat one is represented by a single buffer view, and the looped one by a buffer sequence of two buffer views. And we're using static_vector to avoid heap allocations, so it could work blazing fast!

And here are the rest of the required functions:

auto prepare(std::size_t n)
{
    // Boost.Asio Dynamic buffer throws std::length_error in this case, so we'll do the same
    if(size() + n > max_size())
    {
        throw std::length_error("circular_buffer overflow");
    }

    return make_sequence<mutable_buffers_type>(buffer, tail, tail + n);
}

auto data() const
{
    return make_sequence<const_buffers_type>(buffer, head, tail);
}

make_sequence is a scary function which composes a buffer sequence from a given range of raw indices. Download the full source code for this lesson to see how it is implemented.
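To give a feeling for the index arithmetic involved, here is a simplified sketch that splits a raw index range into one or two contiguous pieces. It is not the lesson's make_sequence: the piece and pieces types and the split_range function are hypothetical, and they work with plain offsets instead of Boost.Asio buffer views.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// One contiguous piece of the underlying buffer (illustration only)
struct piece
{
    std::size_t offset;
    std::size_t length;
};

// Up to two pieces, in the spirit of static_vector<..., 2>
struct pieces
{
    std::array<piece, 2> items{};
    std::size_t count = 0;
};

// Split the raw index range [begin, end) into one flat piece,
// or into two pieces if the range wraps past the end of the buffer
inline pieces split_range(std::size_t capacity, std::size_t begin, std::size_t end)
{
    assert(capacity > 0 && begin <= end && end - begin <= capacity);

    pieces result;
    std::size_t const length = end - begin;

    if(length == 0)
        return result;

    // Wrap the raw index, then see how much fits before the end of the buffer
    std::size_t const offset = begin % capacity;
    std::size_t const first = std::min(length, capacity - offset);

    result.items[result.count++] = {offset, first};

    // Whatever didn't fit continues from the beginning of the buffer
    if(first < length)
        result.items[result.count++] = {0, length - first};

    return result;
}
```

For a buffer of 8 bytes, the range from 2 to 5 stays flat, while the range from 6 to 11 splits into 2 bytes at offset 6 and 3 bytes at offset 0. A real implementation would presumably turn each piece into a buffer view pointing into the underlying array.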

Unfortunately, Boost.Asio I/O free functions work with non-owning buffer views only, which means that they hold a copy of the object they're given. The only exception is the boost::asio::streambuf class. So we can't pass our circular_buffer into those functions as-is. We also need to implement a view of our buffer: circular_buffer_view, which holds a reference to the buffer and proxies all of its methods.
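The general shape of such a view can be sketched like this. It's an illustration under assumptions rather than the lesson's circular_buffer_view: toy_buffer is a stand-in with only three members, and the buffer_view and make_view_sketch names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for circular_buffer with just enough members for the demo
struct toy_buffer
{
    std::size_t head = 0;
    std::size_t tail = 0;

    std::size_t size() const { return tail - head; }
    void commit(std::size_t n) { tail += n; }
    void consume(std::size_t n) { head += n; }
};

// Cheap, copyable view: holds a reference and forwards every call to it
template <typename Buffer>
class buffer_view
{
public:
    explicit buffer_view(Buffer& target) : target(target) {}

    std::size_t size() const { return target.size(); }
    void commit(std::size_t n) { target.commit(n); }
    void consume(std::size_t n) { target.consume(n); }

private:
    Buffer& target;
};

// Helper that deduces the buffer type
template <typename Buffer>
buffer_view<Buffer> make_view_sketch(Buffer& buffer)
{
    return buffer_view<Buffer>(buffer);
}
```

Copying the view copies only the reference, so every copy keeps operating on the same underlying buffer.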

Putting it all together

So, let's get back to our echo server. Now we can modify our session class so that it works according to the ideas from this lesson. The data section now looks like this:

tcp::socket socket;
bool writing = false;
circular_buffer<65536> buffer;

The read and write functions look almost the same:

void read()
{
    // Schedule asynchronous receiving of data
    io::async_read(socket, make_view(buffer), io::transfer_at_least(1), std::bind(&session::on_read, shared_from_this(), _1, _2));
}

void write()
{
    writing = true;
    io::async_write(socket, make_view(buffer), std::bind(&session::on_write, shared_from_this(), _1, _2));
}

Here make_view is a helper function which makes a circular_buffer_view from a circular_buffer reference. Note that both functions work on the same buffer now.

The on_read function hasn't changed since the previous lesson, so there's no need to repeat it here. And now, showtime! Look at the on_write function. Now it can also schedule the next async_write operation even if async_read is still in progress:

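void on_write(error_code error, std::size_t bytes_transferred)
{
    writing = false;

    // The rest of the handler isn't shown here; presumably it starts the next
    // write when there was no error and unsent data is waiting in the buffer
    if(!error && buffer.size())
    {
        write();
    }
}
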
So, the async_write function writes data from the memory range from head to tail, and async_read reads data into the memory range from tail to head. All within the same underlying buffer.

So, no more data transmission restrictions, no more gaps, no allocations, no redundant data copying. We're damn good today.

In this lesson the session's circular_buffer member is parametrized with a size of 65,536 bytes. In real life such a parameter should be configurable from the outside, which means it should be a run-time parameter, not a compile-time one. Simply replacing std::array with std::vector as the internal buffer's type means one additional heap allocation per session. Here is your homework: invent a solution which has a runtime-configurable buffer size and doesn't cause an additional heap allocation.

The source code for this lesson has grown, so it's time to split it into multiple files. Don't worry, no nerdy makefiles involved. You still have to compile just a single .cpp file. Learn how it works before you continue, because there's still something to improve.

Full source code for this lesson:
