boost coroutine server crashes when writing data to client

Question

Asked by aj3423 on 10 November 2019 at 20:02 (337 views)

I based my server on the boost coroutine echo server example; it simply receives some data and writes a reply back. It crashes when writing data to the client and, more strangely, it only crashes when running on multiple cores.

Here's the server. It reads 4 bytes and writes back "OK", with a 1-second timeout:

#include <winsock2.h>
#include <windows.h>

#include <iostream>
using namespace std;

#include <boost/thread/thread.hpp>
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
using namespace boost;
using namespace boost::asio;
using namespace boost::asio::ip;

#define SERVER_PORT 1234
#define DATA_LEN_4 4

#define TIMEOUT_LIMIT 1 // second

struct session : public std::enable_shared_from_this<session>
{
    tcp::socket socket_;
    boost::asio::steady_timer timer_;
    boost::asio::strand<boost::asio::io_context::executor_type> strand_;

    explicit session(boost::asio::io_context& io_context, tcp::socket socket)
    : socket_(std::move(socket)),
      timer_(io_context),
      strand_(io_context.get_executor())
    { }

    void go()
    {
        auto self(shared_from_this());
        boost::asio::spawn(strand_, [this, self](boost::asio::yield_context yield)
        {
            try
            {
                timer_.expires_from_now(std::chrono::seconds(TIMEOUT_LIMIT));

                // recv data
                string packet;
                packet.resize(DATA_LEN_4); // alloc memory

                size_t received_len = 0;

                // read data
                {
                    size_t rs;
                    while(received_len < DATA_LEN_4) { // recv 4 bytes
                        boost::system::error_code ec;

                        rs = socket_.async_read_some(
                            boost::asio::buffer((char*)(packet.c_str()+received_len), DATA_LEN_4-received_len), yield[ec]);
                        if(ec==boost::asio::error::eof)
                            break; //connection closed cleanly by peer
                        else if(ec) {
                            throw "read_fail";
                        }
                        received_len += rs;
                    }
                }
                if(received_len < DATA_LEN_4) {
                    throw "recv too short, maybe timeout";
                }
                // write back "OK"
                {
                    boost::system::error_code ecw;
                    boost::asio::async_write(socket_, boost::asio::buffer(string("OK")), yield[ecw]);
                    if(ecw==boost::asio::error::eof)
                        return; //connection closed cleanly by peer
                    else if(ecw)
                        throw "write_fail"; // some other error
                }
            }
            catch (const char* reason) 
            {
                printf("exception reason: %s\n", reason);
                boost::system::error_code ecw;

                /*
                 * Question 1: why this 'async_write' line causes crash?
                 */
                // write the error reason to client
                boost::asio::async_write(socket_, boost::asio::buffer(string(reason)), yield[ecw]);

                socket_.close();
                timer_.cancel();
            }
            catch (...)
            {
                printf("unknown exception\n");
                socket_.close();
                timer_.cancel();
            }
        });

        boost::asio::spawn(strand_, [this, self](boost::asio::yield_context yield)
        {
            while (socket_.is_open())
            {
                boost::system::error_code ignored_ec;
                timer_.async_wait(yield[ignored_ec]);
                if (timer_.expires_from_now() <= std::chrono::seconds(0))
                    socket_.close();
            }
        });
    }
};

int main() {
    boost::asio::io_context io_context;

    boost::asio::spawn(io_context, [&](boost::asio::yield_context yield)
    {
        tcp::acceptor acceptor(io_context,
        tcp::endpoint(tcp::v4(), SERVER_PORT));

        for (;;)
        {
            boost::system::error_code ec;

            tcp::socket socket(io_context);
            acceptor.async_accept(socket, yield[ec]);
            if (!ec) 
                std::make_shared<session>(io_context, std::move(socket))->go();
        }
    });

    /*
     * When run on 1 CPU, it runs fine, no Crash 
     */
    // io_context.run();

    /*
     * Question 2:
     * But when run on multiple CPUs, it Crashes !!!
     * Why?
     */
    auto thread_count = std::thread::hardware_concurrency();
    boost::thread_group tgroup;
    for (auto i = 0; i < thread_count; ++i)
        tgroup.create_thread(boost::bind(&boost::asio::io_context::run, &io_context));
    tgroup.join_all();
}

Please note that the 4-byte packet and the 1-second timeout are only there to illustrate the problem. The real server uses large packets, which may time out under bad network conditions. To simulate this, the client writes 1 byte per second, which triggers the read timeout on the server.

The client:

#include <iostream>
#include <boost/asio.hpp>
using namespace std;

using boost::asio::ip::tcp;

#define SERVER "127.0.0.1"
#define PORT "1234"

int main() {
    boost::asio::io_context io_context;

    unsigned i = 1; 
    while(1) {
        try {
            tcp::socket s(io_context);
            tcp::resolver resolver(io_context);
            boost::asio::connect(s, resolver.resolve(SERVER, PORT));

            // to simulate the bad network condition,
            // write 4 bytes in 4 seconds to trigger the receive timeout on server, which is 1 second
            for(int i=0; i<4; i++) { 
                boost::asio::write(s, boost::asio::buffer(string("A")));
                std::this_thread::sleep_for(std::chrono::seconds(1)); // sleep 1 second
            }

            // read echo
            char x[64] = {0};
            s.read_some(boost::asio::buffer(x, sizeof(x)));
            cout << i++ << ". received: " << x << endl;
        } catch (...) {
            cout << i++ << " exception" << endl;
        }
    }

    return 0;
}

Question 1:

Why does this line cause a crash?

boost::asio::async_write(socket_, boost::asio::buffer(string(reason)), yield[ecw]);

Question 2:

Why doesn't the server crash when it runs on 1 CPU (io_context.run();),
but crash when it runs on multiple CPUs using thread_group?

My environment: Win10-64bit, boost-1.71.0-64bit, VisualStudio-2017-Community

Original Q&A


**sehe** · Accepted Answer · 2019-11-12T00:41:19.407000

Question 1

ba::async_write(socket_, ba::buffer(string("OK")), yield[ecw]);

This invokes undefined behaviour because you pass a temporary string as the buffer, but the asynchronous operation (by definition) doesn't complete before the async_write call returns.

Therefore the buffer is a stale reference to something that was destroyed on the stack, or to whatever lives there now.

The send buffer should logically be a member of the session (the self object) so that it has a proper lifetime. Or, since you're using coroutines and you're going to end the session anyhow, just use the synchronous write instead of async_write.
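To make the lifetime argument concrete, here is a minimal, Boost-free sketch. It assumes a hypothetical `fake_io` queue standing in for `io_context` and a hypothetical `async_write(ptr, len, handler)` that, like `asio::buffer`, only remembers a pointer and a length. The session owns `send_buf_` and the completion handler holds a `shared_ptr` copy of `self`, so the bytes stay valid until the operation completes, even after every other owner has let go.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an io_context: completions are queued and only run
// later, just as a real asynchronous operation completes after initiation returns.
struct fake_io {
    std::deque<std::function<void()>> pending;
    std::string wire; // bytes that "reached the client"

    // Hypothetical async_write: like asio::buffer, it only remembers a pointer
    // and a length, so the caller must keep the bytes alive until completion.
    void async_write(const char* data, std::size_t len, std::function<void()> handler) {
        pending.push_back([this, data, len, handler] {
            wire.append(data, len);
            handler();
        });
    }

    void run() {
        while (!pending.empty()) {
            auto job = std::move(pending.front());
            pending.pop_front();
            job();
        } // each `job`, and the handler it owns, is destroyed at the end of its iteration
    }
};

struct session : std::enable_shared_from_this<session> {
    fake_io& io_;
    std::string send_buf_; // owned by the session, so it outlives the write
    int writes_done = 0;

    explicit session(fake_io& io) : io_(io) {}

    void reply(std::string msg) {
        send_buf_ = std::move(msg);
        auto self = shared_from_this(); // keeps *this, and thus send_buf_, alive
        io_.async_write(send_buf_.data(), send_buf_.size(),
                        [self] { ++self->writes_done; });
    }
};
```

Usage: create the session with `std::make_shared`, call `reply("OK")`, drop your own pointer and then call `run()`. The pending handler's copy of `self` is what keeps `send_buf_` alive until the bytes are copied out.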

Question 2

That's because undefined behaviour is Undefined Behaviour: anything can happen.

The Unasked

Instead of read_some, use read with transfer_exactly(DATA_LEN_4), or read_until with an appropriate match condition.
Instead of buffer() over a pre-sized string, you can use dynamic_buffer.
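For reference, this is roughly the loop that transfer_exactly saves you from writing. The sketch below is Boost-free and assumes a hypothetical `chunked_source` whose `read_some` hands out data in arbitrary chunks and returns 0 at end of stream. Unlike the question's code, it writes through `&packet[received]` rather than casting away the constness of `c_str()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical byte source that delivers data in arbitrary chunks, like read_some.
struct chunked_source {
    std::vector<std::string> chunks;
    std::size_t next = 0;

    // Copies at most `max` bytes into `out`; returns 0 to signal end of stream.
    std::size_t read_some(char* out, std::size_t max) {
        if (next == chunks.size()) return 0;
        std::string& c = chunks[next];
        std::size_t n = std::min(max, c.size());
        std::copy_n(c.data(), n, out);
        c.erase(0, n);
        if (c.empty()) ++next;
        return n;
    }
};

// Calls read_some until exactly `n` bytes have arrived or the stream ends;
// a short result means end of stream (asio would report error::eof instead).
std::string read_exactly(chunked_source& src, std::size_t n) {
    std::string packet(n, '\0');
    std::size_t received = 0;
    while (received < n) {
        std::size_t rs = src.read_some(&packet[received], n - received);
        if (rs == 0) break; // end of stream before n bytes
        received += rs;
    }
    packet.resize(received);
    return packet;
}
```

With asio itself, the whole function collapses into a single async_read(socket_, dynamic_buffer(packet), transfer_exactly(DATA_LEN_4), yield) call, which throws system_error on end of stream instead of returning a short packet.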

Instead of throwing magical strings you can just catch system_error where code signifies what condition arose:

try {
    timer_.expires_from_now(std::chrono::seconds(TIMEOUT_LIMIT));

    // read data
    std::string packet;
    auto received_len = ba::async_read(socket_,
            ba::dynamic_buffer(packet),
            ba::transfer_exactly(DATA_LEN_4), yield);

    assert(received_len == DATA_LEN_4); // guaranteed

    // write back "OK"
    ba::write(socket_, ba::buffer("OK"s));
}
catch (boost::system::system_error const& e) {
    if (e.code() == ba::error::operation_aborted)
        std::cout << "canceled (timeout)" << std::endl;
    else if (e.code() == ba::error::eof)
        std::cout << "eof" << std::endl;
    else throw std::runtime_error(e.code().message());
}
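The same idea can be tried out without Boost. The sketch below is only an analogue: it dispatches on the error code carried by a std::system_error, with std::errc::operation_canceled standing in for ba::error::operation_aborted and std::errc::connection_reset standing in for ba::error::eof. Both substitutions are assumptions; asio has its own error enums and categories.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Maps an I/O failure to a log message by inspecting the error code,
// instead of throwing and catching ad-hoc string literals.
std::string describe(const std::system_error& e) {
    if (e.code() == std::errc::operation_canceled)
        return "canceled (timeout)";
    if (e.code() == std::errc::connection_reset)
        return "connection reset";
    return "other: " + e.code().message();
}
```

Because the condition travels inside the exception as an error code, a single catch clause can tell a timeout apart from a dropped peer without any string comparisons.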

So now you could wrap the system_error handling in your generic exception handling block:

try {
    // ...
} catch (std::exception const& e) {
    std::cout << "exception: " << std::quoted(e.what()) << std::endl;

    boost::system::error_code ignore;
    ba::async_write(socket_, ba::buffer(std::string(e.what())), yield[ignore]);

    socket_.close();
    timer_.cancel();
}

But!

It seems highly dubious that informing your client is useful, or even wise.
Not catching the exception in the coro is going to destroy the self instance anyway, so you can simply let it escape.

Timers

The timer's completion error_code already tells you whether the timer expired or was canceled:

while (socket_.is_open()) {
    boost::system::error_code ec;
    timer_.async_wait(yield[ec]);

    if (ba::error::operation_aborted != ec) // timer was not canceled
        socket_.close();
}

Note, however, that the regular return paths from the session coro do NOT call .cancel() on timer_. That keeps the socket open for up to another second, until the timer expires.
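One way to make sure every exit path cancels the timer is a scope guard. The following Boost-free sketch assumes a hypothetical `fake_timer` whose pending wait completes with `operation_canceled` on `cancel()` and with no error on `expire()`. Unlike asio, which posts the handler to the executor, this model calls the handler inline. The hypothetical `session_model::handle()` stands in for the session coroutine body, and its `scope_exit` guard cancels the timer whether the body returns or throws.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <system_error>
#include <utility>

// Hypothetical timer model (handlers run inline here; asio would post them).
struct fake_timer {
    std::function<void(std::error_code)> waiter;

    void async_wait(std::function<void(std::error_code)> h) { waiter = std::move(h); }
    void cancel() { complete(std::make_error_code(std::errc::operation_canceled)); }
    void expire() { complete(std::error_code{}); }

private:
    void complete(std::error_code ec) {
        // Take the handler out first so it runs at most once.
        if (auto h = std::exchange(waiter, nullptr)) h(ec);
    }
};

// Minimal scope guard (C++17): runs its callable on every exit path.
template <class F>
class scope_exit {
    F f_;
public:
    explicit scope_exit(F f) : f_(std::move(f)) {}
    scope_exit(const scope_exit&) = delete;
    scope_exit& operator=(const scope_exit&) = delete;
    ~scope_exit() { f_(); }
};

// Models one session: the watchdog closes the "socket" only when the timer
// really expired, and every exit path of handle() cancels the timer.
struct session_model {
    fake_timer timer;
    bool socket_open = true;

    void start_watchdog() {
        timer.async_wait([this](std::error_code ec) {
            if (ec != std::errc::operation_canceled) // timer was not canceled
                socket_open = false;
        });
    }

    void handle(bool fail) {
        scope_exit cancel_timer([this] { timer.cancel(); });
        if (fail) throw std::runtime_error("read failed");
        // ... normal processing; returning also runs cancel_timer ...
    }
};
```

The guard removes the need to remember a timer_.cancel() call on each return path; the exception path is covered by stack unwinding.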

Exceptions

If you want to let exceptions escape from the coros (you can, and you should assume that it may happen), you must make the thread loops handle exceptions. See the question "Should the exception thrown by boost::asio::io_service::run() be caught?"
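The essential shape of such a loop can be shown with nothing but the standard library. In this sketch `run` is any callable standing in for io_context::run, an assumption made so the loop can be exercised without Boost. It restarts `run` after every escaped exception and stops only on a normal return, which is the same structure as the thread lambda in the suggested server below.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Calls run() again after every escaped exception; returns how many
// exceptions were caught before run() finally returned normally.
int run_event_loop(const std::function<void()>& run, std::ostream& log) {
    int caught = 0;
    for (;;) {
        try {
            run();
            break; // exited normally: no more work
        } catch (const std::exception& e) {
            log << "[eventloop] exception caught: " << e.what() << '\n';
            ++caught;
        } catch (...) {
            log << "[eventloop] unknown exception caught\n";
            ++caught;
        }
    }
    return caught;
}
```

In a server, each worker thread would call run_event_loop([&] { io_context.run(); }, std::clog), so one failing session cannot take down the whole thread.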

Suggested Code For Server

Combining the coros, and greatly simplifying all condition handling:

#include <iostream>
#include <iomanip>

#include <boost/thread/thread.hpp>
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/scope_exit.hpp>

using namespace std::literals;
namespace ba = boost::asio;
using ba::ip::tcp;

static constexpr unsigned short SERVER_PORT = 1234;
static constexpr std::size_t    DATA_LEN_4 = 4;
static constexpr auto           TIMEOUT_LIMIT = 1s;

struct session : public std::enable_shared_from_this<session>
{
    tcp::socket socket_;
    ba::steady_timer timer_;
    ba::strand<ba::io_context::executor_type> strand_;

    explicit session(ba::io_context& io_context, tcp::socket socket)
    : socket_(std::move(socket)),
      timer_(io_context),
      strand_(io_context.get_executor())
    { }

    void go() {
        ba::spawn(strand_, [this, self = shared_from_this()](ba::yield_context yield) {

            spawn(yield, [this, self](ba::yield_context yield) {
                timer_.expires_from_now(TIMEOUT_LIMIT);
                while (socket_.is_open()) {
                    boost::system::error_code ec;
                    timer_.async_wait(yield[ec]);
                    if (ba::error::operation_aborted != ec) // timer was not canceled
                        socket_.close();
                }
            });

            try {
                // read data
                std::string packet;
                ba::async_read(socket_,
                        ba::dynamic_buffer(packet),
                        ba::transfer_exactly(DATA_LEN_4), yield);

                // write back "OK"
                ba::write(socket_, ba::buffer("OK"s));
            }
            catch (boost::system::system_error const& e) {
                if (e.code() == ba::error::operation_aborted)
                    std::cout << "canceled (timeout)" << std::endl;
                else if (e.code() == ba::error::eof)
                    std::cout << "eof" << std::endl;
                else // throw std::runtime_error(e.code().message());
                    std::cout << "other: " << e.code().message() << std::endl;
            }

            socket_.close();
            timer_.cancel(); // cancel the other coro so we don't keep the session alive
        });
    }
};

int main() {
    ba::io_context io_context;

    ba::spawn(io_context, [&](ba::yield_context yield) {
        tcp::acceptor acceptor(io_context, tcp::endpoint(tcp::v4(), SERVER_PORT));

        for (;;) {
            boost::system::error_code ec;

            tcp::socket socket(io_context);
            acceptor.async_accept(socket, yield[ec]);
            if (!ec) 
                std::make_shared<session>(io_context, std::move(socket))->go();
        }
    });

    boost::thread_group tgroup;
    for (auto i = 0u; i < std::thread::hardware_concurrency(); ++i)
        tgroup.create_thread([&io_context] {
            for (;;) {
                try { io_context.run(); break; } // exited normally
                catch (std::exception const &e) { std::clog << "[eventloop] exception caught " << std::quoted(e.what()) << std::endl; } 
                catch (...)                     { std::clog << "[eventloop] unknown exception caught" << std::endl;                   } 
            }
        });

    tgroup.join_all();
}

With a randomized Client

Changing the sleep to be random, so that it sometimes works and sometimes times out:

std::mt19937 prng { std::random_device{}() };
for (int i = 0; i < 4; i++) {
    ba::write(s, ba::buffer(std::string("A")));
    std::this_thread::sleep_for(std::uniform_int_distribution<>(200, 400)(prng) * 1ms);
}
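It may help to check why this client hits the 1 s server timeout only some of the time. The client sends the first byte immediately and the fourth byte after three random pauses of 200 to 400 ms, so the fourth byte arrives 0.6 to 1.2 s after the first. The helper names below are hypothetical, and the model ignores network and scheduling latency.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// One per-byte pause of the randomized client: 200..400 ms, inclusive.
std::chrono::milliseconds random_delay(std::mt19937& prng) {
    return std::chrono::milliseconds(std::uniform_int_distribution<>(200, 400)(prng));
}

// Byte 4 is sent after three pauses (0.6..1.2 s in total), so, ignoring latency,
// a request roughly times out when those three pauses exceed the server's limit.
bool would_time_out(std::mt19937& prng, std::chrono::milliseconds limit) {
    std::chrono::milliseconds elapsed{0};
    for (int i = 0; i < 3; ++i) elapsed += random_delay(prng);
    return elapsed > limit;
}
```

With a 1000 ms limit the outcome varies from run to run, which matches the mix of "received: OK" lines and timeouts in the output.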

Printed on my system:

1. received: OK
2. received: OK
3. received: OK
canceled (timeout)
4 exception read_some: End of file
5. received: OK
canceled (timeout)
6 exception read_some: End of file
7. received: OK
8. received: OK

Look Ma, No Hands

Even simpler, dropping the special-case messages:

ba::spawn(strand_, [this, self = shared_from_this()](ba::yield_context yield) {
    try {
        ba::steady_timer timer(strand_, TIMEOUT_LIMIT);
        timer.async_wait([this](boost::system::error_code ec) {
            if (ba::error::operation_aborted != ec) // timer was not canceled
                socket_.close();
        });

        std::string packet;
        ba::async_read(socket_,
                ba::dynamic_buffer(packet),
                ba::transfer_exactly(DATA_LEN_4), yield);

        ba::write(socket_, ba::buffer("OK"s));
    } catch(std::exception const& e) {
        std::clog << "error " << std::quoted(e.what()) << std::endl;
    }
});

Note that we no longer need timer_ as a member: the local timer's destructor automatically cancels the pending wait when the timer goes out of scope.

The output doesn't actually change much:

1. received: OK
2. received: OK
3. received: OK
error "Operation canceled"
4 exception read_some: End of file
5. received: OK
6. received: OK
7. received: OK
error "Operation canceled"
8 exception read_some: End of file
error "Operation canceled"
9 exception read_some: End of file
