Unexpected behavior when converting a character array to a string


I have a simple class that takes a character array and parses it into a JSON object. It then stores that object internally and provides a getter.

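#include <string>
#include <nlohmann/json.hpp>

class JSONContainer {
public:
    explicit JSONContainer(const char* const json) {
        std::string t(json);  // debugging copy; this is where the corruption shows up

        // parse the text; constructing a json directly from a const char*
        // would store it as a single JSON string value instead
        _json = new nlohmann::json(nlohmann::json::parse(json));
    }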

    ~JSONContainer() {
        delete _json;
    }

    nlohmann::json *j() {
        return _json;
    }

private:
    nlohmann::json* _json;
};

If I instantiate the class with something simple like ...

{"data": [100,100]}

it works, but once the string grows to roughly 1000 characters or more, the incoming character array shows up corrupted when I copy json into a std::string.

                      // incoming json {"data": [100,100,100,100,100...
std::string t(json);  // turns into "ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ..." after this line

I have no idea what could be causing this. The one thing I thought to check was whether json still had its null terminator at the end, and I always found it.

Appreciate the help!

Additional context for comments ...

This is the method calling the constructor above ...

std::shared_ptr<void> JSONSerDes::deserialize(const char *serializedData) {
    auto *ct = new JSONContainer(serializedData);
    return std::shared_ptr<void>(ct);
}

Going up the stack to the main function, note the line deserializedData = t->deserialize(serializedData); ...

...
    // declare intermediate data
    const char* serializedData;
    std::shared_ptr<void> deserializedData;

    // for each data set size, run each test
    for (const int testSize: sizeTestsB) {
        // generate the test data, imitate data coming from python program
        PyObject* td = data(testSize);

        for (const std::unique_ptr<SerDesTest>& t: tests) {
            // log the start
            startTest(t->type(), testSize, currentTest, totalTests);

            // mark start, ser/des mark end
            start = std::chrono::steady_clock::now();

            serializedData = t->serialize(td);                                      // Python -> Redis
            checkpoints.push_back(checkpoint(t->type(), testSize,  "PythonToRedis", start));

            deserializedData = t->deserialize(serializedData);            // Redis -> Container
            checkpoints.push_back(checkpoint(t->type(), testSize,  "RedisToContainer", start));
...

This is the function used to turn the Python object into a character array. dumps is a function from Python's json module. I may be misunderstanding the lifetime of the character array.

const char* JSONSerDes::serialize(PyObject * pyJson) {
    // convert pyobject to boost python object
    boost::python::object d = boost::python::extract<boost::python::object>(pyJson);

    // call the dumps function and capture the return value
    return boost::python::extract<const char*>(dumps(d));
}

Tyler Weiss On

I figured out what was wrong. When this handle

boost::python::object rv = dumps(d);

falls out of scope, the data seems to get deleted, which makes me think the "extracted" pointer was just referencing the object's internal data rather than a copy made for the requested type.
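To illustrate that suspicion without needing Boost.Python, here is a minimal sketch. TextHolder is a hypothetical stand-in for the boost::python::object that holds the result of dumps, and view() is a hypothetical stand-in for extract<const char*>; neither name comes from Boost. It only shows the aliasing, assuming the real extraction behaves the same way.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a boost::python::object holding a Python str:
// it owns the characters and frees them in its destructor.
class TextHolder {
public:
    explicit TextHolder(std::string text) : text_(std::move(text)) {}

    // Hands out a pointer into the holder's own storage; nothing is copied.
    const char* view() const { return text_.c_str(); }

private:
    std::string text_;
};

// True if two extractions from the same holder yield the same address,
// i.e. the pointer aliases the holder's storage instead of a fresh copy.
bool viewAliasesHolder() {
    TextHolder holder("{\"data\": [100,100]}");
    return holder.view() == holder.view();
}

// True if a std::string built from the pointer has its own storage.
bool copyHasOwnStorage() {
    TextHolder holder("{\"data\": [100,100]}");
    std::string copy(holder.view());
    return copy.c_str() != holder.view() && copy == holder.view();
}
```

Because the pointer aliases the holder, it is only usable while the holder is alive. In the original serialize, the object returned by dumps(d) was a temporary that was destroyed at the end of the return statement, so the caller received a dangling pointer.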

I just changed my serialize method to copy the data over to a new buffer I allocated on the heap.

const char* JSONSerDes::serialize(PyObject * pyJson) {
    // convert pyobject to boost python object
    boost::python::object d = boost::python::extract<boost::python::object>(pyJson);

    // keep the result of dumps alive in a named object while we read from it
    boost::python::object rv = dumps(d);
    const char* prv = boost::python::extract<const char*>(rv);
    size_t prvLen = strlen(prv);

    // copy into a heap buffer, including the null terminator
    char* rvBuffer = new char[prvLen + 1];
    memcpy(rvBuffer, prv, prvLen + 1);

    // the caller now owns the buffer and must release it with delete[]
    return rvBuffer;
}
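One possible alternative, if the return type of serialize can be changed (an assumption about the surrounding interface, which is not shown in full), is to return a std::string so the copy owns its memory and nobody has to remember delete[]. The sketch below uses hypothetical stand-ins, TextHolder and fakeDumps, in place of boost::python::object and dumps so that it compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the boost::python::object returned by dumps(d).
class TextHolder {
public:
    explicit TextHolder(std::string text) : text_(std::move(text)) {}
    const char* view() const { return text_.c_str(); }  // points into text_

private:
    std::string text_;
};

// Hypothetical stand-in for the call to dumps(d).
TextHolder fakeDumps() {
    return TextHolder("{\"data\": [100,100,100]}");
}

// Variant of serialize that returns an owning std::string.
std::string serializeToString() {
    // keep the owner alive in a named variable while we read from it
    TextHolder rv = fakeDumps();

    // the copy is made here, before rv is destroyed at the end of the function
    return std::string(rv.view());
}
```

With this shape, the caller would keep the returned std::string alive and pass its c_str() to deserialize, and the buffer is released automatically when the string goes out of scope.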