I'm creating a native node extension for RocksDB, I've pinned down an issue which I can not explain. So I have the following perfectly functioning piece of code:
std::string v;
ROCKSDB_STATUS_THROWS(db->Get(*options, k, &v));
napi_value result;
NAPI_STATUS_THROWS(napi_create_buffer_copy(env, v.size(), v.c_str(), nullptr, &result));
return result;
But when I introduce an optimization that reduces one extra memcpy
I get segfaults:
std::string *v = new std::string();
ROCKSDB_STATUS_THROWS(db->Get(*options, k, v)); // <============= I get segfaults here
napi_value result;
NAPI_STATUS_THROWS(napi_create_external_buffer(env, v->size(), (void *)v->c_str(), rocksdb_get_finalize, v, &result));
return result;
Here's Get
method signature:
rocksdb::Status rocksdb::DB::Get(const rocksdb::ReadOptions &options, const rocksdb::Slice &key, std::string *value)
Any thoughts why does this issue might happen?
Thank you in advance!
Edit
Just to be sure, I've also checked the following version (it also fails):
std::string *v = new std::string();
ROCKSDB_STATUS_THROWS(db->Get(*options, k, v));
napi_value result;
NAPI_STATUS_THROWS(napi_create_buffer_copy(env, v->size(), v->c_str(), nullptr, &result));
delete v;
Edit
As per request in comments providing more complete example:
#include <napi-macros.h>
#include <node_api.h>
#include <rocksdb/db.h>
#include <rocksdb/convenience.h>
#include <rocksdb/write_batch.h>
#include <rocksdb/cache.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/cache.h>
#include <rocksdb/comparator.h>
#include <rocksdb/env.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>
#include "easylogging++.h"
INITIALIZE_EASYLOGGINGPP
...
/**
* Runs when a rocksdb_get return value instance is garbage collected.
*/
static void rocksdb_get_finalize(napi_env env, void *data, void *hint)
{
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (started)";
if (hint)
{
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (finished)";
delete (std::string *)hint;
}
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (finished)";
}
/**
* Gets key / value pair from a database.
*/
NAPI_METHOD(rocksdb_get)
{
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (started)";
NAPI_ARGV(3);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting db argument)";
rocksdb::DB *DECLARE_FROM_EXTERNAL_ARGUMENT(0, db);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting k argument)";
DECLARE_SLICE_FROM_BUFFER_ARGUMENT(1, k);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting options argument)";
rocksdb::ReadOptions *DECLARE_FROM_EXTERNAL_ARGUMENT(2, options);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (declaring v variable)";
std::string *v = new std::string();
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting value from database)";
ROCKSDB_STATUS_THROWS(db->Get(*options, k, v));
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (wrapping value with js wrapper)";
napi_value result;
NAPI_STATUS_THROWS(napi_create_external_buffer(env, v->size(), (void *)v->c_str(), rocksdb_get_finalize, v, &result));
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (finished)";
return result;
}
The code that launches the above method is implemented in TypeScript and runs in NodeJS, here is complete listing:
import path from 'path';
import { bindings as rocks, Unique, BatchContext } from 'rocksdb';
import { MapOf } from '../types';
import { Command, CommandOptions, CommandOptionDeclaration, Persist, CommandEnvironment } from '../command';
// tslint:disable-next-line: no-empty-interface
export interface PullCommandOptions {
}
@Command
export class ExampleCommandNameCommand implements Command {
public get description(): string {
return "[An example command description]";
}
public get options(): CommandOptions<CommandOptionDeclaration> {
const result: MapOf<PullCommandOptions, CommandOptionDeclaration> = new Map();
return result;
}
public async run(environment: CommandEnvironment, opts: CommandOptions<unknown>): Promise<void> {
// let options = opts as unknown as PullCommandOptions;
let window = global as any;
window.rocks = rocks;
const configPath = path.resolve('log.conf');
const configPathBuffer = Buffer.from(configPath);
rocks.logger_config(configPathBuffer);
rocks.logger_start();
let db = window.db = rocks.rocksdb_open(Buffer.from('test.db', 'utf-8'), rocks.rocksdb_options_init());
let readOptions = window.readOptions = rocks.rocksdb_read_options_init();
let writeOptions = window.writeOptions = rocks.rocksdb_write_options_init();
// ===== The line below launches the C++ method
rocks.rocksdb_put(db, Buffer.from('Zookie'), Buffer.from('Cookie'), writeOptions);
// ===== The line above launches the C++ method
console.log(rocks.rocksdb_get(db, Buffer.from('Zookie'), readOptions).toString());
let batch: Unique<BatchContext> | null = rocks.rocksdb_batch_init();
rocks.rocksdb_batch_put(batch, Buffer.from('Cookie'), Buffer.from('Zookie'));
rocks.rocksdb_batch_put(batch, Buffer.from('Pookie'), Buffer.from('Zookie'));
rocks.rocksdb_batch_put(batch, Buffer.from('Zookie'), Buffer.from('Zookie'));
rocks.rocksdb_batch_put(batch, Buffer.from('Hookie'), Buffer.from('Zookie'));
await rocks.rocksdb_batch_write_async(db, batch, writeOptions);
batch = null;
let proceed = true;
while (proceed) {
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
}
Basically this code represents an implementation of KeyValueDatabase->Get("Some key") method, you pass a string into it you get a string in return. But it's obvious the issue is dancing around new std::string()
call, I thought that I might get some explanations regarding why it's bad to go this way? How it is possible to move string value without a copy from one string into another?
It's unclear which extra
memcpy
you think you are optimizing out.If the string is short, and you are using
std::string
with short-string optimization, then indeed you will optimize out a shortmemcpy
. However, dynamically allocating and then deletingstd::string
is likely much more expensive than thememcpy
.If the string is long, you don't actually optimize anything at all, and instead make the code slower for no reason.
The fact that adding
v = new std::string; ... ; delete v;
introduces aSIGSEGV
is a likely indication that you have some other heap corruption going on, which remains unnoticed until you shift things a bit. Valgrind is your friend.