I'm working with Ubuntu 20.04 x86_64, GCC 9.4.0, and the TensorFlow C API (Linux, CPU only).
Here is a simple C++ TensorFlow model-prediction program that uses the cppflow wrapper:
//this is predict.cpp
#include <iostream>
#include "cppflow/cppflow.h"
int predict() {
auto input = cppflow::decode_jpeg(cppflow::read_file(std::string("../my_cat.jpg")));
input = cppflow::cast(input, TF_UINT8, TF_FLOAT);
input = cppflow::expand_dims(input, 0);
cppflow::model model("../model");
auto output = model(input);
std::cout << "It's a tiger cat: " << cppflow::arg_max(output, 1) << std::endl;
return 0;
}
int main(){
predict();
return 0;
}
Compiled with g++ as an executable, it works fine:
g++ -o predict.out predict.cpp -ltensorflow
./predict.out
However, I run into a problem when I compile this code as a .so library:
//this is predict.cpp
#include <cstdio>    // printf
#include <iostream>
#include "cppflow/cppflow.h"
extern "C" int predict() {
auto input = cppflow::decode_jpeg(cppflow::read_file(std::string("./my_cat.jpg")));
input = cppflow::cast(input, TF_UINT8, TF_FLOAT);
input = cppflow::expand_dims(input, 0);
cppflow::model model("./model");
printf("model created\n");
auto output = model(input);
printf("model predicted\n");
std::cout << "It's a tiger cat: " << cppflow::arg_max(output, 1) << std::endl;
return 0;
}
using this g++ command:
g++ -fPIC -shared -o predict.so predict.cpp -ltensorflow
Then I load and execute it from another .cpp file, for example:
#include <dlfcn.h>
#include <stdio.h>
int (*predict)();
int main()
{
void* handle=dlopen("./predict.so",RTLD_LAZY);
predict=(int(*)(void))dlsym(handle, "predict");
dlclose(handle);
predict();
return 0;
}
I get this error:
model created
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
terminate called recursively
terminate called recursively
Since cppflow is just a wrapper around the TensorFlow C API, I looked into the auto output = model(input); line, which is where the error occurs. It turned out that executing the following TF C API call triggers it:
TF_SessionRun(this->session.get(), /*run_options*/ NULL,
inp_ops.data(), inp_val.data(), static_cast<int>(inputs.size()),
out_ops.data(), out_val.get(), static_cast<int>(outputs.size()),
/*targets*/ NULL, /*ntargets*/ 0, /*run_metadata*/ NULL,
this->status.get());
In other words, when a .cpp file containing TF_SessionRun() is compiled with g++ into an executable (.out), it works. When the same code is exported, compiled as a .so library, and called from another .cpp file, it fails.
Even more interesting: when I execute the function from the .so library multiple times, the error message isn't always the same. Sometimes it prints
model created
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
terminate called recursively
terminate called recursively
and sometimes
model created
terminate called recursively
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
It looks like a random invalid-memory-access error. I looked up the error, and the documentation says:
'If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined.'
Still, I can't figure out why the .out works while the .so fails, when both are compiled with the same compiler. Is the way I exported the .so library causing this problem? Any clues would be appreciated.
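One detail worth checking in the loader above: dlclose(handle) runs before predict() is called, and once a library has been closed, the function pointer obtained from dlsym is no longer guaranteed to point at valid code. Whether that is the cause of this particular recursive_init_error is an assumption to test, not a confirmed diagnosis. Below is a minimal sketch of a loader that checks dlopen/dlsym errors and closes the library only after the call returns. The helper name call_int_function is hypothetical (not part of cppflow or TensorFlow), and it assumes the exported function has the signature int(void), as predict() does.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper (not part of cppflow or TensorFlow): open a shared
// library, resolve an `int (void)` function, call it, and only then close
// the library. Returns 0 on success and stores the function's return value
// in *result; returns -1 if dlopen or dlsym fails.
int call_int_function(const char* library_path, const char* symbol, int* result) {
    // RTLD_NOW resolves all undefined symbols up front, so a missing
    // dependency is reported here instead of at the first call.
    // A null path yields a handle for the main program's global scope.
    void* handle = dlopen(library_path, RTLD_NOW);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(handle, symbol);
    const char* err = dlerror();
    if (err != nullptr) {
        std::fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return -1;
    }

    // POSIX (and GCC) allow converting the void* from dlsym to a function pointer.
    auto fn = reinterpret_cast<int (*)()>(sym);
    *result = fn();   // call while the library is still loaded

    dlclose(handle);  // unload only after the last use of fn
    return 0;
}
```

With the predict.so from above, a caller would do something like int rc; call_int_function("./predict.so", "predict", &rc); and link with -ldl (required on glibc versions before 2.34, which includes the glibc shipped with Ubuntu 20.04).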