I'm doing some detection using YOLOv4/C++/OpenCV and it's running pretty well. However, to reduce inference time I'm trying to move everything to NVIDIA TensorRT, and I'm feeling lost there.
I converted the .weights file to ONNX using the TensorRT tools, then converted the ONNX model to a TensorRT engine like this:
void ONNXConvert()
{
    MyLogger logger;

    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    nvinfer1::INetworkDefinition* network = builder->createNetworkV2(
        1U << static_cast<int>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, logger);

    // Load the ONNX model from disk
    std::ifstream onnxFile(onnxModelFile, std::ios::binary);
    if (!onnxFile)
    {
        std::cerr << "Error opening ONNX model file: " << onnxModelFile << std::endl;
        return;
    }
    onnxFile.seekg(0, onnxFile.end);
    const size_t modelSize = onnxFile.tellg();
    onnxFile.seekg(0, onnxFile.beg);

    // Read the whole model into a buffer and parse it
    std::vector<char> onnxModelBuffer(modelSize);
    onnxFile.read(onnxModelBuffer.data(), modelSize);
    if (!parser->parse(onnxModelBuffer.data(), modelSize))
    {
        std::cerr << "Error parsing ONNX model." << std::endl;
        return;
    }

    // Builder configuration: 1 GiB workspace
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1 << 30);

    nvinfer1::IHostMemory* serializedEngine = builder->buildSerializedNetwork(*network, *config);
    if (!serializedEngine)
    {
        std::cerr << "Engine build failed." << std::endl;
        return;
    }
    std::cout << "Number of layers in the network: " << network->getNbLayers() << std::endl;

    // Write the serialized engine to disk
    std::ofstream outFile("yolov4.engine", std::ios::binary);
    outFile.write(reinterpret_cast<const char*>(serializedEngine->data()), serializedEngine->size());
    outFile.close();

    serializedEngine->destroy();
    parser->destroy();
    config->destroy();
    network->destroy();
    builder->destroy();
}
This done, I can load the generated engine and run inference; everything seems to go well until I try to parse the detection results.
I want the class probabilities and the bounding box coordinates, but all I get are inconsistent values.
From my YoloV4 config, I know I have :
- 20 classes
- Input width = 608
- Input height = 608
- Channels = 3
- 9 anchors, as (width, height) pairs: (12, 16), (19, 36), (40, 28), (36, 75), (76, 55), (72, 146), (142, 110), (192, 243), (459, 401)
After inference, I have 2 output buffers:
- a 1x22743x1x4 buffer, where I guess I will find the bounding box coordinates
- a 1x22743x20 buffer, where I guess I will find the class probabilities
And this is where I'm getting lost. Why are there 22743 detections? How is this number calculated? How must I parse the detections to correctly recover the coordinates and class probabilities?
I innocently tried to parse the outputs directly, like this:
for (int d = 0; d < 22743; d++)
{
    float maxProb = -1000.0f;
    int classId = -1;
    for (int c = 0; c < 20; c++)
    {
        if (classes[d * 20 + c] > maxProb)
        {
            maxProb = classes[d * 20 + c];
            classId = c;
        }
    }
    if (maxProb > CONFIDENCE_THRESHOLD)
    {
        float boxX = boxes[d * 4];
        float boxY = boxes[d * 4 + 1];
        float boxW = boxes[d * 4 + 2];
        float boxH = boxes[d * 4 + 3];
    }
}
But all I get are tiny probabilities (< 1e-05) and tiny, sometimes negative, box coordinates.
I understand I'm supposed to use what I know about the anchors, but I'm really not sure how.
Could someone give me a hand with this? Any help will be greatly appreciated.