I'm currently working on publishing images generated in Unreal to ROS, but I'm running into strange performance issues.
I'm using Unreal Engine 5.2 on Ubuntu 20, with a Ryzen 5 5600X, an RTX 3070 and 40 GB of RAM.
My method currently looks like this:
void FROSOutputServer::PublishImage(ros::Publisher& ImagePublisher, TSharedPtr<FSensorDataBase>& SensorData){
    TRACE_CPUPROFILER_EVENT_SCOPE(FROSOuputServer::PublishImage);
    FCameraData* CameraData = static_cast<FCameraData*>(SensorData.Get());
    sensor_msgs::ImagePtr ImgMsgPtr = boost::make_shared<sensor_msgs::Image>();

    ros::Time TimeStamp;
    TimeStamp.fromSec(CameraData->Timestamp.ToDouble());
    ImgMsgPtr->header.stamp = TimeStamp;
    ImgMsgPtr->header.frame_id = "test";
    ImgMsgPtr->step = CameraData->Width * 3;
    ImgMsgPtr->height = CameraData->Height;
    ImgMsgPtr->width = CameraData->Width;
    ImgMsgPtr->encoding = "bgr8";
    ImgMsgPtr->is_bigendian = 0;

    {
        TRACE_CPUPROFILER_EVENT_SCOPE(FROSOuputServer::PublishImage::Copy);
        ImgMsgPtr->data.resize(CameraData->ImageData.Num());
        uint8* DestPtr = ImgMsgPtr->data.data();
        uint8* SrcPtr = CameraData->ImageData.GetData();
        FMemory::Memcpy(DestPtr, SrcPtr, CameraData->ImageData.Num());
    }
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(FROSOuputServer::PublishImage::Publish);
        ImagePublisher.publish(ImgMsgPtr);
    }
    // Only for debugging purposes, this would be called implicitly by the shared pointer destructor
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(FROSOuputServer::PublishImage::FreeUnrealPtr);
        SensorData.Reset();
    }
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(FROSOuputServer::PublishImage::FreeBoostPtr);
        ImgMsgPtr.reset();
    }
}
With these Unreal data structs:
struct FSensorDataBase
{
    Utils::Time Timestamp;
};

struct FCameraData : public FSensorDataBase
{
    TArray<uint8> ImageData;
    uint32 Width;
    uint32 Height;
};
For testing purposes, I created a 1000x1000 image, resulting in a 3 MB TArray.
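As a quick check of the numbers: with the bgr8 encoding each pixel is 3 bytes, so a 1000x1000 frame is 1000 × 1000 × 3 = 3,000,000 bytes (about 2.86 MiB) with a row step of 3000 bytes, which matches what PublishImage writes into the message. A minimal sketch of that arithmetic in plain C++; the helper names (Bgr8Step, Bgr8BufferSize, MatchesBgr8Layout) are hypothetical and not part of Unreal or ROS:

```cpp
#include <cstddef>
#include <cstdint>

// Bytes per pixel for the "bgr8" encoding: one byte each for B, G and R.
constexpr std::size_t kBgr8BytesPerPixel = 3;

// Row stride in bytes for a tightly packed bgr8 image
// (the value PublishImage stores in ImgMsgPtr->step).
constexpr std::size_t Bgr8Step(std::uint32_t Width) {
    return static_cast<std::size_t>(Width) * kBgr8BytesPerPixel;
}

// Total payload size in bytes: step * height.
constexpr std::size_t Bgr8BufferSize(std::uint32_t Width, std::uint32_t Height) {
    return Bgr8Step(Width) * Height;
}

// True if a buffer of NumBytes bytes has exactly the size implied by Width/Height.
// The message data is sized from ImageData.Num(), while step is derived from Width,
// so the two only agree when this holds.
constexpr bool MatchesBgr8Layout(std::size_t NumBytes, std::uint32_t Width, std::uint32_t Height) {
    return NumBytes == Bgr8BufferSize(Width, Height);
}
```

In the Unreal code, something like `MatchesBgr8Layout(static_cast<std::size_t>(CameraData->ImageData.Num()), CameraData->Width, CameraData->Height)` could serve as a guard before the copy.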
I expected a modern CPU to copy such a "small" amount of data almost instantly, but profiling showed very poor results.
I used Unreal Insights for profiling (see the TRACE_CPUPROFILER_EVENT_SCOPE macros above).
The copy operation takes about 13.5 ms, and freeing the boost pointer after the message has been published takes another 20.8 ms. Unreal's shared pointer isn't actually freed in this scope.
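To put the copy time in perspective: 3,000,000 bytes in 13.5 ms is roughly 222 MB/s, well below the several GB/s a single-threaded memcpy typically reaches on a desktop CPU. The conversion, as a small hypothetical helper (ThroughputMBps is an illustrative name, and 1 MB is taken as 10^6 bytes):

```cpp
#include <cstddef>

// Effective throughput in MB/s (1 MB = 10^6 bytes) for moving Bytes bytes
// in Milliseconds milliseconds.
constexpr double ThroughputMBps(std::size_t Bytes, double Milliseconds) {
    return (static_cast<double>(Bytes) / 1e6) / (Milliseconds / 1e3);
}
```

For example, `ThroughputMBps(3000000, 13.5)` evaluates to about 222.2.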
Is there any way to optimize this, or am I simply running into CPU limits due to Unreal's overhead?
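One way to tell whether these operations are slow on this machine in general, or only inside the engine, is to time the same steps in a standalone C++ program. The sketch below is hypothetical: MeasureCopyAndFree and BaselineTimings are made-up names, std::vector and std::shared_ptr stand in for TArray, sensor_msgs::Image and boost::shared_ptr, and it does not exercise Unreal's allocator or ROS serialization.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Timings (in milliseconds) for the steps profiled in PublishImage,
// reproduced with plain standard-library types.
struct BaselineTimings {
    double CopyMs = 0.0;       // resize + memcpy, like the "Copy" scope
    double FreeMs = 0.0;       // releasing the last shared_ptr, like "FreeBoostPtr"
    bool CopyMatches = false;  // sanity check that the copy really happened
};

inline BaselineTimings MeasureCopyAndFree(std::size_t NumBytes) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    // Source buffer with a simple pattern, standing in for CameraData->ImageData.
    std::vector<std::uint8_t> Source(NumBytes);
    for (std::size_t i = 0; i < NumBytes; ++i) {
        Source[i] = static_cast<std::uint8_t>(i * 31u);
    }

    // Heap-allocated payload, standing in for boost::make_shared<sensor_msgs::Image>().
    auto Message = std::make_shared<std::vector<std::uint8_t>>();

    BaselineTimings Result;
    const auto CopyStart = Clock::now();
    Message->resize(NumBytes);  // zero-initializes, so every page is touched once
    if (NumBytes > 0) {
        std::memcpy(Message->data(), Source.data(), NumBytes);
    }
    const auto CopyEnd = Clock::now();

    // Verified outside the timed region.
    Result.CopyMatches = (*Message == Source);
    assert(Result.CopyMatches && "memcpy produced a different buffer");

    const auto FreeStart = Clock::now();
    Message.reset();  // last owner: destroys the vector and releases its memory
    const auto FreeEnd = Clock::now();

    Result.CopyMs = Ms(CopyEnd - CopyStart).count();
    Result.FreeMs = Ms(FreeEnd - FreeStart).count();
    return Result;
}
```

Calling `MeasureCopyAndFree(3000000)` from a program built with optimizations (e.g. `-O2`) and printing `CopyMs` and `FreeMs` gives a baseline to compare against the 13.5 ms and 20.8 ms reported by Insights. If the standalone numbers are much lower, that would point away from a raw CPU limit and toward something specific to the engine build or its environment.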