MMDetection3D and nuScenes: output format, conversion and comparison


I am trying to run 3D object detection, and my primary task is to build a confusion matrix that shows the performance of a given model.

To do so I am using MMDetection3D to run inference on the nuScenes dataset (LiDAR specifically), and this works perfectly. MMDetection3D then returns a file containing numerous detections, but I am unable to decipher the units or format in which these are stored in the JSON. I could not find any helpful information on their website.

Here is an output JSON file:

{"labels_3d": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 
4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
 9, 9, 9, 9, 9], 

"scores_3d": [0.9643934965133667, 0.9516704678535461, 0.939872682094574, 0.9227145314216614, 
0.8962982892990112, 0.8947009444236755, 0.7297248840332031, 0.681861937046051, 0.5708820819854736, 
0.3877566456794739, 0.37655529379844666, 0.3539758622646332, 0.3132390081882477, 
0.3026883602142334, 0.2756301462650299, 0.18868379294872284, 0.18692995607852936, 
0.1733768880367279, 0.14151839911937714, 0.12198230624198914, 0.12176891416311264, 
0.1208779513835907, 0.11904411017894745, 0.1072428748011589, 0.10460904985666275, 
0.10336094349622726, 0.10262171924114227, 0.10034439712762833, 0.0979783907532692, 
0.09782646596431732, 0.09688100963830948, 0.0967518761754036, 0.09104109555482864, 
0.09014609456062317, 0.0899345651268959, 0.08243735134601593, 0.08240745216608047, 
0.08000791072845459, 0.07832382619380951, 0.07400976866483688, 0.07252516597509384, 
<<.... OMITTED ....>> 
0.10082884877920151, 0.07867483049631119, 0.07804480940103531, 0.0752241387963295, 
0.07462912797927856, 0.07323057949542999, 0.07187190651893616, 0.06647778302431107, 
0.06521017849445343, 0.06434381753206253, 0.06294619292020798, 0.061906829476356506, 
0.05730915442109108, 0.05593761429190636, 0.055751606822013855, 0.05546196922659874, 
0.055023666471242905, 0.05173450708389282, 0.05133531987667084], 

"bboxes_3d": [[2.7894937992095947, -5.6129255294799805, -1.984688639640808, 4.402218818664551, 
1.795907735824585, 1.5440824031829834, 1.5714409351348877, 0.00040857819840312004, 
0.007234238553792238], [-4.21560001373291, 2.253385305404663, -1.7578086853027344, 
4.383520126342773, 1.8513178825378418, 1.6987131834030151, 1.5745370388031006, 
0.0024222619831562042, 0.08641806244850159], [-4.373297214508057, -6.250703811645508, 
-1.9725176095962524, 4.265439033508301, 1.8188081979751587, 1.6568591594696045, 
1.4571964740753174, 0.0037363539449870586, 0.010523795150220394], [2.751802921295166, 
-11.21645736694336, -2.1436007022857666, 4.437937259674072, 1.876691222190857, 1.869486927986145, 
1.5793282985687256, -0.002047546673566103, 0.008112283423542976], [-7.495792865753174, 
-1.6582443714141846, -1.8903460502624512, 4.34182596206665, 1.8267078399658203, 
1.7117701768875122, 1.5855417251586914, -0.00017483974806964397, 0.0006951251998543739], 
[-7.582278728485107, -7.891511917114258, -2.0099053382873535, 4.314764976501465, 
1.7949130535125732, 1.5743098258972168, 1.5461225509643555, 0.0003853405360132456, 
0.005822155624628067], [-40.35822296142578, -5.325357913970947, -1.4565951824188232, 
4.360017776489258, 1.8995375633239746, 1.6350754499435425, 3.3603575229644775, 
-0.01209242083132267, 0.00029582070419564843], 
<<.... OMITTED ....>> 
[7.263882160186768, 37.240760803222656, 0.3970103859901428, 0.5283771753311157, 
3.5624496936798096, 0.9090756177902222, 0.22167176008224487, 0.024363433942198753, 
-0.020366592332720757], [6.8328447341918945, 40.2145881652832, 0.3075522184371948, 
0.6144049763679504, 3.465810775756836, 0.9300158023834229, -0.061834633350372314, 
-0.027105869725346565, -0.0548381544649601]], "box_type_3d": "LiDAR"}

As said, my primary goal is to compare the annotations provided by nuScenes (2D bbox for LiDAR) with MMDetection3D's predictions and turn this into a confusion matrix. I have tried looking at their code, but there is too much of it. I feel like I should not need to reinvent the wheel and implement this myself, and that both projects should already have a way to do so.

My questions:

  1. What format does MMDetection3D output its predictions in?
  2. Is there a utility that can compare 3D bounding boxes and generate a confusion matrix?

Things I've tried

  • Running tools/test.py from mmdetection3d does not work for me; it complains about not being able to find the metadata and ann_info keys.
  • Running MMDetection3D's BEVFusion demo, but to no avail.
  • Doing this manually. I am making progress, but I don't really want to compare every pair of 3D bounding boxes, calculate the IoU for each, and go from there. That seems very inefficient.
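For what it's worth, the manual approach in the last bullet can be kept fairly cheap by matching on BEV center distance instead of full 3D IoU, which is the criterion the official nuScenes detection metric uses (2 m is one of its default matching radii). Below is a minimal, hypothetical sketch of greedy matching plus a confusion matrix; the class count, distance threshold, and tuple layout are all assumptions, not part of either library's API:

```python
import math

def confusion_matrix(gt, pred, num_classes=10, dist_thresh=2.0):
    """Greedily match predictions to ground truth by BEV center distance.

    gt, pred: lists of (class_id, x, y) tuples (hypothetical format).
    Returns a (num_classes + 1) x (num_classes + 1) matrix; the extra
    row collects false positives and the extra column missed boxes.
    """
    n = num_classes
    cm = [[0] * (n + 1) for _ in range(n + 1)]
    unmatched_gt = list(range(len(gt)))
    for p_cls, px, py in pred:
        best, best_d = None, dist_thresh
        for i in unmatched_gt:
            g_cls, gx, gy = gt[i]
            d = math.hypot(px - gx, py - gy)
            if d < best_d:
                best, best_d = i, d
        if best is None:
            cm[n][p_cls] += 1            # no ground truth nearby: false positive
        else:
            cm[gt[best][0]][p_cls] += 1  # matched: row = true class, col = predicted
            unmatched_gt.remove(best)
    for i in unmatched_gt:
        cm[gt[i][0]][n] += 1             # leftover ground truth: missed detection
    return cm
```

In practice you would first filter predictions by a score threshold, since the result file keeps many low-confidence boxes (scores down to ~0.05 in the dump above).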

There is 1 answer below

zlenyk On

You have 3 lists of size N, where N is the number of detected objects.

  • labels_3d[i] is the label of the i-th box (0 to 9, indexing the ten nuScenes detection classes)
  • scores_3d[i] is the model's confidence score for that box (from 0.0 to 1.0), i.e. the class probability
  • bboxes_3d[i] describes the i-th 3D box. Note that each box in your dump has 9 values, not 7: nuScenes models append a BEV velocity, giving (x, y, z, l, w, h, yaw, vx, vy) in LiDAR coordinates (meters, radians, m/s), with z at the bottom of the box
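Each box in the JSON above carries 9 values (nuScenes models append a BEV velocity (vx, vy) after the yaw), so a result file can be unpacked with plain json. A minimal sketch, assuming only the key names and layout visible in the dump; load_predictions is a hypothetical helper, not an MMDetection3D API:

```python
import json

def load_predictions(path):
    """Read an MMDetection3D nuScenes result file into one dict per detection."""
    with open(path) as f:
        preds = json.load(f)
    records = []
    for label, score, box in zip(preds["labels_3d"],
                                 preds["scores_3d"],
                                 preds["bboxes_3d"]):
        # Assumed layout: center (x, y, z), size (l, w, h), heading yaw,
        # and BEV velocity (vx, vy), all in the LiDAR frame.
        x, y, z, l, w, h, yaw, vx, vy = box
        records.append({"label": label, "score": score,
                        "center": (x, y, z), "size": (l, w, h),
                        "yaw": yaw, "velocity": (vx, vy)})
    return records
```

The three lists are index-aligned, which is why a single zip over them recovers one complete detection per step.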