Print bounding box coordinates of dynamic objects detected using YOLOv5


I need to print the bounding box coordinates of each walking person in a video. I detect the persons with YOLOv5, and each person is tracked. For every frame, I need to print each person's bounding box coordinates together with the frame number. How can I do this in Python?

The following code detects and tracks persons in a video and displays their coordinates. Note that it loads the yolov8n.pt weights through the ultralytics package.

# Display bounding box coordinates
import cv2
from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO('yolov8n.pt')

# Open the video file
cap = cv2.VideoCapture("Shoplifting001_x264_15.mp4")

# Get the total number of frames
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(f"Frames count: {frame_count}")

# Initialize the frame id
frame_id = 0

# Loop through the video frames
while cap.isOpened():
    
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLOv8 tracking on the frame, persisting tracks between frames.
        # classes=[0] keeps only class 0, which is "person" in the COCO labels.
        results = model.track(frame, persist=True, classes=[0])

        # Visualize the results on the frame
        annotated_frame = results[0].plot()
        
        # Print the bounding box coordinates of each person in the frame
        print(f"Frame id: {frame_id}")
        for result in results:
            for r in result.boxes.data.tolist():
                if len(r) == 7:
                    # Tracked box: x1, y1, x2, y2, person_id, score, class_id
                    x1, y1, x2, y2, person_id, score, class_id = r
                    print(r)
                else:
                    # Row without a person ID
                    print(r)
            
        # Display the annotated frame
        cv2.imshow("YOLOv5 Tracking", annotated_frame)

        # Increment the frame id
        frame_id += 1

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:       # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()

The above code works and displays the coordinates of the tracked persons.

However, it does not work properly for every video. For one video, the output looks like this:

0: 384x640 6 persons, 187.2ms
Speed: 4.1ms preprocess, 187.2ms inference, 4.0ms postprocess per image at shape (1, 3, 384, 640)
Frame id: 2
[830.3707275390625, 104.34822845458984, 983.3080444335938, 366.95147705078125, 1.0, 0.8544653654098511, 0.0]
[80.94219207763672, 76.50841522216797, 254.93991088867188, 573.0479736328125, 2.0, 0.8748959898948669, 0.0]
[193.58871459960938, 60.9941291809082, 335.6488342285156, 481.8208312988281, 3.0, 0.8484305143356323, 0.0]
[470.2035827636719, 92.78453826904297, 732.5341796875, 602.9578857421875, 4.0, 0.8541176319122314, 0.0]
[719.50537109375, 227.52276611328125, 884.10498046875, 501.5626525878906, 5.0, 0.6705026030540466, 0.0]
[365.58099365234375, 47.774330139160156, 600.3360595703125, 443.5860595703125, 6.0, 0.785051703453064, 0.0]

This output is correct.

But another video contains only three people, yet in its very first frame six persons are detected:

0: 480x640 6 persons, 810.5ms
Speed: 8.0ms preprocess, 810.5ms inference, 8.9ms postprocess per image at shape (1, 3, 480, 640)
Frame id: 0
[0.0, 10.708396911621094, 37.77726745605469, 123.68929290771484, 0.36418795585632324, 0.0]
[183.0453338623047, 82.82539367675781, 231.1952667236328, 151.8341522216797, 0.2975049912929535, 0.0]
[154.15158081054688, 74.86528778076172, 231.10934448242188, 186.2017822265625, 0.23649221658706665, 0.0]
[145.61187744140625, 69.76246643066406, 194.42532348632812, 150.91973876953125, 0.16918501257896423, 0.0]
[177.25042724609375, 82.43289947509766, 266.5430908203125, 182.33889770507812, 0.131477952003479, 0.0]
[145.285400390625, 69.32669067382812, 214.907470703125, 184.0771026611328, 0.12087596207857132, 0.0]
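
All six confidence scores in this frame are below 0.4, while the scores in the correct output are above 0.6. The following is a minimal sketch, not a confirmed fix, of discarding low-confidence rows before printing. The function name filter_by_confidence and the 0.5 threshold are hypothetical choices made for illustration; the rows are copied from the output above.

```python
# Rows copied from the frame 0 output: [x1, y1, x2, y2, score, class_id]
rows = [
    [0.0, 10.708396911621094, 37.77726745605469, 123.68929290771484, 0.36418795585632324, 0.0],
    [183.0453338623047, 82.82539367675781, 231.1952667236328, 151.8341522216797, 0.2975049912929535, 0.0],
    [154.15158081054688, 74.86528778076172, 231.10934448242188, 186.2017822265625, 0.23649221658706665, 0.0],
    [145.61187744140625, 69.76246643066406, 194.42532348632812, 150.91973876953125, 0.16918501257896423, 0.0],
    [177.25042724609375, 82.43289947509766, 266.5430908203125, 182.33889770507812, 0.131477952003479, 0.0],
    [145.285400390625, 69.32669067382812, 214.907470703125, 184.0771026611328, 0.12087596207857132, 0.0],
]


def filter_by_confidence(rows, min_conf):
    """Keep only rows whose score is at least min_conf.

    The score is the second-to-last value in both the 6-value and
    the 7-value row layouts shown in this question.
    """
    return [r for r in rows if r[-2] >= min_conf]


MIN_CONF = 0.5  # hypothetical threshold, tune it for your own videos
kept = filter_by_confidence(rows, MIN_CONF)
print(f"{len(kept)} of {len(rows)} detections kept at min_conf={MIN_CONF}")
```

With this threshold none of the six rows survive, which suggests they are weak detections rather than six real people. A threshold that high may also drop genuine detections in other videos, so it is a trade-off rather than a rule.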

Also, this output does not include the person ID. It only shows the coordinates, confidence score, and class ID. What is the reason for that?
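
The two outputs above differ in row length: seven values when a person ID is present, six when it is not. One plausible explanation, which is an assumption and not confirmed here, is that the tracker had not yet assigned IDs to these low-confidence detections, so the plain detection rows were returned. Either way, the printing code can handle both layouts explicitly. Below is a minimal sketch; parse_box_row and format_row are hypothetical helper names, and the column order is taken from the outputs shown in this question.

```python
def parse_box_row(row):
    """Unpack one row of result.boxes.data.tolist() into named fields.

    Assumed layouts, based on the outputs in this question:
      7 values: x1, y1, x2, y2, person_id, score, class_id
      6 values: x1, y1, x2, y2, score, class_id (no person ID)
    """
    if len(row) == 7:
        x1, y1, x2, y2, person_id, score, class_id = row
        person_id = int(person_id)
    elif len(row) == 6:
        x1, y1, x2, y2, score, class_id = row
        person_id = None  # the row carries no track ID
    else:
        raise ValueError(f"unexpected row length: {len(row)}")
    return {
        "box": (x1, y1, x2, y2),
        "person_id": person_id,
        "score": score,
        "class_id": int(class_id),
    }


def format_row(frame_id, row):
    """Build one printable line with the frame number, ID, and box."""
    parsed = parse_box_row(row)
    pid = parsed["person_id"]
    id_text = "untracked" if pid is None else str(pid)
    x1, y1, x2, y2 = parsed["box"]
    return (
        f"frame={frame_id} id={id_text} "
        f"box=({x1:.1f}, {y1:.1f}, {x2:.1f}, {y2:.1f}) "
        f"conf={parsed['score']:.2f}"
    )


# One row from each output above
tracked = [830.3707275390625, 104.34822845458984, 983.3080444335938,
           366.95147705078125, 1.0, 0.8544653654098511, 0.0]
untracked = [0.0, 10.708396911621094, 37.77726745605469,
             123.68929290771484, 0.36418795585632324, 0.0]

print(format_row(2, tracked))
print(format_row(0, untracked))
```

Inside the video loop, print(format_row(frame_id, r)) could replace both print(r) calls. The ultralytics results object also exposes the IDs separately as result.boxes.id, which, to my understanding, is None when no track IDs are available, so checking it before unpacking is another option.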
