How to get classes along with IDs when using YOLOv8 + SORT for object detection and tracking?


I am new to YOLO and SORT. I am trying to detect how many people are not wearing this protective equipment. Using YOLO alone produces per-frame outputs and does not track people across frames, so I implemented SORT. Now I am stuck on how to determine the class of each ID that SORT returns. I tried saving the class array and iterating through SORT's results, but the classes were not in the same order before and after SORT processed the detections.

import math
import cv2
import numpy as np
from ultralytics import YOLO
from sort import Sort  # SORT tracker from https://github.com/abewley/sort

class_names = ['Hardhat', 'Mask', 'NO-Hardhat', 'NO-Mask', 'NO-Safety Vest', 'Person', 'Safety Cone', 'Safety Vest', 'machinery', 'vehicle']

def process_video(video_path: str, model: YOLO):
    tracker = Sort(max_age=2000, min_hits=3, iou_threshold=0.01)
    cap = cv2.VideoCapture(video_path)
    
    while True:
        # read a frame from the video
        ret, img = cap.read()
        if not ret:
            break

        # process the frame with YOLO
        results = model(img, stream=True)

        detections = np.empty((0, 5))
        for r in results:
            for box in r.boxes:
                x1, y1, x2, y2 = (int(v) for v in box.xyxy[0])
                conf = float(box.conf[0])  # confidence score in [0, 1]
                cls = class_names[int(box.cls[0])]
                if cls in ['Person', 'NO-Hardhat', 'NO-Mask', 'NO-Safety Vest']:
                    detections = np.vstack((detections, np.array([x1, y1, x2, y2, conf])))
        resultsTracker = tracker.update(detections)

        for result in resultsTracker:
            x1, y1, x2, y2, track_id = (int(v) for v in result)
            (text_width, text_height), _ = cv2.getTextSize(f'{track_id}', cv2.FONT_HERSHEY_PLAIN, fontScale=0.5, thickness=1)
            text_offset_x = x1
            text_offset_y = y1 - text_height
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 1)
            cv2.putText(img, f'{track_id}', (text_offset_x, text_offset_y + 6), cv2.FONT_HERSHEY_PLAIN, fontScale=0.5, color=(255, 255, 255), thickness=1)
  
    
        cv2.imshow('video', img)
        # note: a second cv2.waitKey(0) here would block on every frame; one call is enough
        if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()

Do correct me if my approach seems wrong. Any suggestions are welcome. Thank you.
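One possible way around the ordering problem (not from the original post, just a sketch): SORT returns rows of `[x1, y1, x2, y2, id]` and may reorder or drop boxes, but each tracked box still overlaps its source detection almost perfectly, so the class can be re-attached by IoU matching. The helper names `iou` and `match_classes` below are my own:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_classes(tracked, detections, classes):
    """Pair each SORT output row [x1, y1, x2, y2, id] with the class of the
    detection it overlaps most; return (track_id, class_name) tuples."""
    pairs = []
    for t in tracked:
        # index of the detection with the highest IoU against this tracked box
        best = max(range(len(detections)), key=lambda i: iou(t[:4], detections[i][:4]))
        pairs.append((int(t[4]), classes[best]))
    return pairs
```

Here `classes` would be a list built in the same loop that builds `detections`, with one class name appended per detection row.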


There is 1 answer below.

afaulconbridge

Have you tried using model.track() instead of model()? That way you don't have to manage the tracking yourself; YOLOv8 does it for you. Instead of matching two lists together, you get tracking IDs and classes from a single list of results. See https://docs.ultralytics.com/modes/track/#tracking
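To make that concrete, here is a rough sketch of a frame-by-frame loop with Ultralytics' built-in tracking. With `model.track(..., persist=True)`, each `result.boxes` carries both a tracker ID (`boxes.id`, which can be `None` before the tracker has locked on) and a class index (`boxes.cls`). The weight and video file names are placeholders, and the helper `ids_and_classes` is my own:

```python
def ids_and_classes(result):
    """Extract (track_id, class_name) pairs from one Ultralytics result.
    Assumes result.boxes has .id and .cls tensors and result.names maps
    class indices to names."""
    boxes = result.boxes
    if boxes.id is None:  # tracker has not assigned IDs yet
        return []
    return [(int(i), result.names[int(c)])
            for i, c in zip(boxes.id.tolist(), boxes.cls.tolist())]

if __name__ == "__main__":
    # heavy imports kept here so the helper above stands on its own
    import cv2
    from ultralytics import YOLO

    model = YOLO("best.pt")              # placeholder: your PPE-trained weights
    cap = cv2.VideoCapture("video.mp4")  # placeholder video path
    while True:
        ret, img = cap.read()
        if not ret:
            break
        # persist=True keeps tracker state between successive frames
        results = model.track(img, persist=True, verbose=False)
        for track_id, name in ids_and_classes(results[0]):
            print(track_id, name)
    cap.release()
```

With this, counting people missing equipment becomes a matter of collecting the class names seen per track ID, rather than reconciling two separately ordered lists.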