I have the following code, where the user can press p to pause the video, draw a bounding box around the object to be tracked, and then press Enter (carriage return) to track that object in the video feed:
import cv2
import sys

major_ver, minor_ver, subminor_ver = cv2.__version__.split('.')

if __name__ == '__main__':
    # Set up tracker.
    tracker_types = ['BOOSTING', 'MIL', 'KCF', 'TLD', 'MEDIANFLOW', 'GOTURN', 'MOSSE', 'CSRT']
    tracker_type = tracker_types[1]

    # cv2.Tracker_create was the factory in OpenCV 3.0-3.2; checking only the
    # minor version would also send OpenCV 4.0-4.2 down this branch.
    if int(major_ver) == 3 and int(minor_ver) < 3:
        tracker = cv2.Tracker_create(tracker_type)
    else:
        if tracker_type == 'BOOSTING':
            tracker = cv2.TrackerBoosting_create()
        if tracker_type == 'MIL':
            tracker = cv2.TrackerMIL_create()
        if tracker_type == 'KCF':
            tracker = cv2.TrackerKCF_create()
        if tracker_type == 'TLD':
            tracker = cv2.TrackerTLD_create()
        if tracker_type == 'MEDIANFLOW':
            tracker = cv2.TrackerMedianFlow_create()
        if tracker_type == 'GOTURN':
            tracker = cv2.TrackerGOTURN_create()
        if tracker_type == 'MOSSE':
            tracker = cv2.TrackerMOSSE_create()
        if tracker_type == "CSRT":
            tracker = cv2.TrackerCSRT_create()

    # Read video
    video = cv2.VideoCapture(0)  # 0 means webcam; to use a video file, replace 0 with "video_file.MOV"

    # Exit if video not opened.
    if not video.isOpened():
        print("Could not open video")
        sys.exit()

    while True:
        # Read first frame.
        ok, frame = video.read()
        if not ok:
            print('Cannot read video file')
            sys.exit()
        # Retrieve an image and display it.
        if (0xFF & cv2.waitKey(10)) == ord('p'):  # Press `p` to pause the video and start tracking
            break
        cv2.namedWindow("Image", cv2.WINDOW_NORMAL)
        cv2.imshow("Image", frame)
    cv2.destroyWindow("Image")

    # select the bounding box
    bbox = (287, 23, 86, 320)
    # Uncomment the line below to select a different bounding box
    bbox = cv2.selectROI(frame, False)

    # Initialize tracker with first frame and bounding box
    ok = tracker.init(frame, bbox)

    while True:
        # Read a new frame
        ok, frame = video.read()
        if not ok:
            break
        # Start timer
        timer = cv2.getTickCount()
        # Update tracker
        ok, bbox = tracker.update(frame)
        # Calculate frames per second (FPS)
        fps = cv2.getTickFrequency() / (cv2.getTickCount() - timer)
        # Draw bounding box
        if ok:
            # Tracking success
            p1 = (int(bbox[0]), int(bbox[1]))
            p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
            cv2.rectangle(frame, p1, p2, (255, 0, 0), 2, 1)
        else:
            # Tracking failure
            cv2.putText(frame, "Tracking failure detected", (100, 80), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)
        # Display tracker type on frame
        cv2.putText(frame, tracker_type + " Tracker", (100, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (50, 170, 50), 2)
        # Display FPS on frame
        cv2.putText(frame, "FPS : " + str(int(fps)), (100, 50), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (50, 170, 50), 2)
        # Display result
        cv2.imshow("Tracking", frame)
        # Exit if ESC pressed
        k = cv2.waitKey(1) & 0xff
        if k == 27:
            break
Now, instead of having the user pause the video and draw the bounding box around the object, how can I make it automatically detect the particular object I am interested in (a toothbrush, in my case) whenever it is introduced into the video feed, and then track it?
I found this article which talks about how we can detect objects in video using ImageAI and Yolo.
from imageai.Detection import VideoObjectDetection
import os
import cv2
execution_path = os.getcwd()
camera = cv2.VideoCapture(0)
detector = VideoObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath(os.path.join(execution_path , "yolo.h5"))
detector.loadModel()
video_path = detector.detectObjectsFromVideo(
    camera_input=camera,
    output_file_path=os.path.join(execution_path, "camera_detected_1"),
    frames_per_second=29,
    log_progress=True)
print(video_path)
Now, YOLO does detect toothbrushes; it is one of the 80 object classes that it can detect by default. However, there are two points about this article that make it less than ideal for me:
This method first analyses each video frame (takes about 1-2 seconds per frame, so about 1 minute to analyse a 2-3 second video stream from the webcam), and saves the detected video in a separate video file. Whereas, I want to detect the toothbrush in the webcam video feed in real time. Is there a solution for this?
The YOLOv3 model being used can detect all 80 objects, but I want only 2 or 3 objects detected: the toothbrush, the person holding the toothbrush, and possibly the background, if needed at all. So, is there a way to make the model lighter by selecting only these 2 or 3 objects to detect?
If you want a quick and easy solution, you can use one of the more lightweight YOLO weights files. You can get the weights and config files (they come in pairs and must be used together) from this website: https://pjreddie.com/darknet/yolo/ (don't worry, it looks sketch but it's fine)
Using a smaller network will get you much higher fps, but also worse accuracy. If that's a tradeoff you're willing to accept then this is the easiest thing to do.
Here's some code for detecting toothbrushes. The first file is just a class file to help make using the Yolo network more seamless. The second is the "main" file that opens up a VideoCapture and feeds images to the network.
yolo.py
main.py
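As a rough illustration of what such a helper has to do (this is not the original yolo.py), here is a sketch of the post-processing step: keeping only the classes you care about from the raw network output. It assumes Darknet-style YOLOv3 output rows of the form `[cx, cy, w, h, objectness, class scores...]` with the box given as fractions of the frame size, and the standard 80-entry `coco.names` order (person first, toothbrush last). The function name, the threshold and the file names in the comments are illustrative assumptions.

```python
# Indices in the standard 80-class coco.names file (assumption: your names
# file uses that order; check it before relying on these numbers).
PERSON_ID = 0
TOOTHBRUSH_ID = 79


def filter_detections(rows, frame_w, frame_h, wanted_ids, conf_threshold=0.5):
    """Return [(class_id, confidence, (x, y, w, h)), ...] for wanted classes.

    rows: iterable of YOLOv3-style rows [cx, cy, w, h, objectness, scores...],
    with the box expressed as fractions of the frame size.
    """
    results = []
    for row in rows:
        scores = list(row[5:])
        # Best-scoring class for this candidate box.
        class_id = max(range(len(scores)), key=scores.__getitem__)
        confidence = float(scores[class_id])
        if class_id not in wanted_ids or confidence < conf_threshold:
            continue
        # Convert centre/size fractions to a pixel (x, y, w, h) box,
        # which is the format tracker.init() expects.
        w = int(row[2] * frame_w)
        h = int(row[3] * frame_h)
        x = int(row[0] * frame_w - w / 2)
        y = int(row[1] * frame_h - h / 2)
        results.append((class_id, confidence, (x, y, w, h)))
    return results


# Possible wiring with OpenCV's dnn module (file names are placeholders):
#   net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")
#   blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
#   net.setInput(blob)
#   for output in net.forward(net.getUnconnectedOutLayersNames()):
#       hits = filter_detections(output, frame.shape[1], frame.shape[0],
#                                {PERSON_ID, TOOTHBRUSH_ID})
```

Note that this only filters the output. It does not make the network any smaller or faster, and overlapping boxes for the same object would still need something like non-maximum suppression.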
There are a few options for you if you don't want to use a lighter weights file.
If you have an Nvidia GPU you can use CUDA to drastically increase your FPS. Even modest Nvidia GPUs are several times faster than running solely on the CPU.
A common strategy to bypass the cost of constantly running detection is to only use it to initially acquire a target. You can use the detection from the neural net to initialize your object tracker, similar to a person drawing a bounding box around the object. Object trackers are way faster and there's no need to constantly do a full detection every frame.
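A sketch of that hand-off might look like the following. It assumes a hypothetical `detect(frame)` function that wraps the network and returns an `(x, y, w, h)` box or `None`, and a `make_tracker` factory such as `cv2.TrackerKCF_create`; the class name is made up for illustration.

```python
class DetectThenTrack:
    """Run the slow detector only until a target is found, then track it.

    detect(frame) -> (x, y, w, h) or None   (hypothetical wrapper around YOLO)
    make_tracker() -> object with init(frame, bbox) and update(frame),
                      for example cv2.TrackerKCF_create
    """

    def __init__(self, detect, make_tracker):
        self.detect = detect
        self.make_tracker = make_tracker
        self.tracker = None

    def process(self, frame):
        """Return the current bounding box, or None if nothing is tracked."""
        if self.tracker is None:
            bbox = self.detect(frame)           # expensive: full detection
            if bbox is None:
                return None
            self.tracker = self.make_tracker()
            self.tracker.init(frame, bbox)      # same call as after selectROI
            return bbox
        ok, bbox = self.tracker.update(frame)   # cheap: tracker only
        if not ok:
            self.tracker = None                 # target lost: detect again next frame
            return None
        return tuple(int(v) for v in bbox)
```

In the question's tracking loop, calling `process(frame)` once per frame would take the place of both the manual `selectROI` step and the `tracker.update` call. Falling back to detection when the tracker reports failure is one possible policy, not the only one.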
If you run Yolo and object tracking in a separate thread then you can run as fast as your camera is capable of. You'll need to store a history of frames so that when the Yolo thread finishes a frame you can check the old frame to see if you're already tracking the object, and so you can start the object tracker on the corresponding frame and fast-forward it to let it catch up. This program isn't simple and you'll need to make sure you're managing data between your threads correctly. It's a good exercise for becoming comfortable with multithreading though, which is a huge step in programming.
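The bookkeeping for that could be sketched roughly as below. It is deliberately stripped down and shows only the frame history and the catch-up step, not a complete program. `detect` and `make_tracker` are hypothetical stand-ins (a YOLO wrapper returning a box or `None`, and a tracker factory such as `cv2.TrackerKCF_create`), frames are identified by a running counter, and `results` is assumed to be a `queue.Queue`.

```python
import threading
from collections import deque


class FrameHistory:
    """Thread-safe, bounded record of recent (frame_id, frame) pairs."""

    def __init__(self, maxlen=120):
        self._frames = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def add(self, frame_id, frame):
        with self._lock:
            self._frames.append((frame_id, frame))

    def latest(self):
        with self._lock:
            return self._frames[-1] if self._frames else None

    def since(self, frame_id):
        """Stored frames with id >= frame_id, oldest first."""
        with self._lock:
            return [(i, f) for i, f in self._frames if i >= frame_id]


def start_tracker_from_past(make_tracker, history, detected_id, bbox):
    """Initialise a tracker on the frame the detector saw, then fast-forward.

    Returns (tracker, bbox) caught up to the newest stored frame, or
    (None, None) if the old frame was dropped or tracking failed on the way.
    """
    frames = history.since(detected_id)
    if not frames or frames[0][0] != detected_id:
        return None, None                  # frame already fell out of the history
    tracker = make_tracker()
    tracker.init(frames[0][1], bbox)
    for _, frame in frames[1:]:            # replay the frames we missed
        ok, bbox = tracker.update(frame)
        if not ok:
            return None, None
    return tracker, bbox


def detection_worker(detect, history, results, stop_event):
    """Background loop: detect on the newest frame, report (frame_id, bbox)."""
    last_id = None
    while not stop_event.is_set():
        item = history.latest()
        if item is None or item[0] == last_id:
            stop_event.wait(0.01)          # nothing new yet
            continue
        last_id, frame = item
        bbox = detect(frame)               # slow call; the camera loop keeps running
        if bbox is not None:
            results.put((last_id, bbox))
```

The camera loop would call `history.add(...)` for every frame, poll `results` without blocking, and hand any result to `start_tracker_from_past`. Checking whether the reported box belongs to an object you are already tracking is left out here.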