I've created over 1200 images with labels for YOLO detection. The problem is that every image is 800x600 and all the labeled objects are in the middle of the image, so I want to crop away the rest. The new size would be something like 400x300 (cropping left, right, top, and bottom equally), with the objects still in the middle. But how do you convert or change the coordinates without labeling everything all over again?
# (used labelimg for yolo)
0 0.545000 0.722500 0.042500 0.091667
1 0.518750 0.762500 0.097500 0.271667
Here's one of my label .txt files. Sorry for my bad English!
I was just working this out myself, so here is a complete explanation of why the formula at the bottom is correct.
Let's go over how these annotations are formatted.
Each line is five numbers separated by spaces:
n x y w h
n is the class id; x and y are the box center and w and h its size, all normalized. Let W and H be the width and height of the original image. A normalized value is relative to the width or height of the image, not in pixels or any other unit; it's a proportion. For example, the x value is normalized like this: x[px] / W[px] = x normalized.
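For example, we can reconstruct the first label line from the question. The pixel values below are assumptions, back-calculated from the normalized label and the 800x600 size the question mentions:

```python
# Normalizing a pixel-space box to YOLO format.
# Pixel values are hypothetical, chosen to reproduce the question's label.
W, H = 800, 600            # original image size in pixels
x_px, y_px = 436, 433.5    # box center in pixels
w_px, h_px = 34, 55        # box width and height in pixels

x = x_px / W   # 0.545
y = y_px / H   # 0.7225
w = w_px / W   # 0.0425
h = h_px / H   # ~0.091667
```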
One advantage of this is that the values stay valid if the image is resized. Note that the y axis goes from top to bottom; everything else is like your standard coordinate system.
Now to cropping. let's take this picture of a tree:
scaling
We will now crop to the top left quarter of the tree image.
Our new image width W' is now only half of the original W; likewise H' = 0.5*H. The center of the old image is now the bottom right corner of the crop. We know the center of the image p is at (0.5, 0.5), and the bottom right corner is at p' = (1, 1). If we instead cropped so that (0.3, 0.3) in the old image became the new bottom right, that new coordinate would also be at (1, 1). 0.5 is ½; to get from 0.5 to 1 we need to multiply by 2, for ⅓ by 3, for ¼ by 4. We see that if we reduce the width or height to a/b of the original, we need to multiply by b/a.
translation
But we also want to be able to move the top left of the image, our coordinate origin O. Let's crop to the tree trunk:
W is 7 characters; the new width W' is 3. H = 5 and H' is 2. The new origin O' is (0, 0) in the new image of course, and it sits at (2, 3) in characters in the old one, normalized to the original image (2/7, 3/5), roughly (0.286, 0.6). O' is at (0.286, 0.6) but should be (0, 0), so we reduce x and y by 0.286 and 0.6 respectively before we scale the new value. This is not very interesting on its own, because 0 times anything is 0.
Let's do another example: the bottom right of our new cropped image of the tree trunk. Call this point q. We know that q in the coordinate system of the cropped image must be q' = (1, 1); it's the bottom right, after all.
We already measured: W=7 W'=3 H=5 H'=2
By how much did we reduce width and height, as a proportion?
W'/W is 3/7, so to undo the reduction we scale by the inverse: W/W' = 7/3. For H: H'/H = 2/5, so the factor is 5/2. Let's call these scaling factors s_w = 7/3 and s_h = 5/2.
q is at (5, 5) in O's coordinates (the crop spans x = 2..5 and y = 3..5). Let's put our formula to the test. We moved our origin by 2 in the x/w direction and by 3 in the y/h direction; let's call this Δw = 2 and Δh = 3.
For q'_x we subtract Δw = 2 from q_x and get 5 - 2 = 3. Now we normalize by dividing by W = 7, so q_x becomes 3/7. Then we scale by s_w = 7/3, and indeed 7/3 times 3/7 is 1. The same works in y: (5 - 3)/5 = 2/5, and 5/2 times 2/5 is 1. Now that we have checked our logic, we can write an algorithm.
The algorithm
We already have normalized values, which makes the matter simpler: the crop's top left offset can be expressed as (Δw/W, Δh/H) and the crop size as (W'/W, H'/H). For a point p = (x, y) in the original (normalized), we can calculate p' in the new image like this:
x' = (x - Δw/W) / (W'/W)
y' = (y - Δh/H) / (H'/H)
The box widths and heights are only scaled, not translated: w' = w / (W'/W) and h' = h / (H'/H).
in python:
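Here is a minimal sketch; `crop_label` and its parameter names are my own, not from any library, and assume you know the crop's offset and new size:

```python
def crop_label(x, y, w, h, dx, dy, rw, rh):
    """Map one normalized YOLO box from the original image into the
    coordinates of a crop.

    dx, dy: normalized offset of the crop's top left corner (Δw/W, Δh/H)
    rw, rh: crop size as a fraction of the original (W'/W, H'/H)
    """
    return (x - dx) / rw, (y - dy) / rh, w / rw, h / rh

# The question's case: 800x600 cropped to a centered 400x300,
# i.e. 200 px removed on each side horizontally, 150 px vertically.
dx, dy = 200 / 800, 150 / 600   # 0.25, 0.25
rw, rh = 400 / 800, 300 / 600   # 0.5, 0.5

x, y, w, h = crop_label(0.545, 0.7225, 0.0425, 0.091667, dx, dy, rw, rh)
# x ≈ 0.59, y ≈ 0.945, w = 0.085, h ≈ 0.183334
```

The first label line from the question stays well inside [0, 1], so nothing was cut off; you can rewrite each .txt this way instead of re-labeling.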
correcting annotations
Cropping can remove an annotation completely (those we have to drop) or cut it off partially (those we have to adjust).
As mentioned before, all values must end up in the interval [0, 1].
For a symmetric crop that removes a normalized fraction Δw of the width and Δh of the height, completely cropped out annotations have their center outside the crop window: x < Δw/2 or x > 1 - Δw/2, and likewise y < Δh/2 or y > 1 - Δh/2.
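A sketch of that check; the function name is mine, and Δw, Δh are passed as the normalized fractions removed:

```python
def center_outside(x, y, dw, dh):
    """True if the box center (x, y) lies outside a symmetric crop that
    removes a fraction dw of the width (dw/2 per side) and dh of the
    height. Every completely cropped-out box satisfies this; boxes that
    merely straddle an edge still need the area test below."""
    return x < dw / 2 or x > 1 - dw / 2 or y < dh / 2 or y > 1 - dh / 2

# 800x600 -> 400x300 removes half of each dimension: dw = dh = 0.5.
center_outside(0.545, 0.7225, 0.5, 0.5)   # False: center survives the crop
center_outside(0.10, 0.5, 0.5, 0.5)       # True: left of the crop window
```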
partially cropped
If you want to decide per annotation, for example dropping everything with 1/4 or less of its area still visible (keeping only those whose visible fraction lies in [0.25, 1]), it will be more complicated.
intersection area in cropped image
We can view this problem as calculating the intersection area between two rectangles: the bounding box and the crop window. For convenience, the function also returns the fraction of the box's area still in frame.
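A sketch of such a function; the name and signature are mine, with the crop window given by its normalized corners:

```python
def visible_fraction(x, y, w, h, cx0, cy0, cx1, cy1):
    """Intersect a normalized YOLO box (center x, y, size w, h) with the
    crop rectangle [cx0, cx1] x [cy0, cy1] and return the fraction of
    the box's area that stays in frame (0.0 to 1.0)."""
    bx0, bx1 = x - w / 2, x + w / 2              # box corners
    by0, by1 = y - h / 2, y + h / 2
    ix = max(0.0, min(bx1, cx1) - max(bx0, cx0))  # overlap width
    iy = max(0.0, min(by1, cy1) - max(by0, cy0))  # overlap height
    return (ix * iy) / (w * h)

# The question's centered crop keeps [0.25, 0.75] on both axes.
# The second label line pokes out of the bottom of that window:
f = visible_fraction(0.51875, 0.7625, 0.0975, 0.271667,
                     0.25, 0.25, 0.75, 0.75)   # roughly 0.45
```

With a threshold of 0.25 this box would be kept (and then clamped to the crop window); anything returning less than your threshold gets dropped.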