Train EAST text detector on custom data


How can I train the EAST text detector on my custom data? There aren't any blogs online that show a step-by-step procedure for doing this. Here is what I have currently.

I have a folder that contains all the images and a corresponding XML file for each image that tells where the text is located.

Example:

<annotation>
    <folder>Dataset</folder>
    <filename>FFDDAPMDD1.png</filename>
    <path>C:\Users\HPO2KOR\Desktop\Work\venv\Patent\Dataset\Dataset\FFDDAPMDD1.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>839</width>
        <height>1000</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>522</xmin>
            <ymin>29</ymin>
            <xmax>536</xmax>
            <ymax>52</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>510</xmin>
            <ymin>258</ymin>
            <xmax>521</xmax>
            <ymax>281</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>546</xmin>
            <ymin>528</ymin>
            <xmax>581</xmax>
            <ymax>555</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>523</xmin>
            <ymin>646</ymin>
            <xmax>555</xmax>
            <ymax>674</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>410</xmin>
            <ymin>748</ymin>
            <xmax>447</xmax>
            <ymax>776</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>536</xmin>
            <ymin>826</ymin>
            <xmax>567</xmax>
            <ymax>851</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>792</xmin>
            <ymin>918</ymin>
            <xmax>838</xmax>
            <ymax>945</ymax>
        </bndbox>
    </object>
</annotation>

I also have the parsed annotations for each of my images in the format that is used to train YOLO models.

Example:

C:\Users\HPO2KOR\...\text\FFDDAPMDD1.png 522,29,536,52,0 510,258,521,281,0 546,528,581,555,0 523,646,555,674,0 410,748,447,776,0 536,826,567,851,0 792,918,838,945,0 660,918,706,943,0 63,1,108,24,0 65,51,110,77,0 65,101,109,126,0 63,151,110,175,0 63,202,109,228,0 63,252,110,276,0 63,303,110,330,0 62,353,110,381,0 65,405,109,434,0 90,457,110,482,0 59,505,101,534,0 64,565,107,590,0 61,616,107,644,0 62,670,103,694,0 62,725,104,753,0 63,778,104,804,0 62,831,100,857,0 87,887,106,912,0 98,919,144,943,0 240,916,284,943,0 378,915,420,943,0 520,918,565,942,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD2.png 91,145,109,171,0 68,192,106,218,0 92,239,111,265,0 69,286,108,311,0 92,333,107,357,0 66,379,110,405,0 90,424,111,451,0 69,472,107,497,0 91,518,109,545,0 66,564,109,590,0 90,613,110,637,0 121,644,140,670,0 279,643,322,671,0 446,645,490,668,0 615,642,661,669,0 786,643,831,667,0 954,643,997,672,0 820,22,866,50,0 823,73,866,103,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD3.png 648,1,698,30,0 68,64,129,91,0 55,144,128,168,0 70,218,129,247,0 56,300,127,326,0 71,377,125,404,0 58,459,127,482,0 109,535,130,560,0 140,568,160,594,0 344,568,382,594,0 563,566,581,591,0 760,568,800,593,0 982,569,1000,591,0

What is the procedure to train the EAST text detector on my custom dataset? I am on Windows.

BEST ANSWER
According to the documentation in the README, custom training the Keras implementation of EAST requires a folder of images with an accompanying text file for each image named gt_IMAGENAME.txt (replace IMAGENAME with the name of the image it maps to).

In each text file, "the ground truth is given as separate text files (one per image) where each line specifies the coordinates of one word's bounding box and its transcription in a comma separated format." This quotation comes from https://rrc.cvc.uab.es/?ch=4&com=tasks, which is linked in the README of the TensorFlow implementation of EAST at https://github.com/argman/EAST. The bounding box is expressed as the coordinates of its four corners.
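For example, if the corner order matches the ICDAR samples (clockwise from the upper-left corner, in image coordinates where y grows downward), the first three boxes from the XML above would become lines like the following in gt_FFDDAPMDD1.txt, with ### standing in for an unknown transcription:

522,29,536,29,536,52,522,52,###
510,258,521,258,521,281,510,281,###
546,528,581,528,581,555,546,555,###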

You seem to have all the information you need to construct training data in the right format. There may be a tool out there that converts everything, but a quick Python script will work just fine as well. Something like the steps below (a sketch of such a script follows the list):

  1. Loop over all the XML files.
  2. For each XML file, create a text file named as the documentation requires (gt_IMAGENAME.txt).
  3. Use BeautifulSoup to parse the XML.
  4. Use find_all to get all the object tags.
  5. Use the xmin, ymin, xmax, and ymax values to express the x,y coordinates of all four corners. In image coordinates y grows downward, so the upper-left corner is (xmin, ymin), the upper-right is (xmax, ymin), the lower-right is (xmax, ymax), and the lower-left is (xmin, ymax). Based on https://github.com/argman/EAST/blob/master/training_samples/img_1.txt, the order appears to be clockwise starting from the upper-left corner: upper-left, upper-right, lower-right, lower-left.
  6. For each object tag, write a new line in the text file with the format x1,y1,x2,y2,x3,y3,x4,y4,transcription, or x1,y1,x2,y2,x3,y3,x4,y4,### when the transcription is unknown (each line followed by a \n for a newline).
  7. Run python train.py with all the command-line arguments set the way the "execution example" in the README is set up, but change the value after --training_data_path= to your path.
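A minimal sketch of such a conversion script, assuming the XML files all sit in one folder, every object is text, and the transcriptions are unknown (so ### is written in the last field); the folder paths and the choice to drop the image extension from the gt file name are assumptions you may need to adjust for your setup:

# Convert VOC-style XML annotations into EAST/ICDAR-style gt_IMAGENAME.txt files.
# Requires: pip install beautifulsoup4 lxml
import os
from glob import glob
from bs4 import BeautifulSoup

xml_dir = r"C:\path\to\xml_annotations"   # assumed location of the XML files
out_dir = r"C:\path\to\training_data"     # assumed output folder for gt_*.txt files
os.makedirs(out_dir, exist_ok=True)

for xml_path in glob(os.path.join(xml_dir, "*.xml")):
    with open(xml_path, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "xml")

    # gt_<image name without extension>.txt, e.g. gt_FFDDAPMDD1.txt
    # (check the README if your implementation expects the extension kept)
    image_name = os.path.splitext(soup.find("filename").text)[0]
    gt_path = os.path.join(out_dir, "gt_{}.txt".format(image_name))

    lines = []
    for obj in soup.find_all("object"):
        box = obj.find("bndbox")
        xmin = int(box.find("xmin").text)
        ymin = int(box.find("ymin").text)
        xmax = int(box.find("xmax").text)
        ymax = int(box.find("ymax").text)
        # Four corners clockwise from the upper-left, in image coordinates
        # (y grows downward): UL, UR, LR, LL.
        corners = [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]
        lines.append(",".join(str(c) for c in corners) + ",###")

    with open(gt_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

After the conversion, the images and their gt_*.txt files go in the folder you pass via --training_data_path= when running train.py, with the remaining flags copied from the README's execution example.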