Tesseract - tesstrain can't find ground truth txt files


I'm following the tesstrain readme at https://github.com/tesseract-ocr/tesstrain.

When I run make training, I get the following error:

File not found - *.gt.txt
File not found - *.gt.txt
    You are using make version: 4.4.1
Makefile:224: *** found no data/foo-ground-truth/*.gt.txt for data/foo/all-gt.  Stop.

I don't understand this error, because I have triple-checked that the sample data (which includes many .gt.txt files) is in data/foo-ground-truth.

Here's what I've done so far that the readme says to do:

  • I've installed everything it says I need (make, wget, find, bash, unzip, bc) and added them all to my PATH.
  • I cloned the tesstrain repo, made a subdirectory data/foo-ground-truth, and unzipped the contents of ocrd-testset.zip into it.
  • I ran make tesseract-langdata. This successfully added a bunch of unicharset files to data/langdata.

Any ideas why it can't find the .gt.txt files when they are in the proper directory? I've hit a wall in my troubleshooting.

I'm on Windows 10 with Make version 4.4.1 (as reported in the output above) and Python version 3.11.5.

Best answer

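I found the answer. C:/Program Files/Git/usr/bin (which contains the Unix-style find.exe) needs to be first in the PATH. Otherwise the unrelated Windows find.exe in C:\Windows\System32 is picked up instead, which is most likely where the "File not found - *.gt.txt" lines in the error output come from. I had added the Git directory to the top of my User PATH, but Windows places the User PATH after the System PATH, so System32 still came first. Once I added it to the top of my System PATH, everything worked.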
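
For anyone hitting the same thing, here is a minimal Python sketch for checking which find a PATH lookup picks first. The helper names resolve and is_windows_builtin are made up for this example, and shutil.which only approximates how make and its shell search the PATH, so treat the result as a hint rather than proof.

```python
import os
import shutil


def resolve(program, path_value=None):
    """Return the first match for `program` on `path_value`.

    With path_value=None, shutil.which searches the current PATH.
    """
    return shutil.which(program, path=path_value)


def is_windows_builtin(resolved):
    """Heuristic (an assumption): a path under a System32 directory
    is treated as the Windows built-in tool, not the Unix-style one."""
    if not resolved:
        return False
    parts = resolved.lower().replace("\\", "/").split("/")
    return "system32" in parts


if __name__ == "__main__":
    found = resolve("find")
    print("find resolves to:", found)
    if is_windows_builtin(found):
        print("This looks like the Windows find.exe; "
              "move the Git usr/bin directory ahead of System32 in PATH.")
```

From a Windows command prompt, `where find` gives similar information by listing every match in PATH order. Open a new terminal after editing the PATH so the change is picked up.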