I'm following the tesstrain readme at https://github.com/tesseract-ocr/tesstrain.
When I run make training, I get the following error:
File not found - *.gt.txt
File not found - *.gt.txt
You are using make version: 4.4.1
Makefile:224: *** found no data/foo-ground-truth/*.gt.txt for data/foo/all-gt. Stop.
I don't understand this error, because I have triple-checked that the sample data (which includes many .gt.txt files) is in data/foo-ground-truth.
Here's what I've done so far that the readme says to do:
- I've installed everything it says I need (make, wget, find, bash, unzip, bc) and added them all to my PATH.
- I cloned the tesstrain repo, made a subdirectory data/foo-ground-truth, and unzipped the contents of ocrd-testset.zip into it.
- I ran make tesseract-langdata. This successfully added a bunch of unicharset files to data/langdata.
Any ideas why it might not be able to find the .gt.txt files that are in the proper directory? I've hit a wall on my troubleshooting.
I'm on Windows 10 and I have Make version 4.4.1 (as reported in the output above) and Python version 3.11.5.
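I found the answer. The "File not found - *.gt.txt" lines were coming from Windows' own find.exe (in C:/Windows/System32), not from the Unix find that the Makefile expects. C:/Program Files/Git/usr/bin (which contains the Unix find.exe) needs to be first in the PATH. I had added it to the top of my User PATH, but Windows places the User PATH entries after the System PATH entries, so the Windows find.exe was still being picked up first. Once I added it to the top of my System PATH, everything worked.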
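For anyone who wants to check which find comes first on their own machine, here is a minimal Python sketch that lists every match on the PATH in lookup order. The helper name locate_all and the extension list are my own choices for illustration, not anything from tesstrain, and it only approximates how the shell resolves commands (it ignores PATHEXT and the current directory).

```python
import os


def locate_all(name, path=None, extensions=("", ".exe")):
    """Return every file called `name` found on PATH, in lookup order.

    `locate_all` is a hypothetical helper for this post; `path` defaults
    to the PATH of the current process.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                matches.append(candidate)
                break  # one match per directory is enough
    return matches


if __name__ == "__main__":
    # The first line printed is usually the one that runs when a tool
    # calls plain "find".
    for hit in locate_all("find"):
        print(hit)
```

If the first line printed is the find.exe under C:/Windows/System32 rather than the one under C:/Program Files/Git/usr/bin, the PATH order is probably the cause of the error above. Run it from a freshly opened terminal, since PATH changes only apply to new processes.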