I have trained my own model for the Urdu language, using jTessBoxEditor to create the TIFF/box files and then Serak Tesseract Trainer to create the traineddata file. The model does recognize Urdu, but apart from accuracy (which I will test once the following two issues are solved), there are two main issues:
- The model does not recognize the spaces between words.
- The model outputs the text in LTR order (Urdu is an RTL language, like Arabic).

I know this domain has a very specific group of people, but I just want a hint in the right direction, so any help will be greatly appreciated. Thanks in advance.
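One way to narrow down both symptoms is to compare a single OCR'd line against its known text. This shows whether the word spaces are really missing from the output, and whether the characters are stored in reversed (visual) order or are in correct logical order and only *look* LTR in the viewer. The sketch below is a hypothetical Python helper (`diagnose` is not part of Tesseract, jTessBoxEditor or Serak Tesseract Trainer). It assumes the ground-truth line is typed in normal logical Unicode order.

```python
def diagnose(ocr_line: str, ground_truth: str) -> list[str]:
    """Hypothetical helper: compare one OCR'd line with its known (logical-order) text."""
    findings = []

    # Issue 1: the ground truth has word spaces but the OCR output has none.
    if " " in ground_truth and " " not in ocr_line:
        findings.append("no word spaces")

    # Issue 2: ignore spaces, then check whether the character sequence
    # matches the ground truth as-is (logical order) or reversed (visual order).
    ocr_chars = ocr_line.replace(" ", "")
    gt_chars = ground_truth.replace(" ", "")
    if ocr_chars == gt_chars:
        findings.append("logical (RTL) order")
    elif ocr_chars == gt_chars[::-1]:
        findings.append("reversed (LTR/visual) order")
    else:
        findings.append("order unclear: recognition errors present")
    return findings


if __name__ == "__main__":
    truth = "سلام دنیا"  # sample Urdu line in logical order
    print(diagnose(truth.replace(" ", "")[::-1], truth))
```

If the result is "logical (RTL) order", the text itself is fine and only the display is wrong. "Reversed (LTR/visual) order" means the model is actually emitting characters back to front.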