I am a beginner using Tika server 2.9.1, and I need Tesseract's OSD (orientation and script detection) output in my metadata, in particular the detected script (Latin, Cyrillic, etc.). My questions are: does my version of Tika server support this? If so, how do I configure the server? Thanks for your work (and English is not my native language).
I found this pull request commit, but I don't see how to integrate it into my Dockerfile to build an image that returns Tesseract's OSD output in the metadata after a request to Tika server. https://github.com/apache/tika/pull/246/commits/8eb7f93324b20a641b488a4b2d64731db39e717c#diff-8e0377396ab503c58862153ead9a186b611d715d8c2e2025874ae07a4e27c565
OK, problem solved: I used a custom Tika config yml file to set psm 0 (Tesseract's OSD-only page segmentation mode), and the rmeta output now contains the OSD script.
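
For anyone who wants to check the result programmatically, here is a minimal Python sketch of reading the script back out of an rmeta response. It is not taken from the thread. It assumes the server is reachable at `http://localhost:9998/rmeta` (a placeholder URL) and that the OSD text appears somewhere in the returned metadata in Tesseract's usual `Script: Latin` line format. Which metadata key holds it is not stated above, so the sketch scans every string value instead of relying on a key name. `fetch_rmeta` and `find_scripts` are hypothetical helper names.

```python
import json
import re
import urllib.request

# Assumption: a Tika server started with the custom config listens here.
TIKA_RMETA_URL = "http://localhost:9998/rmeta"

# Tesseract's OSD report contains a line such as "Script: Latin".
# "Script confidence: 4.17" is not matched because the colon must
# follow the word "Script" directly.
SCRIPT_LINE = re.compile(r"^[ \t]*Script:[ \t]*(\S+)", re.MULTILINE)


def fetch_rmeta(path, url=TIKA_RMETA_URL):
    """PUT a file to the rmeta endpoint and return the parsed JSON list."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def find_scripts(rmeta):
    """Return (metadata key, script name) pairs found in any string value."""
    found = []
    for document in rmeta:
        for key, value in document.items():
            # A metadata value may be a single string or a list of strings.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, str):
                    for match in SCRIPT_LINE.finditer(item):
                        found.append((key, match.group(1)))
    return found
```

Calling `find_scripts(fetch_rmeta("scan.png"))` (with `scan.png` standing in for your own image) would list each place a script line was found, which also shows which metadata key your setup actually uses.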