Getting the same junk when extracting Hindi/Devanagari text from a PDF with pdftotext or PdfParser

119 Views Asked by KJA At 18 April 2019 at 05:47

I am using the PHP library PdfParser and the pdftotext tool to extract Hindi/Devanagari text from a PDF, but both of them give me the same kind of junk output.

Example of the junk output:

f{kfrt114; rhanz feJ dk tUe lu~ 1977 esa v;ksè;k (mÙkj izns"k) esa gqvkA mUgksaus y[kumQ fo"ofo|ky;] y[kumQ ls ¯gnh esa ,e-,- fd;kA os vktdy Lora=k ys[ku osQ lkFk v¼Zokf"kZd lfgr if=kdk dk laiknu dj jgs gSaA lu~ 1999 eas lkfgR; vkSj dykvksa osQ lao¼Zu vkSj vuq"khyu osQ fy, ,d lkaLÑfrd U;kl ^foeyk nsoh iQkmaMs"ku* dk lapkyu Hkh dj jgs gSaA ;rhanz feJ osQ rhu dkO;&laxzg izdkf"kr gq, gSaμ;nk&dnk] v;ksè;k rFkk vU; dfork,¡] M~;ks<+h ij vkykiA blosQ vykok "kkL=kh; xkf;dk fxfjtk nsoh osQ thou vkSj laxhr lk/uk ij ,d iqLrd fxfjtk fy[khA jhfrdky osQ vafre izfrfuf/ dfo f}tnso dh xzaFkkoyh (2000) dk lg&laiknu fd;kA oq¡Qoj ukjk;.k ij osaQfnzr nks iqLrdksa osQ vykok fLid eSosQ osQ fy, fojklr&2001

If I paste this junk into Google, it finds the correct Hindi page. So the garbled words may actually be correct, just represented in some other language or encoding.

Can anybody help me extract the exact, readable text from the PDF?
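
One way to test the guess above is to check whether the extracted text contains any Unicode Devanagari characters at all. The sketch below is only an illustration, written in Python rather than PHP (in PHP a similar check could probably be done with `preg_match('/\p{Devanagari}/u', $text)`). It assumes the PDF was typeset with a legacy, non-Unicode Hindi font, in which ASCII codes are drawn as Devanagari glyphs. The names `devanagari_ratio`, `remap` and `ILLUSTRATIVE_MAP` are hypothetical, and the seven mappings are a hand-picked subset in the style of Kruti Dev-like fonts, not a verified table for the font in this PDF.

```python
def devanagari_ratio(text: str) -> float:
    """Share of non-space characters that lie in the Devanagari block U+0900..U+097F."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for ch in chars if "\u0900" <= ch <= "\u097f")
    return hits / len(chars)


# ASSUMPTION: a tiny illustrative subset of a legacy-font mapping.
# A real table depends on the exact font and has well over a hundred entries.
ILLUSTRATIVE_MAP = {
    "d": "\u0915",  # क
    "k": "\u093e",  # ा
    "e": "\u092e",  # म
    "s": "\u0947",  # े
    "a": "\u0902",  # ं
    "g": "\u0939",  # ह
    "S": "\u0948",  # ै
}


def remap(text: str, table: dict) -> str:
    """Replace each character through the table, leaving unknown characters unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    junk = "esa dk gSa"  # three short words taken from the junk output above
    print(devanagari_ratio(junk))         # a ratio of 0.0 suggests a legacy font encoding
    print(remap(junk, ILLUSTRATIVE_MAP))  # character-by-character remapping
```

If the ratio is zero for text that is visibly Hindi in the PDF, the extractor is most likely returning the font's raw character codes, and a full font-specific conversion table would be needed. A character-by-character replacement like `remap` is not enough for a real converter: in such fonts the short "i" vowel sign is typed before its consonant and has to be reordered, and conjuncts and half-forms need multi-character rules.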


