I know many people have asked about splitting sentences, but my question is slightly different. My String data contains an unknown character (unknown to me, at least; it looks like a tab character), and I am trying to use it as the delimiter for splitting.
The source text is below (try selecting the blank portions to see the effect):
The President Profile of the President Swearing in of the President Assets of the President Speeches Speeches Foreign Visits Press Releases Gallery Photo Gallery Video Gallery Rashtrapati Bhavan Panoramic View
I thought the blank portions might be tab characters, but I was wrong: I tried matching on tab and it had no effect.
Then I opened the string in Notepad++ and enabled "show all characters". There I found this character. Kindly refer to the image below.
In the above diagram, one can clearly see an arrow-like symbol ("----->") in orange. Which character is this? Its width is not fixed, so how can I split the text into sentences? Has anybody faced this problem?
I stumbled on the answer by accident. The spaces, shown as arrows in the picture above, are the &nbsp; HTML entity (a non-breaking space, U+00A0). That is why I was unable to break the text into sentences by matching on tab. The output shown above came from the Tika parser: I fetched an HTML URL, extracted the page text, and then tried to break it into sentences.
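For anyone hitting the same thing, here is a minimal sketch of splitting on that character in Java. It assumes the separators are runs of one or more U+00A0 characters, possibly with ordinary spaces around them. The class name `NbspSplit`, the method `splitOnNbsp`, and the sample string are made up for illustration; they are not part of Tika.

```java
import java.util.ArrayList;
import java.util.List;

public class NbspSplit {
    // Hypothetical helper: splits on runs of one or more non-breaking spaces (U+00A0).
    // Assumption: the separators in the extracted text are U+00A0, not tabs.
    public static List<String> splitOnNbsp(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : text.split("\u00A0+")) {
            // trim() removes ordinary spaces left around a separator
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up sample imitating the extracted menu text
        String sample = "The President\u00A0\u00A0Profile of the President\u00A0Speeches";
        System.out.println(splitOnNbsp(sample));
        // prints [The President, Profile of the President, Speeches]
    }
}
```

Note that Java's `\s` and `String.trim()` do not treat U+00A0 as whitespace by default, which is probably why the usual whitespace-based splitting did nothing here; naming the character explicitly in the pattern avoids that.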