Urdu strings look the same but compare as unequal in Python 3


In my application, I have a list of Urdu words in a text file (currently a single word, 'فلسفے').

I also have another text file containing a string of Urdu text (currently a single word, exactly the same one).

Now I need to find out whether the string file contains any word that exists in the words file. For this, I'm reading both files into lists like this:

import codecs

# Read the text file of strings.
# (mode and encoding are set earlier in the script.)
fileToRead = codecs.open('string.txt', mode, encoding=encoding)
fileData = fileToRead.read()
lstFileData = fileData.split('\n')

# Read the text file of words.
wordListToRead = codecs.open('words.txt', mode, encoding=encoding)
wordData = wordListToRead.read()
lstWords = wordData.split('\n')

Then I simply traverse the list like this:

for string in lstFileData:
    if string in lstWords:
        pass  # do further work

It's not working, and I don't know why, although string is 'فلسفے' and lstWords contains this string. Do I need to add some encoding? Any kind of help will be appreciated.


There are 2 best solutions below

BEST ANSWER

Maybe this will help someone else in the same situation.

It may sound funny, but the issue was the file's encoding. I had opened the file in plain Notepad to make some changes and saved it, which changed the file from UTF-8 to UTF-8 with BOM, and my code did not work on it. Once I created a new file in Notepad++ as UTF-8, the same code started working fine. The issue was in the file encoding, not in the code.
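For anyone who cannot re-save the file: a BOM is a single invisible character (U+FEFF) at the start of the file, and when the file is decoded as plain 'utf-8' it stays glued to the first word, so that word no longer compares equal. Below is a minimal sketch of the effect and of one possible workaround, reading with Python's 'utf-8-sig' codec, which drops a leading BOM if present. The helper name read_lines and the temporary file are only for illustration and are not from the original code.

```python
import codecs
import os
import tempfile


def read_lines(path, encoding):
    """Read a text file and split it into lines (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as f:
        return f.read().split("\n")


word = "فلسفے"
tmp_dir = tempfile.mkdtemp()
words_path = os.path.join(tmp_dir, "words.txt")

# Simulate a file saved as "UTF-8 with BOM": BOM bytes, then the word.
with open(words_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + word.encode("utf-8"))

# Decoding as plain utf-8 keeps the BOM as U+FEFF in front of the word.
with_bom = read_lines(words_path, "utf-8")
# Decoding as utf-8-sig strips a leading BOM if there is one.
without_bom = read_lines(words_path, "utf-8-sig")

print(word in with_bom)        # False
print(ascii(with_bom[0][0]))   # '\ufeff'
print(word in without_bom)     # True
```

'utf-8-sig' also reads files that have no BOM, so it should be safe to use for both files when they may have been edited in different editors.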

ANSWER

Just tried it out in python3 and it seems to work for me:

lstWords = ['a', 'فلسفے', 'b']
string = 'فلسفے'
if string in lstWords:
    print("yes")

Edit: I also tested your updated code with file I/O and it works fine (I did not specify an encoding). Here is a link to it working: https://trinket.io/python3/3890d8b261
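Since the comparison works with literals, the difference most likely comes from what is actually read from the files. A rough sketch for checking that is to print the code points of both strings, so that invisible characters become visible. The function name describe is hypothetical, and the BOM-prefixed value below is just an assumed example of what a file might yield.

```python
def describe(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


expected = "فلسفے"
# Assumed example: the same word with an invisible U+FEFF in front,
# as it might come out of a file saved with a BOM.
from_file = "\ufeff" + expected

print(expected == from_file)   # False, although both print the same
print(describe(expected))
print(describe(from_file))     # first entry is U+FEFF
print(ascii(from_file))        # escapes every non-ASCII character
```

Running describe on the values in lstFileData and lstWords should show exactly where the two "identical" strings differ.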