I want to find a substring like (میں چند ممالک ایک ایسے گیا) in a paragraph,
but the line in the paragraph is not exactly the same as the substring. So if more than two words of the substring match a line of the paragraph, that line should be returned as the matching line.
fullstringlist =(" ادھر کی رات صرف چار گھنٹے کی ہے- جہاں دن کا دورانیہ بیس گھنٹے تک ہے- میں چند ایک ممالک ایسے گیا ")
test_list = fullstringlist.split("-")
print("The original list is : " + str(test_list))
subs_list = ['ادھر رات صرف چار گھنٹے کی ہے','میں چند ممالک ایک ایسے گیا']
res = []
for sub in test_list:
    flag = 0
    for ele in subs_list:
        # checking for non existence of
        # any string
        if ele not in sub:
            flag = 1
            break
    if flag == 0:
        res.append(sub)
# printing result
print("The extracted values : " + str(res))
You can achieve that with a threshold variable equal to half the number of words in each substring, plus one.
Example:
ادھر رات صرف چار گھنٹے کی ہے
contains 7 words, so its threshold is about 5 words. If 5 or more of its words are found in a line of the paragraph, we consider that line a match for the substring.
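The threshold idea described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `threshold_for` and `match_lines` are hypothetical, and rounding half the word count up before adding one is an assumption chosen so that 7 words give a threshold of 5.

```python
import math


def threshold_for(word_count):
    # Assumption: "half the words plus one", with the half rounded up,
    # so that a 7-word substring gets a threshold of 5.
    return math.ceil(word_count / 2) + 1


def match_lines(paragraph, substrings, sep="-"):
    """Return the lines of `paragraph` that share enough words with any substring."""
    matches = []
    for line in paragraph.split(sep):
        line_words = set(line.split())
        for sub in substrings:
            sub_words = sub.split()
            # Count how many words of the substring occur anywhere in the line,
            # regardless of their order.
            shared = sum(1 for word in sub_words if word in line_words)
            if shared >= threshold_for(len(sub_words)):
                matches.append(line.strip())
                break  # one matching substring is enough for this line
    return matches


fullstring = " ادھر کی رات صرف چار گھنٹے کی ہے- جہاں دن کا دورانیہ بیس گھنٹے تک ہے- میں چند ایک ممالک ایسے گیا "
subs_list = ['ادھر رات صرف چار گھنٹے کی ہے', 'میں چند ممالک ایک ایسے گیا']

print("The extracted values : " + str(match_lines(fullstring, subs_list)))
```

Because words are compared as a set, word order inside the line does not matter, which is what lets a reordered substring still count as a match. With the data from the question, the first and third lines should reach the threshold, while the middle line shares only two words with the first substring.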
Note: you can change how the threshold is calculated to suit your requirements.
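As one possible variation on the threshold calculation, the cut-off could be expressed as a fraction of the substring's words instead of a fixed formula. The sketch below is only an illustration; the name `match_lines_by_fraction` and the default `min_fraction=0.6` are assumptions, not values from the original post.

```python
def match_lines_by_fraction(paragraph, substrings, min_fraction=0.6, sep="-"):
    """Return lines in which at least `min_fraction` of a substring's words occur."""
    matches = []
    for line in paragraph.split(sep):
        line_words = set(line.split())
        for sub in substrings:
            sub_words = sub.split()
            if not sub_words:
                continue  # skip empty substrings to avoid dividing by zero
            shared = sum(1 for word in sub_words if word in line_words)
            if shared / len(sub_words) >= min_fraction:
                matches.append(line.strip())
                break  # one matching substring is enough for this line
    return matches


# Two of the three words ("a", "b") appear in the first line: 2/3 >= 0.6.
print(match_lines_by_fraction("a b c d-x y z", ["a b z"]))
```

Raising `min_fraction` towards 1.0 makes the match stricter, and lowering it makes it more tolerant of missing words.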