Replace string only if all characters match (Thai)


The problem is that มาก is technically contained in มาก็, because มาก็ is มาก followed by the combining mark ็ (U+0E47).

So when I do

"แชมพูมาก็เยอะ".replace("มาก", " X ")

I end up with

แชมพู X  ็เยอะ 

And what I want

แชมพู X เยอะ 

What I really want is to force the last character ก็ to count as a single character, so that มาก no longer matches มาก็.
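To see why the plain replace matches, it can help to list the code points involved. The snippet below is only an illustrative sketch using the standard library's unicodedata; the helper name code_points is made up for this example.

```python
import unicodedata

def code_points(text):
    """Return (code point, general category) for each character in text.

    Hypothetical helper, for illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]

# str.replace works on individual code points, so the trailing mark in
# "มาก็" is a separate element and "มาก" is a plain prefix of it.
for cp, category in code_points("มาก็"):
    print(cp, category)

print("มาก็".startswith("มาก"))
```

The last entry has a category starting with "M" (a mark), which is what distinguishes ก็ from a bare ก.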

There is 1 solution below

While I haven't found a proper solution, I did find a workaround. Using \X from the third-party regex module, I split each string into grapheme clusters (a base character together with its combining marks). Then I compare the resulting lists to each other.

import regex  # third-party module (pip install regex); the stdlib re has no \X

# Check whether list s occurs as a contiguous slice of list l
def is_slice_in_list(s, l):
    len_s = len(s)  # so we don't recompute the length of s on every iteration
    return any(s == l[i:len_s + i] for i in range(len(l) - len_s + 1))

def is_word_in_string(w, s):
    # \X matches one grapheme cluster: a base character plus its combining marks
    a = regex.findall(r'\X', w)
    b = regex.findall(r'\X', s)
    return is_slice_in_list(a, b)

assert is_word_in_string("มาก็", "พูมาก็เยอะ") == True
assert is_word_in_string("มาก", "พูมาก็เยอะ") == False

The regex will split like this:

พู ม า ก็ เ ย อ ะ
ม า ก

Because ก็ and ก are different clusters, the comparison fails at that position and the function concludes that มาก does not occur in the string.
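The same idea can be carried over from detection to replacement. Below is a minimal sketch that uses only the standard library: instead of splitting into clusters, it rejects any match that is immediately followed by a combining mark (Unicode category starting with "M"). The function name replace_whole_clusters is made up for this example, and the sketch assumes the search word itself does not start with a combining mark.

```python
import unicodedata

def replace_whole_clusters(text, word, replacement):
    """Replace word in text, skipping matches that end in the middle of a
    base-character-plus-mark cluster. Hypothetical helper, a sketch only."""
    if not word:
        return text
    out = []
    i = 0
    while True:
        j = text.find(word, i)
        if j == -1:
            out.append(text[i:])  # no more matches: keep the rest as-is
            break
        end = j + len(word)
        followed_by_mark = (
            end < len(text) and unicodedata.category(text[end]).startswith("M")
        )
        if followed_by_mark:
            # The match would cut a cluster in half: keep the text up to and
            # including the first matched character, then search again.
            out.append(text[i:j + 1])
            i = j + 1
        else:
            out.append(text[i:j])
            out.append(replacement)
            i = end
    return "".join(out)

print(replace_whole_clusters("แชมพูมาก็เยอะ", "มาก", " X "))   # left unchanged
print(replace_whole_clusters("แชมพูมาก็เยอะ", "มาก็", " X "))  # replaced
```

This avoids the extra dependency, at the cost of only looking at the character after the match rather than doing full grapheme segmentation.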

I will mark this as answered, but if there is a nicer or "proper" solution I will choose that one.