parsing a sentence - match inflections and skip punctuation

Question

132 Views Asked by merav At 20 August 2025 at 22:31

I'm trying to parse sentences in Python. For any sentence I get, I should take only the words that appear after the word 'say' or 'ask' (if neither word appears, I should take the whole sentence). I simply did it with a regular expression:

sen = re.search('(?s)(?<=say|Say).*$', current_game_row["sentence"], re.M | re.I)

(this is only for 'say', but adding 'ask' is not a problem...)

The problem is that if I get a sentence with punctuation such as a comma or colon (, :) after the word 'say', the match includes it too. Someone suggested that I use NLTK tokenization to handle this, but I'm new to Python and don't understand how to use it. I see that NLTK has RegexpParser, but I'm not sure how to use it. Please help me :-)

** I forgot to mention: I also want to recognize 'said', 'asked', etc., and I don't want to match words that merely contain 'say' or 'ask' (I'm not sure there are such words...). In addition, if there are multiple occurrences of 'say' or 'ask', I only want to match the first one in the sentence. **

There is 1 best solution below

**Razzle Shazl** · Answer 1

Everything after a Keyword

We can deal with the unwanted punctuation by using \W+ to consume all non-word characters (punctuation and whitespace) that follow the keyword.

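import re

sentence = "Hearsay? With masked flasks I said: abracadabra"

# \b on both sides keeps 'Hearsay', 'masked' and 'flasks' from matching.
keys = '|'.join(['ask', 'asks', 'asked', 'say', 'says', 'said'])
# \W+ skips the punctuation and whitespace after the keyword; group 2 is the rest.
result = re.search(rf'\b({keys})\b\W+(.*)', sentence, re.S | re.I)

if result is None:
    print(sentence)  # no keyword found: keep the whole sentence
else:
    print(result.group(2))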

Output:

abracadabra
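
For reuse across many rows, the same pattern could be wrapped in a small function. This is only a sketch: the names text_after_keyword, KEYWORDS and PATTERN are made up for illustration, and the keyword list is the one from the snippet above. Because re.search returns the leftmost match, only the first 'say'/'ask' in the sentence is used, and a sentence without any keyword is returned unchanged.

```python
import re

# Hypothetical names; the keyword list is the one used in the answer above.
KEYWORDS = ['ask', 'asks', 'asked', 'say', 'says', 'said']
PATTERN = re.compile(r'\b(?:' + '|'.join(KEYWORDS) + r')\b\W+(.*)', re.S | re.I)


def text_after_keyword(sentence):
    """Return the text after the first keyword, or the whole sentence if none matches."""
    match = PATTERN.search(sentence)
    return sentence if match is None else match.group(1)


print(text_after_keyword("He said, hello there"))      # hello there
print(text_after_keyword("Say: one, then ask: two"))   # one, then ask: two
print(text_after_keyword("No keyword here"))           # No keyword here
```

Inflections that are not in the list (for example 'asking' or 'saying') are not matched by this sketch; they would need to be added to KEYWORDS.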

case-insensitive: You have the case-insensitive flag re.I, so we can drop the Say alternative.

multi-line: You have the re.M option, which makes ^ match not only at the start of the string but also right after every \n within it (and, likewise, $ match right before every \n). We can drop this since the pattern above uses neither ^ nor $.

dot-matches-all: You have (?s), which makes . match everything, including \n. This is the same as applying the re.S flag.

I'm not sure what the net effect of having both re.M and re.S is. I think your sentence might be a text blob with newlines inside, so I removed re.M and kept (?s) as re.S.
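
As far as the re module documents them, the two flags are independent: re.M only changes where ^ and $ may match, and re.S only changes what . may match. A minimal sketch on a made-up two-line string (the variable names are illustrative only):

```python
import re

text = "first line\nsecond line"

# re.M only changes where ^ (and $) can match.
starts_default = re.findall(r'^\w+', text)          # ['first']
starts_multiline = re.findall(r'^\w+', text, re.M)  # ['first', 'second']

# re.S only changes what . can match.
dot_default = re.search(r'first.*', text).group()        # stops at the newline
dot_all = re.search(r'first.*', text, re.S).group()      # runs to the end of the string

print(starts_default, starts_multiline)
print(repr(dot_default), repr(dot_all))
```

With both flags set, a pattern like .*$ still runs to the end of the whole string, because the greedy . is allowed to cross newlines.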
