I'd like to get the gerund form of a string. I have not found a straightforward way to invoke a library to get the gerund.
I applied the rules for forming words ending in 'ing', but because those rules have exceptions I am getting some errors. So I check each result against the CMU dictionary words to ensure the generated gerund is correct. The code looks as follows:
import cmudict
import re

ing = 'ing'
vowels = "aeiou"
consonants = "bcdfghjklmnpqrstvwxyz"
words = ['lead', 'take', 'hit', 'begin', 'stop', 'refer', 'visit']
cmu_words = cmudict.words()
g_w = []
# count_syllables() is a helper (not shown here) that returns the number of syllables in a word
for word in words:
    if word[-1] == 'e':
        # drop the final "e" before adding "ing"
        if word[:-1] + ing in cmu_words:
            g_w.append(word[:-1] + ing)
    elif count_syllables(word) == 1 and word[-2] in vowels and word[-1] in consonants:
        if len(word) > 2 and word[-3] in vowels:
            # two vowels before the final consonant (e.g. "lead"): no doubling
            if word + ing in cmu_words:
                g_w.append(word + ing)
        else:
            if word + word[-1] + ing in cmu_words:
                g_w.append(word + word[-1] + ing)
    elif count_syllables(word) > 1 and word[-2] in vowels and word[-1] in consonants:
        # stress is not checked here, so the consonant is always doubled
        if word + word[-1] + ing in cmu_words:
            g_w.append(word + word[-1] + ing)
    else:
        if word + ing in cmu_words:
            g_w.append(word + ing)
print(g_w)
The rules are as follows:
When a verb ends in "e", drop the "e" and add "-ing". For example: "take + ing = taking".
When a one-syllable verb ends in vowel + consonant, double the final consonant and add "-ing". For example: "hit + ing = hitting".
When a verb ends in vowel + consonant with stress on the final syllable, double the consonant and add "-ing". For example: "begin + ing = beginning".
Do not double the consonant of words with more than one syllable if the stress is not on the final syllable.
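For illustration, the four rules above could be sketched in plain Python as below. This is only a rough sketch under stated assumptions: the function names `stress_pattern` and `gerund` are hypothetical, the pronunciations in `SAMPLE_PHONES` are hand-written ARPAbet-style lists (in the style of the CMU dictionary, where a trailing digit 1 marks primary stress), and the exclusion of final "w", "x", and "y" from doubling is an addition that is not in the list above. Exceptions such as "see" or "be" are not handled.

```python
VOWELS = "aeiou"
NO_DOUBLE = "wxy"  # assumption beyond the listed rules: these letters are not doubled


def stress_pattern(phones):
    """Return the stress digits of the vowel phonemes, e.g. ['0', '1'] for 'begin'."""
    return [p[-1] for p in phones if p[-1].isdigit()]


def gerund(word, phones):
    """Apply the spelling rules, using the pronunciation only for stress."""
    stresses = stress_pattern(phones)
    # A one-syllable verb counts as stressed on its final (only) syllable.
    final_stressed = bool(stresses) and stresses[-1] == "1"

    # Rule 1: a final "e" is dropped.
    if word.endswith("e"):
        return word[:-1] + "ing"

    # Rules 2 and 3: single vowel + consonant at the end, final syllable stressed.
    ends_vowel_consonant = (
        len(word) >= 2
        and word[-1] not in VOWELS
        and word[-1] not in NO_DOUBLE
        and word[-2] in VOWELS
        and (len(word) < 3 or word[-3] not in VOWELS)  # "lead" has two vowels
    )
    if ends_vowel_consonant and final_stressed:
        return word + word[-1] + "ing"

    # Rule 4 and everything else: just add "ing".
    return word + "ing"


# Hand-written pronunciations for illustration only (hypothetical sample data).
SAMPLE_PHONES = {
    "lead": ["L", "IY1", "D"],
    "take": ["T", "EY1", "K"],
    "hit": ["HH", "IH1", "T"],
    "begin": ["B", "IH0", "G", "IH1", "N"],
    "stop": ["S", "T", "AA1", "P"],
    "refer": ["R", "IH0", "F", "ER1"],
    "visit": ["V", "IH1", "Z", "AH0", "T"],
}

if __name__ == "__main__":
    print([gerund(w, p) for w, p in SAMPLE_PHONES.items()])
```

With stress taken into account, "visit" keeps a single "t" while "begin" and "refer" double the final consonant, which the syllable count alone cannot distinguish.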
Is there a more efficient way to get the gerund of a string, if one exists?
Thanks
Maybe this is what you are looking for: a library called pyinflect. There is a variety of tags available for getting inflections, including the 'VBG' tag (Verb, Gerund) you are looking for.
Here is a sample implementation.
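A minimal sketch of how this might look, assuming pyinflect is installed (`pip install pyinflect`) and that `getInflection` returns a tuple of forms, or `None` when it finds nothing. The helper name `to_gerund` is hypothetical, and the import is guarded so the snippet still runs (returning `None`) where the package is missing.

```python
try:
    from pyinflect import getInflection
except ImportError:  # pyinflect is not installed
    getInflection = None


def to_gerund(word):
    """Return the 'VBG' (gerund) form of a verb, or None if unavailable."""
    if getInflection is None:
        return None
    forms = getInflection(word, tag="VBG")
    # Take the first form when the library returns one or more candidates.
    return forms[0] if forms else None


if __name__ == "__main__":
    words = ["lead", "take", "hit", "begin", "stop", "refer", "visit"]
    print([to_gerund(w) for w in words])
```

This replaces the hand-written rules and the dictionary check with a single lookup per word.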
NOTE: The authors have set up a more sophisticated and benchmarked library called LemmInflect, which does both lemmatization and inflection. Do check it out if you want something more reliable than the above library. The syntax is pretty much the same as above.
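As a rough sketch of the LemmInflect variant, assuming the package is installed (`pip install lemminflect`) and that `getLemma` and `getInflection` each return a tuple of candidate forms. The helper name `gerund_of` is hypothetical. It lemmatizes first, so an already inflected form such as a past tense can be passed in as well.

```python
try:
    from lemminflect import getInflection, getLemma
except ImportError:  # lemminflect is not installed
    getInflection = getLemma = None


def gerund_of(form):
    """Lemmatize a verb form, then inflect the lemma to its gerund ('VBG')."""
    if getInflection is None:
        return None
    lemmas = getLemma(form, upos="VERB")
    lemma = lemmas[0] if lemmas else form  # fall back to the input itself
    forms = getInflection(lemma, tag="VBG")
    return forms[0] if forms else None


if __name__ == "__main__":
    print([gerund_of(w) for w in ["took", "begin", "visits"]])
```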