I want to separate phonetic, word break and word join keywords from a list of keywords using Python. Example:
Input List:
rice1kg
oil
cooking oil
oliv oil
flour5kg
buther
baking povder
Leg umes
Expected Output:
rice 1kg - word break
olive oil - phonetic
flour 5kg - word break
butter - phonetic
baking powder - phonetic
legumes - word join
This is a very complicated problem without a definite answer. In its simplest form it essentially breaks down into sorting strings into three categories:
wo rd - word join
wordbreak - word break
toylet - phonetic
There are a large number of problems that would need to be solved to complete this categorisation. I'll list some of them below:
Ambiguity, e.g. pullover. Is that correctly spelled, i.e. pullover: a knitted garment, or a word break, i.e. pull over? This is just an example, but there are a lot of cases where a string could be placed logically into multiple categories.
Quantities and units, e.g. 200mm, 5kg, 20", etc.
Pronunciation, e.g. is oo pronounced as it is in the word food, or the word good? The English language is stitched together from many sources, with borrowed words and linguistic roots in multiple places, so it doesn't follow a set of fixed rules when pronouncing words. This, in my opinion, is what makes this task so difficult.
I can't offer solutions to these problems, but knowing what challenges you'll face when you approach this task is a good way to start.
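As a starting point only, and not a solution to the problems above, here is a minimal dictionary-based sketch of the three-way categorisation. Everything in it is an assumption for illustration: the tiny `VOCAB` word list, the unit list in `QUANTITY`, the helper names, and the 0.75 cutoff. Note also that `difflib` compares spelling, not pronunciation, so it only stands in for a real phonetic comparison.

```python
import difflib
import re

# Hypothetical vocabulary; a real system would need a far larger word list.
VOCAB = ["baking", "butter", "cooking", "flour", "legumes",
         "oil", "olive", "powder", "rice"]
VOCAB_SET = set(VOCAB)

# A number followed by a unit, e.g. 1kg or 200mm (the unit list is an assumption).
QUANTITY = re.compile(r"\d+(?:\.\d+)?(?:kg|g|mm|cm|ml|l)")


def is_known(token):
    """True if the token is a vocabulary word or a quantity such as 5kg."""
    return token in VOCAB_SET or QUANTITY.fullmatch(token) is not None


def split_known(token):
    """Return (left, right) if the token splits into two known parts, else None."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if is_known(left) and is_known(right):
            return left, right
    return None


def classify(keyword):
    """Return (suggested_text, category) for one keyword."""
    tokens = keyword.lower().split()
    if all(is_known(t) for t in tokens):
        return " ".join(tokens), "correct"

    # Word join: removing the spaces yields a known word.
    joined = "".join(tokens)
    if len(tokens) > 1 and is_known(joined):
        return joined, "word join"

    fixed, label = [], None
    for t in tokens:
        if is_known(t):
            fixed.append(t)
            continue
        # Word break: the token is two known parts run together.
        parts = split_known(t)
        if parts:
            fixed.extend(parts)
            label = label or "word break"  # keep the first category found
            continue
        # Spelling similarity as a rough stand-in for phonetic matching.
        close = difflib.get_close_matches(t, VOCAB, n=1, cutoff=0.75)
        if close:
            fixed.append(close[0])
            label = label or "phonetic"
            continue
        return keyword, "unknown"
    return " ".join(fixed), label


for kw in ["rice1kg", "oil", "oliv oil", "buther", "Leg umes"]:
    print(kw, "->", classify(kw))
```

The order of the checks is itself a design decision that hides the ambiguity problem: if pullover, pull and over were all in the vocabulary, this sketch would always report pullover as correct, because the known-word check runs first.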