Combining phrases from a list of words (Python 3)

I'm doing my best to extract information from a lot of PDF files. I have it in a dictionary where each key is a date and each value is a list of occupations.

When parsed properly, an entry looks like this:

'12/29/2014': [['COUNSELING',
                 'NURSING',
                 'NURSING',
                 'NURSING',
                 'NURSING',
                 'NURSING']]

However, some occupations span several words and can't be reliably identified from the individual words alone, as in this entry:

'11/03/2014': [['DENTISTRY',
                 'OSTEOPATHIC',
                 'MEDICINE',
                 'SURGERY',
                 'SOCIAL',
                 'SPEECH-LANGUAGE',
                 'PATHOLOGY']]

Notice that "osteopathic medicine & surgery" and "speech-language pathology" are the full titles of two of these entries. This gets hairier because other entries contain just "osteopathic medicine", or even just "medicine".
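One way to see the overlap problem concretely is to list every known title that appears as a contiguous run of words. This sketch assumes you have (or can build) a reference list of official titles; `KNOWN_TITLES`, `title_to_words` and `candidate_matches` are hypothetical names, and dropping standalone `&` tokens is an assumption about how the PDF text was split.

```python
# Hypothetical reference list of official titles (not from the post); in
# practice this would come from an authoritative list of occupations.
KNOWN_TITLES = [
    "DENTISTRY",
    "OSTEOPATHIC MEDICINE & SURGERY",
    "OSTEOPATHIC MEDICINE",
    "MEDICINE",
    "SPEECH-LANGUAGE PATHOLOGY",
]


def title_to_words(title):
    """Split a title into the word tuple the PDF extraction would produce.

    Assumption: standalone '&' tokens were dropped during extraction, while
    hyphenated words such as 'SPEECH-LANGUAGE' stayed intact.
    """
    return tuple(w for w in title.upper().split() if w != "&")


def candidate_matches(words, titles):
    """Return sorted (start, end, title) tuples for every title that appears
    as a contiguous run of words; overlapping matches are all reported."""
    found = []
    for title in titles:
        pattern = title_to_words(title)
        n = len(pattern)
        for start in range(len(words) - n + 1):
            if tuple(words[start:start + n]) == pattern:
                found.append((start, start + n, title))
    return sorted(found)


words = ["DENTISTRY", "OSTEOPATHIC", "MEDICINE", "SURGERY",
         "SOCIAL", "SPEECH-LANGUAGE", "PATHOLOGY"]

for start, end, title in candidate_matches(words, KNOWN_TITLES):
    print(start, end, title)
# 0 1 DENTISTRY
# 1 3 OSTEOPATHIC MEDICINE
# 1 4 OSTEOPATHIC MEDICINE & SURGERY
# 2 3 MEDICINE
# 5 7 SPEECH-LANGUAGE PATHOLOGY
```

The three overlapping matches starting at positions 1 and 2 are exactly the ambiguity described above: something still has to decide which one wins.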

So my question is this: how should I go about testing combinations of these words to see whether they match longer, multi-word occupational titles? I can rely on word order, since I have preserved it from the source.
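Because word order is preserved, one common approach is greedy longest-match (sometimes called maximal munch): at each position, try the longest known title first, fall back to shorter ones, and keep the word on its own if nothing matches. This is a minimal sketch under the same assumptions as before (a hypothetical `KNOWN_TITLES` list, `&` dropped during extraction); `segment` and `build_lookup` are illustrative names, not an existing library API.

```python
# Hypothetical reference list of official titles (not from the post).
KNOWN_TITLES = [
    "DENTISTRY",
    "OSTEOPATHIC MEDICINE & SURGERY",
    "OSTEOPATHIC MEDICINE",
    "MEDICINE",
    "SPEECH-LANGUAGE PATHOLOGY",
]


def title_to_words(title):
    # Assumption: the extractor dropped standalone '&' tokens.
    return tuple(w for w in title.upper().split() if w != "&")


def build_lookup(titles):
    """Map each title's word tuple back to the full title text."""
    lookup = {title_to_words(t): t for t in titles}
    max_len = max((len(k) for k in lookup), default=0)
    return lookup, max_len


def segment(words, titles):
    """Greedily merge runs of words into the longest matching known title."""
    lookup, max_len = build_lookup(titles)
    result = []
    i = 0
    while i < len(words):
        # Longest window first, so "OSTEOPATHIC MEDICINE & SURGERY" beats
        # "OSTEOPATHIC MEDICINE", which in turn beats "MEDICINE".
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in lookup:
                result.append(lookup[key])
                i += n
                break
        else:
            # No known title starts here; keep the word on its own.
            result.append(words[i])
            i += 1
    return result


# Same shape as the data in the question: date -> [[word, word, ...]]
pages = {
    "12/29/2014": [["COUNSELING", "NURSING", "NURSING",
                    "NURSING", "NURSING", "NURSING"]],
    "11/03/2014": [["DENTISTRY", "OSTEOPATHIC", "MEDICINE", "SURGERY",
                    "SOCIAL", "SPEECH-LANGUAGE", "PATHOLOGY"]],
}

cleaned = {
    date: [segment(words, KNOWN_TITLES) for words in word_lists]
    for date, word_lists in pages.items()
}
print(cleaned["11/03/2014"])
# [['DENTISTRY', 'OSTEOPATHIC MEDICINE & SURGERY', 'SOCIAL', 'SPEECH-LANGUAGE PATHOLOGY']]
```

Greedy matching is simple and works when the longer title should always win, as with "osteopathic medicine & surgery" over "osteopathic medicine". If titles can overlap in ways where taking the longest match first leads to a worse overall split, a dynamic-programming segmentation that minimizes the number of unmatched words would be more robust.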

Thanks!
