For example, let's assume we have something like this:
WOULD | YOU | LIKE | A | CUP | OF | TEA
w ʊ d | j uː | l a ɪ k | ə | k ʌ p | ʊ v | t iː
W UH D | Y UW | L AY K | AH | K AH P | AH V | T IY
Besides solving the P2G problem, I also want to get a mapping between each phoneme and its corresponding grapheme (a letter or group of letters). Could you please help me understand whether I can get this P2G correspondence for English using some Python tools? Thanks a bunch in advance!
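To make the requested phoneme-to-grapheme mapping concrete, here is a minimal sketch that aligns a word's letters with its ARPAbet phonemes using a memoized depth-first search. It assumes a small hand-written table of candidate spellings per phoneme (`CANDIDATE_SPELLINGS`, hypothetical and covering only the example sentence), and the helper name `align` is also hypothetical. A real system would learn such a table from an aligned lexicon instead of writing it by hand.

```python
from functools import lru_cache

# Hypothetical, hand-written table: which letter groups each ARPAbet phoneme
# may be spelled with. It only covers the example sentence.
CANDIDATE_SPELLINGS = {
    "W": ["w"], "UH": ["ou", "u", "oo"], "D": ["ld", "d"],
    "Y": ["y"], "UW": ["ou", "oo", "u"],
    "L": ["l"], "AY": ["i", "y", "igh"], "K": ["ke", "k", "c"],
    "AH": ["a", "u", "o"], "P": ["p"], "V": ["f", "v"],
    "T": ["t"], "IY": ["ea", "ee", "e"],
}


def align(word, phonemes):
    """Return [(phoneme, grapheme), ...] covering the whole word,
    or None if the table cannot explain the spelling."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(i, j):
        # i: current position in the word, j: current position in phonemes
        if j == len(phonemes):
            # Success only if all letters have been consumed as well.
            return () if i == len(word) else None
        for spelling in CANDIDATE_SPELLINGS.get(phonemes[j], []):
            if word.startswith(spelling, i):
                rest = solve(i + len(spelling), j + 1)
                if rest is not None:
                    return ((phonemes[j], spelling),) + rest
        return None  # no spelling of this phoneme fits here

    result = solve(0, 0)
    return list(result) if result is not None else None
```

For example, `align("WOULD", ["W", "UH", "D"])` returns `[("W", "w"), ("UH", "ou"), ("D", "ld")]`. The search returns the first alignment allowed by the table order rather than the most likely one, and silent letters have to be folded into a neighbouring grapheme (here `"ke"` for K in LIKE).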
You can use the CMU Pronouncing Dictionary together with a spell checker such as aspell or enchant. The CMU Pronouncing Dictionary is a list of English words and their pronunciations, where each pronunciation is a list of phonemes.
The pronunciation dictionary can be downloaded in text format here: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
The raw text is not in a very convenient format, so it is easier to get a copy that is already parsed. I used the cmudict.dict file of the CMU Pronouncing Dictionary available through NLTK.
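If you do work from the raw text, a minimal sketch of parsing it into a Python dictionary might look like this. It assumes one entry per line (a word followed by whitespace-separated ARPAbet phonemes), alternate pronunciations marked as `WORD(1)` or `word(2)`, comment lines starting with `;;;`, and optional trailing ` # ...` comments. `parse_cmudict` is a hypothetical helper name, and stress digits are stripped by default so the phonemes match the unstressed form used in the question.

```python
import re
from collections import defaultdict


def parse_cmudict(lines, strip_stress=True):
    """Parse CMU-dictionary-style lines into {WORD: [pronunciation, ...]}."""
    entries = defaultdict(list)
    for line in lines:
        # Drop a trailing " # comment" (entries such as "#HASH-MARK" keep their '#').
        line = re.sub(r"\s+#.*$", "", line).strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and header comments
        word, *phones = line.split()
        # Alternate pronunciations: "READ(1)" or "read(2)" -> "READ"
        word = re.sub(r"\(\d+\)$", "", word).upper()
        if strip_stress:
            # "IY1" -> "IY", "AH0" -> "AH"
            phones = [re.sub(r"\d", "", p) for p in phones]
        entries[word].append(phones)
    return dict(entries)
```

With the illustrative lines `["TEA  T IY1", "READ  R EH1 D", "READ(1)  R IY1 D"]`, `parse_cmudict` returns `{"TEA": [["T", "IY"]], "READ": [["R", "EH", "D"], ["R", "IY", "D"]]}`.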
You can also use the enchant spell checker to check whether a string of letters is a valid word. This is useful because the CMU Pronouncing Dictionary does not contain every possible word and has some errors.
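One simple way to combine the two for P2G is a reverse lookup: index every pronunciation by its phoneme sequence, return the words that share it, and filter the candidates with a spell checker. The sketch below is an assumption about how to wire this together, not a complete P2G system; `build_reverse_index`, `phonemes_to_words` and the `is_word` hook are hypothetical names. With pyenchant installed, something like `enchant.Dict("en_US").check` could be passed as `is_word` (words may need lowercasing first).

```python
from collections import defaultdict


def build_reverse_index(pron_dict):
    """Map each pronunciation (a tuple of phonemes) to the set of words that have it."""
    index = defaultdict(set)
    for word, prons in pron_dict.items():
        for phones in prons:
            index[tuple(phones)].add(word)
    return index


def phonemes_to_words(phonemes, index, is_word=lambda w: True):
    """Return the sorted candidate spellings for one phoneme sequence,
    keeping only those accepted by is_word (e.g. a spell checker)."""
    # .get() avoids inserting empty entries into the defaultdict.
    return sorted(w for w in index.get(tuple(phonemes), ()) if is_word(w))
```

Homophones come back together: with `{"TEA": [["T", "IY"]], "TEE": [["T", "IY"]]}` the lookup for `["T", "IY"]` returns `["TEA", "TEE"]`, so choosing between them would need context from the rest of the sentence.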
If you have enchant installed, you can use the following code to test it: