I am writing a program whose output sometimes contains lists, and these cause problems during voice synthesis. For example, the text being synthesized looks like this: "Suggestions for restaurants:1. Pizza2. Burger3. Sushi4. Noodles...". The voice synthesis reads the numbers as part of the adjacent words, which results in awkward pronunciation. To fix this, whitespace should be inserted between the numbers and the words. The output also should not be too long; it would be better to limit the list to the first three suggestions.
I have tried this code:
import re

def post_processing(text):
    """
    Post-processes a text string to address formatting issues for voice synthesis.
    Args:
        text: The input text string.
    Returns:
        The processed text string.
    """
    # Process lists with improved handling
    parts = text.split(":")
    if len(parts) > 1:
        # Split based on newlines, limiting to 3 items
        items = parts[1].strip().split("\n")[:3]
        # Remove trailing spaces, handle punctuation, and add spaces correctly
        items = [
            f"{item.strip()[:-1].rstrip('.')}{' ' if item.strip()[-1].isdigit() or item.strip()[-1] == '.' else ''}{item.strip()[-1:]}"
            for item in items
        ]
        text = ": ".join(items)
    else:
        text = text.strip()  # Remove leading/trailing whitespace
    # Remove URLs completely
    text = re.sub(r"https?://\S+", "", text)
    return text
When I pass in the following input:
text = "Suggestions for restaurants: 1 . Pizza2. Burger3. Sushi4. Noodles...."
text = post_processing(text)
print(text)
I expect this output:
1. Pizza 2. Burger 3. Sushi
but what I actually get is:
1 . Pizza2. Burger3. Sushi4. Noodles .
If you want to ensure there's a space both before and after the list numbering, you can adjust the formatting in the list comprehension. Try this
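A minimal sketch of one way to do this follows. Rather than tweaking the list comprehension, it swaps in a regex split on the list markers, because the sample input contains no newlines for split("\n") to work with, so the whole list ends up as a single item. The names LIST_MARKER and max_items are illustrative and not from the original code, and the sketch assumes that item names never contain a number followed by a period.

```python
import re

# Matches a list marker such as "1.", "2 ." or "10." plus surrounding whitespace.
# LIST_MARKER is a hypothetical helper name introduced for this sketch.
LIST_MARKER = re.compile(r"\s*(\d+)\s*\.\s*")


def post_processing(text, max_items=3):
    """Normalize an inline numbered list for voice synthesis (sketch)."""
    # Remove URLs first so the ":" in "https://" is not taken for the header separator.
    text = re.sub(r"https?://\S+", "", text).strip()

    # Everything before the first ":" is treated as the list header and dropped.
    header, sep, body = text.partition(":")
    if not sep:
        return text  # no header separator, nothing to reformat

    # Because the pattern has a capturing group, re.split keeps the numbers:
    # ['', '1', 'Pizza', '2', 'Burger', ...]
    tokens = LIST_MARKER.split(body.strip())
    numbers, names = tokens[1::2], tokens[2::2]
    if not numbers:
        return text  # no numbered list found after the header

    # Rebuild each item as "<number>. <name>" and keep only the first few.
    items = [
        f"{number}. {name.strip().rstrip('.')}"
        for number, name in zip(numbers, names)
    ][:max_items]
    return " ".join(items)


print(post_processing("Suggestions for restaurants: 1 . Pizza2. Burger3. Sushi4. Noodles...."))
# 1. Pizza 2. Burger 3. Sushi
```

If the header should also be spoken, it could be prepended to the joined items; the sketch drops it only because the expected output in the question does.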