I am trying to build a real-time speech recognizer with the Azure Speech SDK and FastAPI over a WebSocket. I send a base64-encoded binary string as input. The Azure session starts, recognizes text, and prints it from the connected event handlers. I also want to send the recognized text back over the WebSocket, so I added a callback for that. The print inside the callback works, but the send does not.
Please let me know what the issue is if someone is able to help.
async def process_stream(stream,data,speech_recognizer,websocket):
    def recognized_callback(evt):
        recognized_text = evt.result.text
        print("I am in websocket callback : " + str(recognized_text)+" "+str(websocket))
        async def send_data():
            await websocket.send_text(recognized_text)
        asyncio.gather(send_data())
    # The number of bytes to push per buffer
    n_bytes = 4096
    data = io.BytesIO(data)
    speech_recognizer.recognized.connect(recognized_callback)
    # Start pushing data until all data has been read from the file
    try:
        speech_recognizer.start_continuous_recognition()
        while True:
            frames = data.read(n_bytes // 2)
            print('read {} bytes'.format(len(frames)))
            if not frames:
                speech_recognizer.stop_continuous_recognition()
                break
            stream.write(frames)
            await asyncio.sleep(0.03)
    finally:
        stream.close()
@app.websocket("/asr/en")
async def root(websocket:WebSocket):
    await websocket.accept()
    audio_format = AudioStreamFormat(
        channels=1,
        samples_per_second=16000,
        bits_per_sample=16
    )
    stream = speechsdk.audio.PushAudioInputStream(audio_format)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                                   audio_config=speechsdk.audio.AudioConfig(stream=stream))
    try:
        while True:
            # Receive audio data from the client
            data = await websocket.receive_bytes()
            break
        await process_stream(stream,data,speech_recognizer,websocket)
    except Exception as e:
        logger.exception(f"An error occurred: {e}")
LOGS:
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [62395]
INFO: Started server process [62529]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: ('127.0.0.1', 53888) - "WebSocket /asr/en" [accepted]
INFO: connection open
SESSION STARTED: SessionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d)
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=4011e2273ad742aa9e2df99eb3e8a854, text="thank you for", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=8bc91a19b5c2433d9fcc4dbe5125ea9c, text="thank you for contact", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=8ad3f7d5a8dc40e09fdab8dbaa13fc89, text="thank you for contacting", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=49d386798e3d4ebb8172a9b681d929fc, text="thank you for contacting us", reason=ResultReason.RecognizingSpeech))
RECOGNIZED: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=f1d16bcf07da4b4ea2345e38b35c0300, text="Thank you for contacting us.", reason=ResultReason.RecognizedSpeech))
/Users/parikshit.mukherjee/PycharmProjects/pythonProject/./main.py:37: RuntimeWarning: coroutine 'root.<locals>.send_text_async' was never awaited
  send_text_async(evt.result.text)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=a16037fd2bec4790be32ec31b7430126, text="lands", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=508fa0fc387c43779c548dcc966133f7, text="lands had", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=8de79e5f5ac54b919cb8987dde425bae, text="yan's incorrectly", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=bfa9673b38be481d9a07263df3643794, text="lands had currently busy", reason=ResultReason.RecognizingSpeech))
RECOGNIZED: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=382841509bba4ae68d5507cd594f2a9d, text="Yan's incorrectly busy.", reason=ResultReason.RecognizedSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=5c05c9b20acc4d679a547bef5c97b473, text="how", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=c3b7c690ff61450ca2cb02f9a17d70e5, text="how pain", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=b1ee047ae4c143b78f17b75e6d306dca, text="how pain is", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=f396e7705c894f63a485378d0490d7f1, text="how pain is very", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=b502f32f7a1d47508a9af27b47abe24f, text="how pain is very important", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=618e549690534b7ca20e07d087de202f, text="how pain is very important to us", reason=ResultReason.RecognizingSpeech))
RECOGNIZED: SpeechRecognitionEventArgs(session_id=02f165702334418a8635a40c4c16ea1d, result=SpeechRecognitionResult(result_id=8af7a291290e45329b38bd172a1ddf65, text="How pain is very important to us.", reason=ResultReason.RecognizedSpeech))
/Users/parikshit.mukherjee/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/azure/cognitiveservices/speech/speech.py:652: RuntimeWarning: coroutine 'root.<locals>.session_stopped_cb' was never awaited
  cb(payload)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
INFO: connection closed
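The "coroutine ... was never awaited" warnings in the logs hint at a likely cause: the Speech SDK appears to invoke its event callbacks on its own worker thread, where no asyncio event loop is running, so a coroutine created there is never scheduled on the loop that owns the WebSocket. One way around this, as a minimal sketch, is to capture the running loop in the async handler and hand the send to it with `asyncio.run_coroutine_threadsafe`. Everything SDK-related is simulated here: `FakeWebSocket` is a hypothetical stand-in for the FastAPI `WebSocket`, a plain `threading.Thread` stands in for the SDK's callback thread, and the callback takes the text directly rather than an `evt` object.

```python
import asyncio
import threading


class FakeWebSocket:
    """Hypothetical stand-in for fastapi.WebSocket; records what would be sent."""

    def __init__(self):
        self.sent = []

    async def send_text(self, text: str) -> None:
        self.sent.append(text)


def make_recognized_callback(websocket, loop):
    """Build a callback that can safely be called from a non-asyncio thread."""

    def recognized_callback(text: str) -> None:
        # In the real handler this would be evt.result.text (assumption).
        # Schedule the coroutine on the loop that owns the websocket.
        future = asyncio.run_coroutine_threadsafe(websocket.send_text(text), loop)
        # Block the calling worker thread (not the event loop) until it is sent.
        future.result(timeout=5)

    return recognized_callback


async def main():
    websocket = FakeWebSocket()
    # Capture the loop inside the async handler, before any callback fires.
    loop = asyncio.get_running_loop()
    callback = make_recognized_callback(websocket, loop)

    # Simulate the SDK firing the callback from its own thread.
    worker = threading.Thread(target=callback, args=("Thank you for contacting us.",))
    worker.start()
    # Wait for the worker without blocking the event loop.
    await loop.run_in_executor(None, worker.join)
    return websocket.sent


sent = asyncio.run(main())
print(sent)
```

In the handler from the question, the same idea would mean calling `asyncio.get_running_loop()` inside `process_stream` and using `run_coroutine_threadsafe` in `recognized_callback` instead of `asyncio.gather`. The event loop also has to stay alive until the final results arrive, so the handler should not return right after the last chunk is pushed.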
I tried the FastAPI WebSocket code below to convert speech to text, and then ran it in Docker.
Code :
Output :
The following code ran successfully:
I received the text output with the input base64 data as follows:
Next, I added the Dockerfile below to the code:
Dockerfile :
I successfully built the image, ran the container, and checked its logs using the following commands: