Why is Twilio Connect getting odd voice latency with Stream (WebSocket) in Python, and how can I fix it?

I have code that receives a call from Twilio and responds using GPT, Deepgram, and ElevenLabs. It works, but there is latency after I stream audio back: my voice takes too long to reach the speech-to-text function.

It opens a WebSocket connection to send and receive data. Everything works fine on the first turn, but after I stream the first audio response back to the call, it takes about 5 seconds for my voice to reach the program.

I think streaming the audio response to the call introduces the latency, or I'm doing something wrong.

Note: if I run wait_for_user_input alone in the loop, it gets the transcript with no latency. The problem only appears when I stream audio to the Twilio call.

@application.websocket('/stream')
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        # Speech-to-text: wait for the caller's utterance, transcribed by Deepgram
        messages, stream_sid = await wait_for_user_input(websocket, lang="es-MX")
        print(f"user input: {messages}")

        # Send the text to GPT, stream the reply through ElevenLabs, then back to Twilio
        await chat_completion(messages, websocket, stream_sid, model='gpt-3.5-turbo')
        
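One thing that may be worth ruling out, and it is only an assumption, not something confirmed by my tests: the loop above is sequential, so while the reply is being streamed nothing reads the inbound frames, and they could pile up and be processed late on the next turn. A minimal sketch of reading inbound frames in the background and discarding the backlog before listening again could look like this. The names `drain_inbound`, `discard_backlog`, and the `receive` callable are hypothetical and not part of my code or of any library.

```python
import asyncio


async def drain_inbound(receive, inbox: asyncio.Queue) -> None:
    """Keep reading inbound frames so they never sit unread on the socket.

    `receive` is a hypothetical async callable that returns the next frame,
    or None when the stream ends.
    """
    while True:
        frame = await receive()
        if frame is None:
            break
        await inbox.put(frame)


def discard_backlog(inbox: asyncio.Queue) -> int:
    """Drop frames that arrived while the bot was speaking; return the count."""
    dropped = 0
    while not inbox.empty():
        inbox.get_nowait()
        dropped += 1
    return dropped
```

In a real handler, `drain_inbound` would run as a background task (for example via `asyncio.create_task`) for the lifetime of the call, and `discard_backlog` would be called just before listening for the next utterance.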

Streaming the audio to the Twilio call:

import audioop  # standard library up to Python 3.12; removed in 3.13
import base64
import io
import json

from pydub import AudioSegment

async def stream(audio_stream, twilio_ws, stream_sid):
    async for chunk in audio_stream:
        if not chunk:
            continue
        # Decode the MP3 chunk (assumes each chunk can be decoded on its own).
        audio = AudioSegment.from_file(io.BytesIO(chunk), format='mp3')
        # Twilio expects 8 kHz mono mu-law, so normalise to 16-bit mono PCM first.
        audio = audio.set_channels(1).set_frame_rate(8000).set_sample_width(2)
        # Encode the raw samples directly. Exporting to WAV first would push
        # the WAV header bytes through lin2ulaw along with the samples.
        ulaw_data = audioop.lin2ulaw(audio.raw_data, 2)
        message = json.dumps({'event': 'media', 'streamSid': stream_sid,
                              'media': {'payload': base64.b64encode(ulaw_data).decode('utf-8')}})
        await twilio_ws.send_text(message)
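For reference, Twilio Media Streams also accepts `mark` and `clear` messages on a bidirectional stream, next to the `media` message sent above. A `mark` is echoed back by Twilio once the audio sent before it has finished playing, and `clear` empties audio that is still buffered. The sketch below only builds the JSON strings; the helper names are hypothetical, and whether these messages help with my problem is an assumption I have not tested.

```python
import base64
import json


def media_message(stream_sid: str, ulaw_bytes: bytes) -> str:
    """Build a 'media' message carrying base64-encoded 8 kHz mu-law audio."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(ulaw_bytes).decode("ascii")},
    })


def mark_message(stream_sid: str, name: str) -> str:
    """Build a 'mark' message; Twilio sends it back when playback reaches it."""
    return json.dumps({
        "event": "mark",
        "streamSid": stream_sid,
        "mark": {"name": name},
    })


def clear_message(stream_sid: str) -> str:
    """Build a 'clear' message that asks Twilio to drop buffered audio."""
    return json.dumps({"event": "clear", "streamSid": stream_sid})
```

Sending `mark_message(stream_sid, "reply-done")` right after the last audio chunk and waiting for the matching inbound `mark` event would show when playback actually ends on the call.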

I tried another speech-to-text solution and ended up with the same problem.

UPDATE: I did more tests. There is no lag or extra latency; instead, the ElevenLabs audio keeps streaming for a few extra seconds of silence. I disrupted the audio so I could hear the silence, and it is there. Maybe the problem is in how I stream the audio back to the call?
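To put a number on that silence, a rough diagnostic is to count how many quiet samples sit at the end of each mu-law payload before it is sent. This is a minimal sketch: the function name and the threshold are hypothetical choices of mine, and it assumes 8 kHz mu-law bytes, where the low 7 bits of the inverted byte grow with the sample's magnitude.

```python
def trailing_quiet_ms(ulaw: bytes, threshold: int = 0x0F, sample_rate: int = 8000) -> float:
    """Return the duration in milliseconds of near-silent samples at the end.

    In mu-law, each byte is stored bit-inverted; after inverting, the low
    7 bits hold exponent and mantissa, so small values mean quiet samples.
    """
    quiet = 0
    for byte in reversed(ulaw):
        if (~byte & 0x7F) > threshold:
            break  # first loud sample from the end; stop counting
        quiet += 1
    return quiet * 1000 / sample_rate


# 100 ms of loud samples followed by 200 ms of digital silence (0xFF)
example = bytes([0x00] * 800 + [0xFF] * 1600)
print(trailing_quiet_ms(example))
```

Logging this value per chunk would show whether the extra silence is already present in the audio coming from ElevenLabs or is introduced by my conversion.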
