I'm working on a React project with a component, PresenterNotes, that displays a script and highlights words in real time as they are recognized by a speech recognition service. For speech recognition I'm using the react-hook-speech-to-text library. The component listens for updates to words, an array that receives recognized words from the speech recognition service, and updates the highlighted word in the script accordingly.
The issue I'm facing is that updates to words seem to be delayed or batched, causing several words to be highlighted at once rather than one at a time as each word is recognized. Ideally, as soon as a word is recognized and added to the interimWords array, it should be highlighted in the component.
I'm looking for advice on how to make the highlighting update in real time with each word recognized by the speech service. How can I improve my component's responsiveness to interim results from react-hook-speech-to-text?
Any insights or suggestions on how to tackle this would be greatly appreciated. Thank you in advance for your help!
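To make the behavior I'm after concrete, here's a rough sketch (nextHighlight is just an illustrative name, not part of my code) of how I'd expect the highlight to advance by at most one script index each time a single word is recognized:

```javascript
// Illustrative sketch (not my actual code): advance the highlight by at
// most one script index per recognized word.
function nextHighlight(scriptWords, matchedSoFar, newWord) {
  // Normalize the script words the same way my component does
  const processed = scriptWords.map((w) =>
    w.replace(/[.,!?:;"]/g, '').toLowerCase()
  );
  // Only search past the last matched index so repeated words move forward
  const from =
    matchedSoFar.length > 0 ? matchedSoFar[matchedSoFar.length - 1] + 1 : 0;
  const idx = processed.indexOf(newWord.toLowerCase(), from);
  return idx === -1 ? matchedSoFar : [...matchedSoFar, idx];
}
```

In other words, each recognized word should trigger one small state update, rather than a batch of them arriving at once.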
PresentationNoteSample.js
const carouselItems = useMemo(
() => [
{
slideIndex: 1,
noteindex: '1/1',
title: 'Greetings',
content:
'Good morning, everyone! Today, I want to explore the concept of "The Design Thinking Process in User Experience Design."',
},...
PresenterNotes.js
import React, { useState, useEffect, useRef } from 'react';
import { useSpeech } from '../SpeechContext';
import styled from 'styled-components';
import {
ScriptTitle,
FontSizeButton,
NotesWrapper,
PresenterNotesContainer,
Title,
Word,
BottomRightText,
Content,
HighlightedText,
} from './PresenterNotesStyled';
function PresenterNotes({
noteindex,
title,
content,
index,
isActive,
setActiveItemIndex,
totalItems,
isPresentationMode,
}) {
const { interimResult, words } = useSpeech();
const notesRef = useRef(null);
const [fontSizes, setFontSizes] = useState(() =>
new Array(totalItems).fill(16)
);
const contentWords = content.split(' ');
const displayContent = content.split(/(\s+|[.,!?:;])/).filter(Boolean); // Tokenized content including punctuation (currently unused in the render below)
const [currentWordIndex, setCurrentWordIndex] = useState(0);
const [matchedIndices, setMatchedIndices] = useState([]);
const [highlightedIndex, setHighlightedIndex] = useState(0);
const currentFontSize = fontSizes[index];
useEffect(() => {
// Create a new array from contentWords, removing punctuation and converting to lowercase
const processedContentWords = contentWords.map((word) =>
word.replace(/[.,!?:;"]/g, '').toLowerCase()
);
let newMatchedIndices = [];
let lastIndex = -1;
// Iterate through the words array to find indices that match with processedContentWords
words.forEach((word) => {
const wordIndex = processedContentWords.indexOf(
word.toLowerCase(),
lastIndex + 1
);
if (wordIndex !== -1 && !newMatchedIndices.includes(wordIndex)) {
newMatchedIndices.push(wordIndex);
lastIndex = wordIndex; // Update the starting point for the next search
}
});
setMatchedIndices(newMatchedIndices);
}, [words]);
useEffect(() => {
const intervalId = setInterval(() => {
// Guard against an empty words array (anything % 0 is NaN)
setCurrentWordIndex((prevIndex) =>
words.length > 0 ? (prevIndex + 1) % words.length : 0
);
}, 1000); // E.g., update index every second
return () => clearInterval(intervalId); // Cleanup
}, [words]);
useEffect(() => {
if (isActive && !isPresentationMode) {
const updateHighlightingBasedOnSpeech = () => {
// Here, execute the logic to update highlighting based on recognized words.
// Logic for matching recognized words with the script, updating matchedIndices state
// This logic will continuously update based on changes in the words state.
};
updateHighlightingBasedOnSpeech();
// If there is a highlighted element, adjust scroll to make it visible to the user.
const highlightedElement =
notesRef.current?.querySelector('.highlighted');
if (highlightedElement) {
const container = notesRef.current;
const containerHeight = container.clientHeight;
const elementHeight = highlightedElement.clientHeight;
const scrollPosition = container.scrollTop;
const elementTop = highlightedElement.offsetTop;
const elementBottom = elementTop + elementHeight;
if (elementTop < scrollPosition) {
container.scrollTo({ top: elementTop, behavior: 'smooth' });
} else if (elementBottom > scrollPosition + containerHeight) {
container.scrollTo({
top: elementBottom - containerHeight,
behavior: 'smooth',
});
}
}
// Automatically transition to the next slide when the last word is highlighted
if (
matchedIndices.length > 0 &&
matchedIndices[matchedIndices.length - 1] === contentWords.length - 1
) {
// If the last word is reached, transition to the next slide
if (index < totalItems - 1) {
setTimeout(() => {
// Add a slight delay before transitioning to the next slide
const nextIndex = (index + 1) % totalItems;
setActiveItemIndex(nextIndex);
}, 1000); // Transition to the next slide after 1 second
}
}
}
}, [
words,
isActive,
isPresentationMode,
matchedIndices,
contentWords.length,
index,
totalItems,
]);
// console.log('contentWords', contentWords);
const prevMatchedIndicesRef = useRef(); // Ref to store the previous matchedIndices value
const prevWordsRef = useRef(); // Ref to store the previous words value
useEffect(() => {
// Execute console.log only when matchedIndices or words change
if (
prevMatchedIndicesRef.current !== matchedIndices.join(',') ||
prevWordsRef.current !== words.join(',')
) {
console.log('PresenterNotes-interimResult', interimResult);
console.log('words', words);
console.log('matchedIndices:', matchedIndices);
console.log(
'Last matched index:',
matchedIndices[matchedIndices.length - 1]
);
console.log('contentWords length:', contentWords.length);
// Update previous values to current values
prevMatchedIndicesRef.current = matchedIndices.join(',');
prevWordsRef.current = words.join(',');
}
}, [words, matchedIndices, interimResult, contentWords.length]); // Re-run when words or matchedIndices change
const increaseFontSize = () => {
setFontSizes((prevSizes) => {
return prevSizes.map((size) => Math.min(size + 2, 26)); // Increase font size by 2px for all slides
});
};
const decreaseFontSize = () => {
setFontSizes((prevSizes) => {
return prevSizes.map((size) => Math.max(size - 2, 14)); // Decrease font size by 2px for all slides
});
};
const goToPreviousNote = () => {
// Don't move back if this is the first PresenterNote
if (index > 0) {
setActiveItemIndex((index - 1 + totalItems) % totalItems);
}
};
const goToNextNote = () => {
// Don't animate or advance if this is the last PresenterNote
if (index < totalItems - 1) {
setActiveItemIndex((index + 1) % totalItems);
}
};
return (
<>
<ScriptTitle>
<h2
style={{
fontSize: '25px',
marginLeft: '10px',
display: 'inline-block',
}}
>
Your Script
</h2>
<div>
<FontSizeButton onClick={increaseFontSize}>+</FontSizeButton>
<FontSizeButton onClick={decreaseFontSize}>-</FontSizeButton>
<FontSizeButton onClick={goToPreviousNote}>◀︎</FontSizeButton>
<FontSizeButton
onClick={goToNextNote}
disabled={index === totalItems - 1}
>
▶︎
</FontSizeButton>
</div>
</ScriptTitle>
<NotesWrapper>
<PresenterNotesContainer
ref={notesRef}
style={{ fontSize: `${currentFontSize}px` }}
>
<Title>{title}</Title>
<Content>
{contentWords.map((word, idx) => (
<Word
key={idx}
className={matchedIndices.includes(idx) ? 'highlighted' : ''}
highlighted={matchedIndices.includes(idx)}
>
{word}
</Word>
))}
</Content>
<BottomRightText>{noteindex}</BottomRightText>
</PresenterNotesContainer>
</NotesWrapper>
</>
);
}
export default PresenterNotes;
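For reference, the matching logic in my first useEffect boils down to this pure function (computeMatchedIndices is a name I'm using here only for illustration; in the component the result is stored in matchedIndices):

```javascript
// Pure version of the matching logic from the first useEffect in
// PresenterNotes: map recognized words onto script word indices.
function computeMatchedIndices(scriptWords, spokenWords) {
  // Strip punctuation and lowercase, mirroring processedContentWords
  const processed = scriptWords.map((w) =>
    w.replace(/[.,!?:;"]/g, '').toLowerCase()
  );
  const matched = [];
  let lastIndex = -1;
  spokenWords.forEach((word) => {
    // Search only past the last match so repeated words advance forward
    const idx = processed.indexOf(word.toLowerCase(), lastIndex + 1);
    if (idx !== -1 && !matched.includes(idx)) {
      matched.push(idx);
      lastIndex = idx;
    }
  });
  return matched;
}
```

This function itself is fast; the delay I see comes from when the words array it consumes actually updates.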
SpeechContext.js
import React, { createContext, useState, useContext, useEffect } from 'react';
import useSpeechToText from 'react-hook-speech-to-text';
import axios from 'axios';
const SpeechContext = createContext();
export const useSpeech = () => useContext(SpeechContext);
export const SpeechProvider = ({ children }) => {
const [speechResults, setSpeechResults] = useState([]);
const [words, setWords] = useState([]);
const [interimWords, setInterimWords] = useState([]);
const { error, startSpeechToText, stopSpeechToText, results, interimResult } =
useSpeechToText({
continuous: true,
useLegacyResults: false,
speechRecognitionProperties: { lang: 'en-US', interimResults: true },
});
// startRecording, stopRecording
const startRecording = () => {
navigator.mediaDevices
.getUserMedia({ audio: true })
.then((stream) => {
console.log('Microphone access has been granted.');
axios
.post('http://localhost:8000/start_recording')
.then((response) => {
console.log(response.data);
startSpeechToText();
setSpeechResults([]);
console.log('interimResult', interimResult);
})
.catch((error) => {
console.error('Error starting recording:', error);
});
})
.catch((err) => {
console.error('Microphone Access Denied or Error:', err);
});
};
const stopRecording = () => {
axios
.post('http://localhost:8000/stop_recording')
.then((response) => {
console.log(response.data);
stopSpeechToText();
localStorage.setItem('savedWords', JSON.stringify(words));
setSpeechResults([]);
setWords([]);
})
.catch((error) => {
console.error('Error stopping recording:', error);
});
};
useEffect(() => {
if (results.length > 0) {
setSpeechResults(results.map((result) => result.transcript));
}
}, [results]);
// console.log('speechResults', speechResults);
useEffect(() => {
const newWords = speechResults
.flatMap((result) => result.split(' '))
.filter((word) => word.trim().length > 0);
setWords(newWords);
}, [speechResults]);
useEffect(() => {
if (interimResult) {
// Store the split interim transcript so consumers can react to
// words before the current phrase is finalized
setInterimWords(
interimResult.split(' ').filter((word) => word.trim().length > 0)
);
}
}, [interimResult]);
return (
<SpeechContext.Provider
value={{
interimResult,
interimWords,
speechResults,
words,
startRecording,
stopRecording,
}}
>
{children}
</SpeechContext.Provider>
);
};
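For reference, the word-splitting applied to both final and interim transcripts, and the merge I'd like to feed into the matching, can be sketched as pure functions (splitTranscript and mergeTranscriptWords are illustrative names, not part of my code):

```javascript
// Splits a transcript string into non-empty words, the same way the
// provider processes speechResults and interimResult.
function splitTranscript(transcript) {
  return transcript.split(' ').filter((word) => word.trim().length > 0);
}

// Combines the finalized words with the current interim transcript so a
// consumer could match against the most recent words immediately.
function mergeTranscriptWords(finalWords, interimResult) {
  const interim = interimResult ? splitTranscript(interimResult) : [];
  return [...finalWords, ...interim];
}
```

The idea is that the merged array would grow by one word at a time as the interim transcript extends, instead of jumping by a whole phrase when a result is finalized.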
My goal is to highlight each word in the script immediately as it is recognized.
Here's an overview of my implementation approach:
- Speech Recognition Setup: I'm using react-hook-speech-to-text to capture real-time speech and store recognized words in an array called words.
- Highlighting Logic: In the PresenterNotes component, I compare each word from the words array against the carousel content (split into an array of words), and when a word matches, the word at that index is highlighted.
- Issue Encountered: The main challenge is the real-time aspect of the highlighting. Words are not highlighted immediately as they are recognized; instead there's a delay between a word being recognized (and added to the words array) and that word being highlighted in the component, which produces batch-like highlighting rather than individual, real-time updates.
Attempts to Resolve:
I've tried directly mapping interimResult onto the words array, assuming this would make the highlighting more responsive to real-time speech input, and I've made sure the component's state updates from the words array to reflect changes in the recognition results. Despite these efforts, immediate, word-by-word highlighting as I speak has not been achieved; the updates still arrive delayed, which makes for a less interactive and engaging user experience.