In my application, I need to record a conversation between people and there's no room in the physical workflow to take a 20 second sample of each person's voice for the purpose of training the recognizer, nor to ask each person to read a canned passphrase for training. But without doing that, as far as I can tell, there's no way to get speaker identification.
Is there any way to just record, say, 5 people speaking and have the recognizer automatically classify returned text as belonging to one of the 5 distinct people, without previous training?
(For what it's worth, IBM Watson can do this, although it doesn't do it very accurately, in my testing.)
If I understand your question right then Conversation Transcription should be a solution for your scenario, as it will show the speakers as
Speaker[x]and iterate for each new speaker, if you don't generate user profiles beforehand.You can get started with the real-time conversation transcription quickstart.