I'm trying to figure out if there is a good way to merge two HMMs into one, when the underlying states are the same, but the observations aren't temporally linked.
I have two independent observation streams describing the same hidden state space. The underlying order of each observation stream remains the same, but they are not emitted at the same time.
For instance, say I have audio recordings of two separate speakers reading aloud the same passage of text, where the hidden state space is the letters in the text and the stream of phonemes from each recording makes up the observation space. Each speaker records the audio separately and uses a different cadence when reading.
I can clearly make a prediction of the text using each speaker independently and try to reconcile the results after the fact... but I suspect that combining the observation streams into a single HMM may produce a better result.
Does anyone know a good way to reconcile this?
Merging the states would require aligning these streams first, i.e. some kind of log-likelihood optimization. But it's possible to use statistics from multiple streams to predict the "observations"; modern data compressors basically do just that. E.g. see http://www.mattmahoney.net/dc/dce.html#Section_432
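To make the "align first, then merge" idea a bit more concrete, here is a rough sketch rather than a definitive method. It assumes each stream has already been turned into per-frame posteriors over the shared hidden states (for example by running forward-backward on each speaker's own HMM). The alignment step is plain dynamic time warping with a negative log-likelihood cost, which is my own choice of optimizer, and the names frame_cost, align and fuse are made up for illustration.

```python
import math

def frame_cost(pa, pb, eps=1e-12):
    # Negative log-probability that two frames share the same hidden state.
    return -math.log(sum(x * y for x, y in zip(pa, pb)) + eps)

def align(post_a, post_b):
    """Dynamic time warping: cheapest monotonic path pairing frames of A with frames of B."""
    n, m = len(post_a), len(post_b)
    cost = [[float("inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = frame_cost(post_a[i], post_b[j])
            if i == 0 and j == 0:
                cost[i][j] = c
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            best, prev = min(candidates)
            cost[i][j] = best + c
            back[i][j] = prev
    # Walk the back-pointers from the last cell to recover the path.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return path

def fuse(post_a, post_b, path):
    """Multiply aligned posteriors, pick the most likely state, collapse repeats."""
    decoded = []
    for i, j in path:
        joint = [x * y for x, y in zip(post_a[i], post_b[j])]
        state = max(range(len(joint)), key=joint.__getitem__)
        # Assumption: consecutive repeats are the same state held longer.
        # This would wrongly merge genuinely doubled letters.
        if not decoded or decoded[-1] != state:
            decoded.append(state)
    return decoded

# Toy data: three hidden states, speaker A lingers on state 0,
# speaker B lingers on states 1 and 2.
hi, lo = 0.8, 0.1
speaker_a = [[hi, lo, lo], [hi, lo, lo], [lo, hi, lo], [lo, lo, hi]]
speaker_b = [[hi, lo, lo], [lo, hi, lo], [lo, hi, lo], [lo, lo, hi], [lo, lo, hi]]

path = align(speaker_a, speaker_b)
print(path)
print(fuse(speaker_a, speaker_b, path))
```

Multiplying the aligned posteriors treats the two streams as conditionally independent given the state, which is the same assumption you would make if both were emitted by a single HMM. A real system would probably want to fold the transition model into the alignment instead of working on per-frame posteriors alone.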