I have been trying to use the prosody pitch attribute, but it is not straightforward and does not seem to work reliably. I want to create a simple "do re mi" that follows the G major scale. The results do not turn out as expected with the various Hz and semitone (st) values I have tried; sometimes the voice seems to do what it wants regardless of the value I supply. Example:
<prosody pitch="0Hz">A</prosody><break time="100ms" />
<prosody pitch="+2st">E</prosody><break time="100ms" />
<prosody pitch="+4st">I</prosody><break time="100ms" />
<prosody pitch="+6st">O</prosody><break time="100ms" />
<prosody pitch="+8st">U</prosody><break time="100ms" />
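For reference, here is a rough Python sketch of how the SSML for a major scale could be generated. It is only an illustration: the helper names (`semitones_to_hz`, `scale_ssml`) and the syllable list are made up for this post, the 392 Hz tonic in the usage note is simply the standard equal-temperament value for G4, and nothing here guarantees that a given voice will actually honor the requested pitch.

```python
from xml.sax.saxutils import escape

# Semitone offsets of a major scale relative to its tonic
# (whole, whole, half, whole, whole, whole, half).
MAJOR_SCALE_OFFSETS = [0, 2, 4, 5, 7, 9, 11, 12]


def semitones_to_hz(base_hz: float, semitones: int) -> float:
    """Equal-temperament frequency `semitones` above `base_hz` (hypothetical helper)."""
    return base_hz * 2 ** (semitones / 12)


def scale_ssml(syllables, break_ms=100):
    """Build one <prosody> element per syllable, raised by the matching scale step.

    Hypothetical helper: it emits relative semitone values such as "+2st",
    the same form used in the example above.
    """
    lines = []
    for text, offset in zip(syllables, MAJOR_SCALE_OFFSETS):
        lines.append(
            f'<prosody pitch="+{offset}st">{escape(text)}</prosody>'
            f'<break time="{break_ms}ms" />'
        )
    return "\n".join(lines)


print(scale_ssml(["do", "re", "mi", "fa", "sol", "la", "ti", "do"]))
```

Note that a major scale uses the offsets 0, 2, 4, 5, 7, 9, 11, 12, so the evenly spaced +2st steps in my example above would not trace a major scale even if the pitch were honored exactly. If absolute values are preferred, `semitones_to_hz(392.0, offset)` gives the target frequency for each step, assuming a tonic of G4.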
Looking at the alternatives (Amazon, Google, etc.), they all state that neural voices do not fully support pitch. I suspect the same is true of SpeechSynthesizer, which would explain the inconsistent results. Microsoft, please update your documentation accordingly.
The following statement in the MS documentation is not entirely accurate:
"Important: Pitch contour changes are now supported with neural voices."