I was just reading the Q&A page on the Cepstral web site about SSML tags, and it struck me that you already have a way to implement vocal gestures for Cepstral voices: the SSML <audio> tag, which plays a prerecorded sound file in the middle of synthesized speech. By vocal gestures, I mean the sounds we all make that are not words.
Now that there is a trend toward making synthetic voices more emotional, and since I have nothing better to do at the moment, I thought I would compile a list of vocal gestures that I might use from time to time with my Amy voice. I do realize that Amy's voice model is not available to record the WAV files that would be needed. Anyway, as I said, I have nothing better to do.
Here's the list of vocal gestures I would like Amy to have:
|| '''Gesture ID''' || '''Gesture content''' ||
|| g0001_001 || cough ||
|| g0001_002 || cough twice ||
|| g0001_003 || cough hard ||
|| g0001_004 || clear throat ||
|| g0001_005 || breathe in ||
|| g0001_006 || sharp intake of breath ||
|| g0001_007 || breathe in through teeth ||
|| g0001_008 || sigh happy ||
|| g0001_009 || sigh sad ||
|| g0001_010 || hmm question ||
|| g0001_011 || hmm yes ||
|| g0001_012 || hmm thinking ||
|| g0001_013 || umm ||
|| g0001_014 || ummm ||
|| g0001_015 || err ||
|| g0001_016 || errr ||
|| g0001_017 || giggle ||
|| g0001_018 || giggle sarcastic ||
|| g0001_019 || laugh small ||
|| g0001_020 || laugh big ||
|| g0001_021 || laugh demonic ||
|| g0001_022 || laugh hysterical ||
|| g0001_023 || cry small ||
|| g0001_024 || cry big ||
|| g0001_025 || ah positive ||
|| g0001_026 || ah negative ||
|| g0001_027 || yeah question ||
|| g0001_028 || yeah positive ||
|| g0001_029 || yeah resigned ||
|| g0001_030 || sniff ||
|| g0001_031 || sniff twice ||
|| g0001_032 || argh ||
|| g0001_033 || arrgh ||
|| g0001_034 || ugh ||
|| g0001_035 || ocht ||
|| g0001_036 || yay ||
|| g0001_037 || oh positive ||
|| g0001_038 || oh negative ||
|| g0001_039 || sarcastic noise ||
|| g0001_040 || yawn ||
|| g0001_041 || yawn big ||
|| g0001_042 || snore ||
|| g0001_043 || snore phew ||
|| g0001_044 || zzz ||
|| g0001_045 || raspberry ||
|| g0001_046 || raspberry twice ||
|| g0001_047 || brrr cold ||
|| g0001_048 || snort ||
|| g0001_049 || ha ha (sarcastic) ||
|| g0001_050 || doh ||
|| g0001_051 || gasp ||
I think this is how to implement them:
<speak><audio src="clear_throat.wav"/>Excuse me, <audio src="cough_twice.wav"/>Hello.</speak>
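To show how the SSML tags would compose with the gesture table, here is a minimal Python sketch that builds the markup. It assumes, hypothetically, that each gesture is stored as a WAV file named after its gesture ID (for example `g0001_004.wav`), and that `xml:lang="en-US"` suits the voice. The names `Gesture`, `audio_tag` and `build_ssml` are made up for illustration; nothing here is a Cepstral API.

```python
from dataclasses import dataclass
from typing import Sequence, Union
from xml.sax.saxutils import escape, quoteattr

# Subset of the gesture table above: gesture ID -> description.
GESTURES = {
    "g0001_001": "cough",
    "g0001_002": "cough twice",
    "g0001_004": "clear throat",
    "g0001_020": "laugh big",
}


@dataclass(frozen=True)
class Gesture:
    """Marks a vocal gesture, by ID, inside a sequence of spoken text."""
    gesture_id: str


def audio_tag(gesture: Gesture, base_url: str = "") -> str:
    """Return an empty SSML <audio> element for one gesture.

    HYPOTHETICAL naming: assumes each gesture was recorded as
    '<gesture_id>.wav', optionally under base_url.
    """
    if gesture.gesture_id not in GESTURES:
        raise KeyError(f"unknown gesture ID: {gesture.gesture_id}")
    src = quoteattr(f"{base_url}{gesture.gesture_id}.wav")
    return f"<audio src={src}/>"


def build_ssml(parts: Sequence[Union[str, Gesture]],
               lang: str = "en-US", base_url: str = "") -> str:
    """Join spoken text and gestures into one SSML 1.0 document.

    Plain strings are XML-escaped and spoken; Gesture items become
    <audio> elements that play the prerecorded WAV file.
    """
    body = []
    for part in parts:
        if isinstance(part, Gesture):
            body.append(audio_tag(part, base_url))
        else:
            body.append(escape(part))
    return ('<speak version="1.0" '
            'xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang={quoteattr(lang)}>'
            + " ".join(body)
            + "</speak>")


if __name__ == "__main__":
    # Clear throat, "Excuse me,", cough twice, "Hello."
    print(build_ssml([Gesture("g0001_004"), "Excuse me,",
                      Gesture("g0001_002"), "Hello."]))
```

Keeping gestures as a separate `Gesture` type, rather than spotting IDs inside plain strings, means ordinary text is never mistaken for a sound file, and any `&` or `<` in the spoken text is escaped so the SSML stays well-formed.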
Anyway, I think I brought this up with Cepstral some years ago, when I first found out about Loquendo and how they did it, but no one took me very seriously. I think the voice models used for this task would have to, well, not be afraid to "let their hair down." Now that I think about it, I don't know whether Amy's voice model would have been up to the task.