Race conditions & modulated syllables.

Post by tcrop » Fri Dec 15, 2006 11:30 am


I'm evaluating Cepstral_Callie_i386-linux_4.1.4.

cpu: AMD Athlon
cpu MHz: 900.202
MemTotal: 906276 kB

First, your voices are very nice and I hope that I can work out some issues I'm having.

The following is an example of text I have trouble understanding.

"If the questions are proper we get proper answers, but if they are not proper questions we get indefinite answers or no answers at all."

I believe what makes it a little difficult to understand is that the words "proper we" get run together. I find that I can correct this by inserting a comma. But since I don't compose the text that I want read aloud, I'm wondering what other options are available to make this more understandable. An example would be helpful.

Another issue is a shift in tone across the syllables of a word; depending on the phrase, it could also be described as a hollow vibrato or rolling effect. I hear it in the following examples.

"subtle yet"
"to do so we use all means"
"belief of many that numerous"


Posts: 3
Joined: Fri Dec 15, 2006 1:22 am

Inter-Word Effects

Post by TaoPhoenix » Sat Jan 20, 2007 9:23 am

I think TextAloud has an option that allows for fractional-second delays between words. Would that kind of option help your users tell the individual words apart?

Tone is a little harder to handle. I think there are some advanced pronunciation editing options out there that affect tone, but you will have to research a while to make your choices.

My key app is making my own audiobooks. When a particular word pair has proven bumpy, I either substitute a made-up word that comes out closer to the way I want, use punctuation, or even alter the order of words slightly.
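
Since tcrop doesn't compose the text himself, the same trick could be automated with a small pre-processing pass before the text reaches the voice. This is only a rough sketch: the FIXUPS table and the apply_fixups name are made up for illustration, and the one entry shown is just the comma fix from the first post.

```python
import re

# Hypothetical fix-up table: each bumpy phrase maps to the text to speak instead.
# Entries are found by ear; this one is the comma fix from the first post.
FIXUPS = {
    "proper we": "proper, we",
}

def apply_fixups(text, fixups=FIXUPS):
    """Replace each listed phrase (whole words, any case) with its fix."""
    for phrase, replacement in fixups.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function is used as the replacement so that backslashes in the
        # replacement text are taken literally.
        text = re.sub(pattern, lambda m, r=replacement: r, text, flags=re.IGNORECASE)
    return text

if __name__ == "__main__":
    print(apply_fixups("If the questions are proper we get proper answers."))
```

The table only grows as you find more trouble spots, so the original text never has to be edited by hand.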
Refuse "1984" and "Fahrenheit 451".
Posts: 9
Joined: Mon Jan 15, 2007 11:18 pm

Post by MultiPort » Wed Jan 31, 2007 6:26 pm

Tcrop's post is good because it touches on multiple aspects of pronunciation modification.

The recommended way to add control to Cepstral voices is via SSML (Speech Synthesis Markup Language). Adding a break tag should be pretty straightforward. (Note: SSML is not compatible with Microsoft SAPI. However, as TaoPhoenix points out, many third-party SAPI-based TTS apps provide some equivalent. The SwiftTalker program bundled with Cepstral's Windows voices is an exception and natively handles SSML.)
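
As a rough illustration of the break tag, here is a sketch that wraps plain text in SSML and puts a short pause between one chosen word pair. The add_break helper and the 200ms default are assumptions for the example, not Cepstral recommendations, and the bare <speak> wrapper is a simplification: the SSML specification also calls for version and xml:lang attributes on it.

```python
import re
from xml.sax.saxutils import escape

def add_break(text, first, second, pause="200ms"):
    """Return SSML with a <break/> wherever `first` is directly followed by `second`."""
    # Escape &, < and > first so the source text cannot break the markup.
    safe = escape(text)
    pattern = r"\b(%s)\s+(%s)\b" % (re.escape(escape(first)), re.escape(escape(second)))
    tag = ' <break time="%s"/> ' % pause
    body = re.sub(pattern, lambda m: m.group(1) + tag + m.group(2), safe)
    return "<speak>" + body + "</speak>"

if __name__ == "__main__":
    print(add_break("If the questions are proper we get proper answers.", "proper", "we"))
```

The pause length is something to tune by ear; a value that is too long will sound like a sentence boundary.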

Another point to raise is that all TTS voices are sensitive to punctuation, word order, verb choice, and so on. Some pronunciation issues might be handled by massaging the prompt. Cepstral puts out several new releases each year. These releases typically involve work on the voice databases and may clear up some issues. For instance, version 4.2 is scheduled for early February 2007. To see your version, type "swift --voices" on a command line.

Yet another free option available to users is to create their own custom lexicon entries. Here's a link describing the phonetic structure of Cepstral's lexicon.

For a fee, Cepstral offers professional services and will perform voice tuning. Cepstral's internal tools provide deeper control over smoothing and tuning. Tuning may even mean having the original human talent record certain prompts and then adding them to the voice database.

