Known Issues under Mac OS X
Speech Manager Implementation:
- The voices may go silent due to the system's audio properties having been adjusted to incompatible settings. For instructions on how to correct this, please see "Why have my Cepstral voices gone silent" in our FAQ.
- Cepstral voices sometimes stop speaking when used with VoiceOver. They will not speak again through VoiceOver until you log out and log back in. This problem occurs mostly if you use Command+Tab to switch between tasks or if you use Ctrl+Option+D to move the cursor to the Dock and cycle through Dock items.
- When reading text copied from Safari, line breaks may occasionally be pronounced as question marks. This is due to a limitation in Apple's Speech Synthesis Manager.
- On longer texts, the timing of callbacks (and thus word highlighting, etc.) falls slightly behind.
- Word highlighting text positions are wrong when the volume is not set to 1.0.
- Text position information passed to user callbacks is incorrect with certain settings, causing word highlighting to break.
- Apple's pronunciation dictionaries are not supported. You can still use cross-platform Cepstral user lexicons, however.
- Synthesis to phonemes -- the TextToPhonemes() API call -- is not supported.
- Error and synchronization callbacks are not supported.
- When stopping speech, the synthesizer will always halt immediately, instead of waiting until the next word or sentence when requested.
- Tuned phonetic input -- the [[inpt tune]] embedded command -- is ignored.
- In phonemes mode, all prosodic control symbols, with the exception of primary syllable stress, are ignored.
- Some line breaks are pronounced as "dot question mark."
- If you use the Migration Assistant to transfer data and programs from one Mac to another, including Cepstral voices, your Cepstral voices may not function properly on the new computer.
- If you enable "Announce the Time" under Date & Time preferences and set it to use a Cepstral voice, you cannot change the voice afterwards, even to a different Cepstral voice. Even if you set it to "Use System Voice," it continues to use your selection. Neither logging out nor restarting solves the problem.
SSML Implementation:
- "interpret-as" attributes of "<say-as>" elements are not properly handled.
- If the input text contains back-to-back <break> SSML elements, Swift handles only the last one in the list. For instance, the following pauses for only three seconds instead of five:
"This is a test and now there should be a 5 second pause <break time="1s" /> <break time="1s" /> <break time="3s" /> Now i am back after a 5 second pause."
- RIFF audio files with a sampling rate that differs from the native sampling rate of the voice are not correctly resampled when specified using the <audio> SSML element.
- Applications using Swift TTS exit silently when a sentence that begins with an SSML <break> or <mark> tag is fed to a non-US English voice for synthesis.
- Swift does not currently support the xml:lang attribute of SSML tags that make use of it, such as <speak>, <voice>, etc.
- Swift does not reject bad SSML markup. It will complain about blatant syntactic errors, such as missing close tags, but not about semantic errors, such as invalid tags and attributes.
- The SSML <s> and <p> tags may not have the proper effect. The following is supposed to be read as two sentences, but Swift reads it as one:
"<speak>one sentence <s>and another</s></speak>"
- Swift does not currently support the SSML <lexicon> tag for adding additional lexicons.
- The SSML tag:attribute pair <voice gender="neutral"> should select a neuter-gendered voice, but it currently does nothing.
- The "age" attribute to the SSML <voice> tag is looking for an exact match, when it probably ought to be looking for the closest match. For instance, <voice age="10"> will only select a ten-year-old voice, or fall back to the default voice if one is not found.
- Multiple voice selections cannot be provided in the SSML <voice> tag. The tag <voice name="David William"> should request David, and use William as a fallback. Instead, Swift interprets it as a request for a voice named "David William."
- The default SSML voice may not be the default voice for that gender. To illustrate, Swift may switch voices while executing the following SSML code, when it should not. (Assume the default voice is David.)
<speak>This is the default voice, which should also be the <voice gender="male">default voice</voice></speak>
- Changing voice in SSML resets prosody and pitch. In the following examples, the <voice> tag should not reset the values of the <prosody> tag:
"<prosody range='-95%'><voice name='David'>The cat jumped over the moon.</voice></prosody>"
-and-
"<prosody pitch='+200Hz'><voice name='David'>The cat jumped over the moon.</voice></prosody>"
- The duration attribute of the SSML <prosody> tag is not supported. The following examples should play for exactly 1 second:
"<prosody duration='1s'>hello world</prosody>"
-and-
"<prosody duration='1000ms'>hello world</prosody>"
- Prosody as specified in the SSML prosody element is not applied to output generated using the SSML phoneme element. For example:
"<prosody rate='-55%'>Test. <phoneme ph='t eh1 s t'> Test.</phoneme></prosody>"
- Pitch contours passed in via the SSML <prosody> tag with values not falling between 0% and 100% are supposed to be ignored. Swift attempts to synthesize with these bogus values.
- SAMPA stress markers in the SSML <phoneme> tag are not supported.
- The SSML <mark> tag causes a crash in Canadian French, Americas Spanish, and Italian voices.
- Text position synthesis event information is incorrect when the <sub> SSML element is encountered.
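For the back-to-back <break> issue listed above, one possible workaround is to collapse each run of adjacent <break> elements into a single element before passing the text to Swift. The following is a minimal sketch in Python, not part of Swift itself. The helper name merge_breaks is hypothetical, and the sketch assumes that every <break> carries only a time attribute given in "s" or "ms" and that Swift accepts a "ms" value for that attribute.

```python
import re

# One <break time="..."/> element whose value is given in seconds or milliseconds.
_BREAK = r'<break\s+time=["\'](\d+(?:\.\d+)?)(ms|s)["\']\s*/>'
# A run of two or more such elements separated only by whitespace.
_RUN = re.compile(_BREAK + r'(?:\s*' + _BREAK + r')+')
_ONE = re.compile(_BREAK)


def merge_breaks(ssml):
    """Replace each run of adjacent <break> elements with one summed <break>.

    Hypothetical helper: single <break> elements are left untouched.
    """
    def _merge(match):
        total_ms = 0.0
        for value, unit in _ONE.findall(match.group(0)):
            total_ms += float(value) * (1000 if unit == "s" else 1)
        return '<break time="%dms" />' % int(round(total_ms))

    return _RUN.sub(_merge, ssml)
```

Applied to the example above, the three breaks of 1s, 1s, and 3s become a single <break time="5000ms" />, so only one element remains for Swift to handle.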
Swift TTS Engine:
- The audio/sampling-rate parameter is ignored when synthesizing to the audio device. If you are outputting to an audio file, this parameter is honored.
- Swift cannot play .au audio files.
API issues:
- swift_port_stop() and swift_port_pause() take a parameter specifying the event at which to stop, such as SWIFT_EVENT_SENTENCE to stop before the next sentence. However, they stop or pause more or less immediately regardless of which event you pass.
- The global "swift_version" variable exported through the Swift API shows the version to be "4.1.0-beta." It is not a beta build; this should show "4.1.0-release."
swift command line tool:
- Setting the 'audio/sampling-rate' parameter does not resample if the output is played to the audio device. If, however, you use it in conjunction with the '-o output.wav' option to direct the output to a wave file, the audio is indeed resampled.
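To confirm that a file written with the '-o output.wav' option was actually resampled, you can read the sampling rate back from the file header. The following is a minimal sketch using Python's standard wave module. The helper name wav_sample_rate is hypothetical, and the sketch assumes the output is an uncompressed PCM WAV file.

```python
import wave


def wav_sample_rate(source):
    """Return the sampling rate in Hz stored in a WAV file's header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()
```

For example, wav_sample_rate("output.wav") should return the value you requested through 'audio/sampling-rate' if the resampling took effect.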
Installation:
- Licensed voices become unlicensed when you upgrade the voice to the current version. You must re-enter your license information for the voice in the "Cepstral Voices" section of System Preferences.
- The voice's lexicon.txt file is overwritten when you upgrade to the current version.
- Example SFX files are not included in the Macintosh installers. You can download them here. For more information on how to use the SFX files, please see this entry in our FAQ.
Third Party Applications:
- Adobe Reader: In Adobe Reader 7, the Read Out Loud feature produces overly rapid speech by default. This issue is not specific to Cepstral voices. It can be worked around by going to the Reading pane in Reader's preferences, deselecting "Use default speech attributes," and setting the rate to 170 words per minute.
- iChat: If you use the "Speak text" feature in the Actions preference pane of iChat, it may later stick after you try to turn it off. This is a documented known issue in iChat. The complete issue report can be found here.
I have an issue which isn't covered here
For all other technical support inquiries, please use our Support Request Form. Please provide as much technical information as possible. Thank you!