Telephony
  • What is SSML?

  • SSML, or Speech Synthesis Markup Language, provides users with a standardized method for controlling different aspects of speech synthesis output. For example, with SSML, one can alter prosody attributes such as rate, pitch, and volume, insert pauses of any length, change the speaking voice while reading, and control many other aspects of how the text is read by the synthetic voice. More information can be found on the W3C's SSML 1.0 specification page.



  • How do I use SSML with Cepstral TTS products?

  • There are several ways to affect pronunciation, and which one to use depends on how you are using the application.

    If you are using the Swift command line application to process text, or almost any application that calls Swift directly, you are using our native interface. Swift supports the Speech Synthesis Markup Language (SSML) as the default input mode for the synthesizer, with our own phoneme set for specifying pronunciations. This lets you put in-line pronunciations, and any other markup defined in SSML, directly in the input text.

    Our phonetic alphabet is the same one used for entries in a Swift voice dictionary (lexicon.txt). You can find more about this here.

    Example:
      Welcome to <phoneme ph="k eh1 p s t r ah0 l">Cepstral</phoneme>.
    Of course, this example is contrived, because our engine already says "Cepstral" properly.
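
    When the words and pronunciations come from a program rather than being typed by hand, the markup can be generated. The following is a minimal Python sketch, not part of any Cepstral API: the helper name phoneme is hypothetical, and it only builds the text that would then be passed to Swift.

```python
from xml.sax.saxutils import escape, quoteattr

def phoneme(word, phones):
    """Wrap `word` in an SSML <phoneme> element (hypothetical helper).

    `phones` is a space-separated phone string in the lexicon.txt alphabet.
    """
    # quoteattr adds the surrounding quotes and escapes &, <, > in the value;
    # escape does the same for the element text.
    return "<phoneme ph={}>{}</phoneme>".format(quoteattr(phones), escape(word))

text = "Welcome to " + phoneme("Cepstral", "k eh1 p s t r ah0 l") + "."
print(text)
```

    Escaping matters because a stray "&" or "<" in the spoken text would otherwise be read as markup.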



  • When can SSML be used?

  • The Cepstral Swift TTS engine supports SSML natively, and by default it parses all input text for SSML. However, whether SSML is honored depends on the context in which the Cepstral voice is used. If the application using the voices does not support SSML, the markup will not make it through to the Swift TTS Engine for parsing. In particular, SSML does not work in applications that reach the voices through SAPI 5.1 or the Apple Speech Manager.
    SSML does work with Cepstral voices in any application that has been written to access the Cepstral Swift TTS Engine directly, without interacting with SAPI 5.1 or the Apple Speech Manager. SSML can be used with Cepstral voices in the following contexts:
    • Swift - The Cepstral command-line interface
      Installed with every Cepstral voice for Microsoft Windows, Apple Macintosh OS X, and Linux is a command-line utility called "Swift." By default, any text arguments or input text files sent through Swift are parsed for SSML content.

    • SwiftTalker
      The SwiftTalker application that is bundled with Cepstral voices for Microsoft Windows and Windows CE supports SSML.

    • Cepstral Tools
      SSML can be used in the text you provide to test a voice in the "Voices" tab of the Cepstral Tools applet for the Windows Control Panel.

    • Asterisk PBX
      SSML can be used with Cepstral voices in Asterisk by simply embedding the markup into the input text.
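
    For the command-line case, one practical detail is that SSML contains "<" and ">", which a shell would treat as redirection unless the text is quoted. The Python sketch below sidesteps that by passing the text as a single list argument. It assumes the utility is on the PATH under the name swift and uses no options, only the text argument described above; the function name swift_command is hypothetical. The sketch only builds and prints the command; running it is left to the caller.

```python
import shlex

def swift_command(ssml_text, executable="swift"):
    """Return the argument list for speaking `ssml_text` with the Swift CLI.

    Assumption: the executable is named "swift" and takes the text to speak
    as its argument. Adjust `executable` if another program on the machine
    uses the same name.
    """
    return [executable, ssml_text]

cmd = swift_command('This is a <break time="3s" /> three second pause.')

# Show the command as it would have to be typed in a POSIX shell.
print(shlex.join(cmd))

# To actually speak the text (requires a Cepstral voice to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

    Because the command is a list rather than a single string, no shell is involved and the markup reaches Swift unchanged.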



  • Common Usage Examples

  • This section lists many of the most common uses of SSML with Cepstral Voices. The examples are shown as context-free text containing SSML markup, and can be used in any context in which SSML works with Cepstral Voices (see "When can SSML be used?"). For more detailed descriptions of how the elements and attributes used in these examples work, see the official W3C SSML Specification:

    http://www.w3.org/TR/speech-synthesis/

    1. Inserting silence / pauses
      "This is not <break strength='none' /> a pause."
      "This is a <break strength='x-weak' /> phrase break."
      "This is a <break strength='weak' /> phrase break."
      "This is a <break strength='medium' /> sentence break."
      "This is a <break strength='strong' /> paragraph break."
      "This is a <break strength='x-strong' /> paragraph break."
      "This is a <break time='3s' /> three second pause."
      "This is a <break time='4500ms' /> 4.5 second pause."
      "This is a <break /> sentence break."
    2. Changing Voices
      "This is the default voice. <voice name="David">This is David.</voice> This is the default again. <voice name="Callie">Callie here.</voice>"
    3. Adjusting Speech Rate
      "I am now <prosody rate='x-slow'>speaking at half speed.</prosody>"
      "I am now <prosody rate='slow'>speaking at 2/3 speed.</prosody>"
      "I am now <prosody rate='medium'>speaking at normal speed.</prosody>"
      "I am now <prosody rate='fast'>speaking 33% faster.</prosody>"
      "I am now <prosody rate='x-fast'>speaking twice as fast</prosody>"
      "I am now <prosody rate='default'>speaking at normal speed.</prosody>"
      "I am now <prosody rate='.42'>speaking at 42% of normal speed.</prosody>"
      "I am now <prosody rate='2.8'>speaking 2.8 times as fast</prosody>"
      "I am now <prosody rate='-0.3'>speaking 30% more slowly.</prosody>"
      "I am now <prosody rate='+0.3'>speaking 30% faster.</prosody>"
    4. Adjusting Voice Pitch
      "<prosody pitch='x-low'>This is half-pitch</prosody>"
      "<prosody pitch='low'>This is 3/4 pitch.</prosody>"
      "<prosody pitch='medium'>This is normal pitch.</prosody>"
      "<prosody pitch='high'>This is twice as high.</prosody>"
      "<prosody pitch='x-high'>This is three times as high.</prosody>"
      "<prosody pitch='default'>This is normal pitch.</prosody>"
      "<prosody pitch='-50%'>This is 50% lower.</prosody>"
      "<prosody pitch='+50%'>This is 50% higher.</prosody>"
      "<prosody pitch='-6st'>This is six semitones lower.</prosody>"
      "<prosody pitch='+6st'>This is six semitones higher.</prosody>"
      "<prosody pitch='-25Hz'>This has a pitch mean 25 Hertz lower.</prosody>"
      "<prosody pitch='+25Hz'>This has a pitch mean 25 Hertz higher.</prosody>"
      "<prosody pitch='75Hz'>This has a pitch mean of 75 Hertz.</prosody>"
    5. Adjusting Output Volume
      "<prosody volume='silent'>This is silent.</prosody>"
      "<prosody volume='x-soft'>This is 25% as loud.</prosody>"
      "<prosody volume='soft'>This is 50% as loud.</prosody>"
      "<prosody volume='medium'>This is the default volume.</prosody>"
      "<prosody volume='loud'>This is 50% louder.</prosody>"
      "<prosody volume='x-loud'>This is 100% louder.</prosody>"
      "<prosody volume='default'>This is the default volume.</prosody>"
      "<prosody volume='-33%'>This is 33% softer.</prosody>"
      "<prosody volume='+33%'>This is 33% louder.</prosody>"
      "<prosody volume='33%'>This is 33% louder.</prosody>"
      "<prosody volume='33'>This is 33% of normal volume.</prosody>"
    6. Adding Emphasis to Speech
      "This is <emphasis level='strong'>stronger</emphasis> than the rest."
      "This is <emphasis level='moderate'>stronger</emphasis> than the rest."
      "This is <emphasis level='none'>the same as</emphasis> than the rest."
    7. Inserting Recorded Audio Files
      "Please leave your message after the tone <audio src='beep.wav' />"
      "<audio src='non_existing_file.au'>File could not be played.</audio>"
    8. Applying Cepstral Special Effects
      "Hello. <cepstral:sfx file='/path/to/my_sfx.sfx'>Howdy, sir. How are you?</cepstral:sfx> I am fine."
      "Sit! <voice name='Dog' sfx_file='/path/to/my_sfx.sfx'>Woof!</voice> Good boy."
    9. Inserting Bookmarks
      "Place a bookmark <mark name='mark37' /> here."
    10. Spelling Words Phonetically
      "You say <phoneme ph='t ah0 m ey1 t ow0'>tomato</phoneme>, I say <phoneme ph='t ah0 m aa1 t ow0'>tomato</phoneme>"

      For a complete list of available phonemes for your language, please see the "Lexicon Tutorial".
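
    Most problems with hand-written markup are unbalanced or misspelled tags, so it can help to generate the elements and check them before sending text to the engine. The Python sketch below is one way to do that under stated assumptions: the helper names element and is_well_formed are hypothetical, and the check only tests XML well-formedness by parsing the text inside a throwaway speak root. It does not verify that the engine accepts a given attribute value.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def element(tag, text=None, **attrs):
    """Build one SSML element (hypothetical helper).

    With no text the element is self-closing, e.g. <break time="3s" />.
    The text is escaped, so this helper does not nest markup inside markup.
    """
    attr_text = "".join(" {}={}".format(k, quoteattr(v)) for k, v in attrs.items())
    if text is None:
        return "<{}{} />".format(tag, attr_text)
    return "<{}{}>{}</{}>".format(tag, attr_text, escape(text), tag)

def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a temporary root."""
    try:
        ET.fromstring("<speak>" + fragment + "</speak>")
        return True
    except ET.ParseError:
        return False

pause = "This is a " + element("break", time="3s") + " three second pause."
slow = "I am now " + element("prosody", "speaking at half speed.", rate="x-slow")

for fragment in (pause, slow, "<prosody rate='slow'>never closed"):
    print(is_well_formed(fragment), fragment)
```

    Text that uses the cepstral:sfx element would fail this particular check, because the cepstral prefix is not declared on the temporary root; such text would need a separate test.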