I did some performance analysis of the Callie build we currently have available and found some interesting information.
First of all, the very first synthesis with a voice is quite slow because the voice data must be loaded into memory. Subsequent syntheses reuse the voice data already resident in memory and run much faster.
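One way to hide that one-time cost is to warm the voice up at startup. This is a hypothetical sketch, not something from my testing: it assumes swift is on the PATH and the Callie voice is installed, and it simply synthesizes a short throwaway phrase so the first real request hits the warm path.

```shell
#!/bin/sh
# Hypothetical warm-up step (assumes the Cepstral swift binary is on
# PATH and the Callie voice is licensed/installed on this machine).
# Synthesizing one short phrase pays the voice-loading cost once.
if command -v swift > /dev/null 2>&1; then
  # -o writes the wav output; we discard it since we only want the side
  # effect of loading the voice into memory.
  swift -n Callie -o /dev/null "warming up"
else
  echo "swift not installed; skipping warm-up"
fi
```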
I wanted to test swift's performance on the Pi at normal and overclocked processor speeds.
Setup: I created a text file containing the phrase "This is a test of the Cepstral Swift Engine on the Raspberry Pi at various processor speeds. We hope for the best and wish you good day." To measure the time until any audio is heard, we set up speakers and had two people start timers on their cell phones at the same moment I executed the swift command; they stopped the timers when they heard the voice begin speaking. Repeating this several times gave a rough approximation. I also ran the following command to measure how long it takes to synthesize the entire audio:
Code:
time swift -n Callie -f input.txt
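To smooth out run-to-run variance, the `time` measurement above could be repeated and averaged with a small script. This is a hypothetical helper, not part of my original test procedure; it assumes a Linux system with GNU `date` (for the `%N` nanosecond field).

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print the average
# wall-clock duration in milliseconds. Requires GNU date's %s%N
# (nanoseconds since the epoch), which standard Raspbian provides.
avg_ms() {
  runs=$1; shift
  total=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    total=$((total + end - start))
    i=$((i + 1))
  done
  # Convert the summed nanoseconds to an average in milliseconds.
  echo $(( total / runs / 1000000 ))
}

# For the test in this post, the call would look like:
#   avg_ms 5 swift -n Callie -f input.txt
```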
Results:
Overclock Setting | Time to Hear | Time to Synth
None              | 1.5 s        | 2.182 s
Medium            | 1.3 s        | 1.928 s
Turbo             | 1.1 s        | 1.798 s
We find that a delay of roughly one second before audio starts is not very noticeable, which in our opinion makes Callie well suited to real-time TTS applications.