Accented characters

Posted: Mon Mar 18, 2013 5:30 pm
by kenberry
I am having problems adding words with accented characters to the lexicon file. For example, using the Marta voice, the word "niña" is always pronounced "n i1 nj a0" no matter what I specify in the lexicon file. Is there something specific I need to do to get the engine to read accented characters from the lexicon file?


Also, in the tutorials at http://www.cepstral.com/en/tutorials/view/lexicon, all accented characters are replaced by "?", so the Spanish tutorial has sample words like "ni?a" instead of "niña". This really should be fixed.

Re: Accented characters

Posted: Wed Mar 20, 2013 11:22 am
by ChrisM
Hi,

Just wanted to weigh in about the "?" characters showing up on our website. It seems that some encoding problems occurred when we moved the information from our older site to the new one. I will be looking into fixing that later today. Thank you for letting us know about the problem.

Re: Accented characters

Posted: Wed Mar 20, 2013 11:48 am
by NicholeH
Hi there,

One problem here is that the default encoding of the lexicon.txt file is UTF-8, and the engine does not interpret this encoding correctly when the file contains Spanish accented characters. I was able to fix the issue by changing the encoding of the lexicon file with the following command:

iconv -f UTF-8 -t ISO-8859-15 lexicon.txt > new.txt

Then move new.txt to your lexicon.txt location, replacing the original file. Thank you for the bug report; we will try to address this in the next release of our voices.

--The Cepstral Support Team
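
The conversion and the move described above can be combined into one step that keeps a backup. The following is only a sketch under stated assumptions: `convert_lexicon` and the `.utf8.bak` and `.latin9.tmp` suffixes are hypothetical names that do not come from the thread, and a POSIX shell with `iconv` on the PATH is assumed (on Windows that would mean something like Cygwin).

```shell
# Hypothetical helper (not part of any Cepstral tool): convert a lexicon file
# from UTF-8 to ISO-8859-15 in place, keeping a backup of the original.
# Usage: convert_lexicon lexicon.txt
convert_lexicon() {
    src=$1
    tmp="$src.latin9.tmp"

    # Keep a copy of the original UTF-8 file in case the result is wrong.
    cp "$src" "$src.utf8.bak" || return 1

    # iconv exits non-zero if the input is not valid UTF-8 or contains a
    # character with no ISO-8859-15 equivalent; in that case the original
    # file is left untouched.
    if iconv -f UTF-8 -t ISO-8859-15 "$src" > "$tmp"; then
        mv "$tmp" "$src"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

After a successful run, "ñ" should be stored as the single byte 0xF1 rather than the two-byte UTF-8 sequence 0xC3 0xB1, which can be confirmed with a hex dump such as `od -An -tx1 lexicon.txt`.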

Re: Accented characters

Posted: Thu Mar 21, 2013 3:38 pm
by kenberry
Odd. Your solution works if I convert the lexicon file on a Win 7 64-bit machine but does not work if I do it on a WinXP 32-bit machine.

Well, that's not your issue and you did give me a fix so thank you.

Re: Accented characters

Posted: Fri Mar 22, 2013 9:35 am
by NicholeH
I'm glad this resolves your issue. Please let us know if you need any further assistance.

--The Cepstral Support Team

Re: Accented characters

Posted: Fri Mar 22, 2013 1:06 pm
by NicholeH
One last option that might work:

If you are using the command line to call swift, you can also pass a -e parameter to specify the encoding instead of changing the lexicon file itself. Perhaps this would work for your 32-bit machine.


-e <string>  Text Encoding to assume for input. Common encoding types
             include: "utf-8", "utf-16", "iso8859-1", "iso8859-15",
             and "us-ascii". The default is "us-ascii".
             Note: This does not cause swift to convert text to the
             specified encoding, but rather tells swift to expect the
             input text to be of the specified encoding.
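
Because -e only declares the encoding and does not convert anything, it helps to know what encoding a text file is really in before choosing a value. The sketch below is a rough heuristic, not a Cepstral feature: `guess_encoding` is a hypothetical name, it assumes a POSIX shell with `iconv` and `tr` available, and its fallback to "iso8859-15" is an assumption that suits Western European text only.

```shell
# Hypothetical helper: print one of the encoding names listed in the -e help
# text above, based on the bytes actually present in the file.
# Usage: guess_encoding input.txt
guess_encoding() {
    file=$1

    # Count the bytes that fall outside the 7-bit ASCII range.
    high=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c | tr -d ' ')

    if [ "$high" -eq 0 ]; then
        # Pure ASCII, which matches swift's documented default.
        echo "us-ascii"
    elif iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
        # The file decodes cleanly as UTF-8.
        echo "utf-8"
    else
        # Assumption: anything else is treated as ISO-8859-15. This is a
        # guess; a single-byte encoding cannot be identified from bytes alone.
        echo "iso8859-15"
    fi
}
```

The printed name could then be used as the value of -e in an existing swift command line. Checking the file this way on both machines may also show whether the two conversions produced different bytes.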

Re: Accented characters

Posted: Fri Mar 22, 2013 2:20 pm
by kenberry
Thanks, but we are using the Microsoft Speech API from a custom Windows app, so this doesn't look like an option for us.

Re: Accented characters

Posted: Thu Jun 20, 2013 2:00 am
by gilroy
The user lexicon serves as an absolute override. Furthermore, there is no SAPI dependency in this mechanism. The lexicon should work everywhere. It seemed a little shaky to me. Not to say that he does not have this lexicon.




