BIOVI Text To Speech (TTS) project

Oct 19, 2013






Nordisk sprogmøde (Nordic Language Meeting), 26–30 August 2013



Kristinn Halldór Einarsson, project manager and chairman of Blindrafélagið, the Icelandic Organization of the Visually Impaired (BIOVI)

Overview







Life quality taken for granted.





Visually impaired people and Text to Speech systems.




BIOVI Text to Speech project.




Listening examples and tools presentation.





Quality of life



How would it affect us if we would lose our ability to read?




This is something that will most likely happen to some of us in our retirement years.




What can be done to limit the huge negative impact on the quality of life of those who are going to lose their ability to read in the conventional way?

AND



Can it be accepted that an increasing part of our population could lose the ability to enjoy reading independently?

Who are they?



5% of people aged 70 and older are affected by late-stage age-related macular degeneration (AMD). No effective treatments are available today.




AMD mainly affects central vision (reading vision).




Around 800 people in Iceland are visually impaired as a result of late-stage AMD. By 2030 this number is expected to double, to 1600. The total number of visually impaired people in Iceland is 1600.




The Icelandic dyslexia association claims that up to 25% of adults are dealing with dyslexia.

... a bit of history



1003 The first known tales of efforts to build a talking machine.



1968 The first computer speech synthesizer is built.



1988 The universities of Iceland and Stockholm start cooperating.



1990 The Swedish company Infovox releases Sturla, the first Icelandic TTS voice.



2000 Snorri, an updated and improved version of Sturla, is released.



2006 Ragga, a new Icelandic TTS voice, is released by Nuance.



2012 Dóra and Karl, new female and male voices, are released by Ivona.

What is Text To Speech (TTS) technology?



TTS systems are linguistic tools that transform digital text into speech.




Modern TTS systems need to be able to run on different operating systems and devices, such as computers, tablets, smart phones, AMDs, mp3 players and other computing tools.




TTS voices are built for each language and need to be available in different sizes and qualities.




The quality of TTS voices is measured in terms of listening quality and closeness to natural reading.


ICT, accessibility & quality of life



ICT (Information and Communication Technology) can tremendously increase the independence and quality of life of visually impaired people, as it opens up a whole range of new educational, leisure and employment possibilities.




A key element is a well-designed TTS system in the mother tongue of those who are to benefit. The mother tongue is an essential part of every nation's identity and of its legal rights.




A TTS system benefits not only visually impaired people but also the much larger population of people with learning disabilities.

TTS voices are marketing commodities



Producers of TTS voices expect a return on investment.




Languages spoken by many people represent a market with big demand, which can generate big supply and attractive business opportunities.




Languages spoken by few people represent a market with little demand and little or no supply, offering little or no business opportunity.




What is the situation for languages spoken by few people when it comes to modern ICT linguistic tools, which are becoming more and more important in modern communications?

Mother tongue


“If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.”

Nelson Mandela

BIOVI Text-to-Speech project

The project was based on two pillars:


Improved life quality


&

Cultivation of the Icelandic language



Project's main definitions













Multiple usage options.




Very good listening qualities.




License fee arrangement.




Open to further development.




Some control over future development.




Sustainable business model.




Selecting a TTS producer



After exploring and taking stock of different TTS producers, the Polish company Ivona was selected to build the new Icelandic TTS voices.




The Royal National Institute of Blind People (RNIB) in the UK has enjoyed very good cooperation with Ivona. Ivona was then finishing Welsh TTS voices.




The Ivona voices have received many awards for their accuracy and listening quality.


Ivona compared (arsnews.com)

Technology: BrightVoice




BrightVoice: a new age for Text-to-Speech.




BrightVoice technology guarantees smooth, natural speech.




New language models provide intelligent text interpretation.




Up to 10 times faster speech generation.




Crystal-clear sound due to noise and distortion reduction.




Technology: Rapid Voice Development

Rapid Voice Development: fast building of IVONA Voices




RVD technology (Rapid Voice Development) makes the process of building IVONA Voices fast and relatively cheap.




It uses a set of tools modeling linguistic issues such as subvocalization, accentuation and intonation.




It also makes it possible to determine the speech signal in the original speech recordings efficiently, quickly and accurately.




The Ivona technology



Development in the number of Ivona voices: 18 languages

Operating systems and the Ivona voices

The Ivona voices are capable of operating on:






Windows XP/Vista/7/8




Mac



Unix



iOS (Apple iPhone & iPad)



Android



Windows Mobile


The project in steps



December 2010


March 2011:

Ivona visited, agreement drawn up and signed.



Summer 2011: 10.000 sentences selected from the Icelandic corpus in Leipzig, voice talents selected and sentences recorded. Voices named Dora & Karl.



February 2012: Ca. 900 sentences released. Evaluation and feedback by a team of linguists and users. Beta 1.



April 2012: Evaluation of and feedback on Beta 2 concluded.



June 2012:
Beta version 3 is released and distribution starts.



October 2012: 10.000 additional pronunciation examples added to the corpus.



June 2013: Final version of Dora and Karl released.






Cost and plans



Total cost was 500.000 euros (85 million ISK).




The project was almost fully financed when the agreement was signed.




Costs and delivery times were in line with the plans, which turned out to be accurate.




Financial contributors (m.kr. = million ISK)

Blindrafélagið (inheritance from Dóra Stefánsdóttir)    25,0 m.kr.    29%
Lions national collection, The Red Feather              19,3 m.kr.    23%
Foundation for disability-related projects              15,0 m.kr.    17%
Ministries of welfare and education                     11,3 m.kr.    13%
The Disability Organization of Iceland                  10,0 m.kr.    12%
Blindravinafélagið (Friends of the Blind)                5,0 m.kr.     6%
Total                                                   85,6 m.kr.   100%
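The percentages are whole-number shares of the 85,6 m.kr. total, and they add up to exactly 100%. The slide does not say how they were rounded; plain rounding would give the foundation 18% and a total of 101%. As a sketch, the largest-remainder (Hamilton) method reproduces every figure listed. Amounts are entered in tenths of a million krónur so the arithmetic stays exact, and the shortened contributor labels are for illustration only.

```python
# Contributions in tenths of a million ISK (25,0 m.kr. -> 250), taken from the
# funding list above. Labels are shortened for illustration.
CONTRIBUTIONS = {
    "Blindrafélagið": 250,
    "Lions / The Red Feather": 193,
    "Foundation for disability-related projects": 150,
    "Ministries of welfare and education": 113,
    "Disability Organization of Iceland": 100,
    "Blindravinafélagið": 50,
}


def largest_remainder_percentages(amounts: dict[str, int]) -> dict[str, int]:
    """Round each share to a whole percent so that the shares sum to exactly 100.

    Largest-remainder (Hamilton) method: keep the integer part of every share,
    then give the leftover percentage points to the shares with the largest
    fractional remainders.
    """
    total = sum(amounts.values())
    floors: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for name, amount in amounts.items():
        # Integer division keeps everything exact (no floating-point rounding).
        floors[name], remainders[name] = divmod(amount * 100, total)
    leftover = 100 - sum(floors.values())
    # The `leftover` names with the biggest remainders each get one extra point.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        floors[name] += 1
    return floors


if __name__ == "__main__":
    print(f"Total: {sum(CONTRIBUTIONS.values()) / 10:.1f} m.kr.")
    for name, pct in largest_remainder_percentages(CONTRIBUTIONS).items():
        print(f"{name}: {pct}%")
```

Here the integer parts sum to 97, so the three shares with the largest remainders (Blindravinafélagið, the Disability Organization and the Lions collection) are rounded up.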

Valuable contributors





Among the valuable advisers, contributors and co-workers were:




Eiríkur Rögnvaldsson, professor of Icelandic at the University of Iceland, and his team.




Sigrún Helgadóttir at Árnastofnun.




The people behind the Icelandic corpus at the University of Leipzig.




Mrs Vigdís Finnbogadóttir, former president of Iceland, who acted as the project’s patron.

Sustainable business model



The Icelandic Ivona voices, along with IReader, are given free of charge to all Icelanders who are visually impaired or have a reading impairment. Others can buy IReader and the voices for around 50 euros.





BIOVI handles all sales of the Icelandic Ivona voices and of tools such as the text reader, the recording studio and the webreader. Customers are individuals, schools, institutions and businesses. Voices in other languages can easily be bought from Ivona and added to one’s voice portfolio.




Profits from sales of the Icelandic voices are meant to finance further development and any additions that might be needed.

Linguistic challenges





Dialects:

South or north pronunciation?




Stress in pronunciation: Compound words are difficult to deal with, as the rules for stress placement in Icelandic compounds are unclear.




Numbers: Difficult because of the many declension forms.





Abbreviations:
Read them or interpret them?




Foreign words:
Solved with an additional dictionary
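The difficulty with numbers can be made concrete. The cardinals 1 to 4 agree in gender and case with the noun they count, so a single written digit such as "2" must be spoken as one of several words. The sketch below tabulates the standard forms of 1 to 4 (variants such as "tveim" are left out) and counts how many different words each digit can stand for. It is an illustration only, not how the Ivona voices work; the function names are hypothetical, and the hard part for a TTS system, deciding the gender and case of the counted noun from context, is not attempted.

```python
# Standard forms of the Icelandic cardinals 1-4, which agree in gender and
# case with the noun they count. Tuple order follows CASES.
CASES = ("nom", "acc", "dat", "gen")

NUMERAL_FORMS: dict[int, dict[str, tuple[str, str, str, str]]] = {
    1: {
        "masc": ("einn", "einn", "einum", "eins"),
        "fem": ("ein", "eina", "einni", "einnar"),
        "neut": ("eitt", "eitt", "einu", "eins"),
    },
    2: {
        "masc": ("tveir", "tvo", "tveimur", "tveggja"),
        "fem": ("tvær", "tvær", "tveimur", "tveggja"),
        "neut": ("tvö", "tvö", "tveimur", "tveggja"),
    },
    3: {
        "masc": ("þrír", "þrjá", "þremur", "þriggja"),
        "fem": ("þrjár", "þrjár", "þremur", "þriggja"),
        "neut": ("þrjú", "þrjú", "þremur", "þriggja"),
    },
    4: {
        "masc": ("fjórir", "fjóra", "fjórum", "fjögurra"),
        "fem": ("fjórar", "fjórar", "fjórum", "fjögurra"),
        "neut": ("fjögur", "fjögur", "fjórum", "fjögurra"),
    },
}


def spoken_form(digit: int, gender: str, case: str) -> str:
    """Word to speak for `digit` before a noun of the given gender and case."""
    return NUMERAL_FORMS[digit][gender][CASES.index(case)]


def distinct_forms(digit: int) -> set[str]:
    """All different words that the single written digit can stand for."""
    return {form for forms in NUMERAL_FORMS[digit].values() for form in forms}


if __name__ == "__main__":
    for digit in NUMERAL_FORMS:
        print(digit, len(distinct_forms(digit)), sorted(distinct_forms(digit)))
    # "2 konur" (two women: feminine, nominative) is read "tvær konur".
    print(spoken_form(2, "fem", "nom"))
```

Even this small table gives nine possible readings for "1" and six for each of "2", "3" and "4", which is why a TTS voice cannot simply map digits to fixed words.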

Main tools



SAPI 5 voices for Windows, plus the reader and mini reader.






Webreader that reads from the cloud.




Android voices for smart phones and tablets.




Recording studio.




Ivona SDK (Software Development Kit) and voices for telephone answering, AMDs and other computing tools.



Ivona: an Amazon company





On 24 January 2013, Amazon announced that it had acquired the leading text-to-speech technology company IVONA.




This acquisition strengthens and protects Ivona's position in a market where there are some much bigger players than Ivona.




Amazon's acquisition of Ivona is, in a way, confirmation that others have seen the same potential in Ivona's TTS products as we did.

Listening examples and tools presentation

Snorri



Ragga


Karl



Dóra


IReader

Thank you