synthetic-voices

Synthetic voices, or artificial voices, are part of our lives (who has never talked to Apple's Siri, Amazon's Alexa, or Microsoft's Cortana?). Conversational AI, which relies heavily on these synthetic voices, is currently experiencing a boom, and its market is expected to keep growing over the next four years.

However, even though synthetic voices might seem like a very new invention, the truth is that they have been around for a long, long time (more than 200 years, actually!).

The very beginning

The first attempt to produce human speech by machine can be traced back to the 18th century (yup, that long ago!). In the second half of that century, the Hungarian inventor Wolfgang von Kempelen built a "speaking machine" that replicated the human vocal tract.

The machine consisted of a bellows (representing the lungs), a reed, and an artificial mouth, throat, and nasal cavity. Von Kempelen was mainly interested in the clinical applications of his invention, such as assisting those who couldn't speak.

von-Kempelen

Although the machine looks very rudimentary, it took him about 20 years to build. The final version of von Kempelen's speaking machine was able to speak four languages: French, English, Italian, and German.

New advancements

At the beginning of the 19th century, other speaking machines were built following von Kempelen's model: they replicated the human vocal tract to produce a variety of sounds in different languages.

The most famous one from this period is Joseph Faber's "Euphonia" (originally called the "Fabulous Talking Machine"). Besides an artificial throat and vocal organs, Euphonia had a piano-like keyboard attached. Operators would press the keys to activate the mechanism and produce sounds. Euphonia was exhibited at different locations, where it was fitted with a female face mask to make it look more lifelike.

Euphonia

Now, the greatest breakthrough of the 19th century was Edison's phonograph, without a doubt. The phonograph was the first machine able to record actual human sound and play it back.

A fun fact about the phonograph is that it was later used to create talking dolls ("Edison's Phonograph Dolls", specifically). However, these dolls were a sales failure, since people found them frightening.

Edison’s-doll

The boom of electronic speech

In the late 1930s, Homer Dudley of Bell Labs invented the VODER, the first machine to generate speech electronically rather than mechanically or from a recording of a human voice. Basically, Dudley broke natural human speech down into its acoustic components and then reproduced those sound patterns electronically. Still, highly trained operators were needed to run the machine.

VODER-machine

Throughout the rest of the 20th century, synthetic speech continued to develop. An iconic example from this period is Stephen Hawking's artificial voice.

In the 21st century, the field has continued to evolve thanks to progress in artificial intelligence. New products are developed every year, and a lot of people now interact with synthetic voices on a daily basis. In the coming years, synthetic voices are expected to become more personalized. For example, Lyrebird claimed it could mimic a person's voice from about 60 seconds of recorded speech. Although this personalization is great, it can also be dangerous, since the same technology can be used to create audio deepfakes.
If you found this post interesting, you can check out my other blog posts here!