In turn, these practical applications motivate speech synthesis research in this area, whose goal is to develop high-quality TTS systems. A typical text-to-speech synthesis process contains three main components: text pre-processing, text-to-phonetic-prosodic conversion, and the speech synthesizer. The input text to the system is usually a sequence of unrestricted characters, including numbers, symbols, acronyms, and abbreviations. The text normalizer converts these into fully spelled-out plain text. For example, '3:15' is expanded into 'a quarter past three'.
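The time example can be made concrete with a small sketch. The code below is only an illustration of one normalization rule, not a full normalizer: the function names (`expand_time`, `normalize_times`), the word tables, and the choice of British-style phrasing ("past", "to") are all assumptions made for this example.

```python
import re

# Hypothetical word tables for this sketch; a real normalizer covers far more.
HOURS = ["twelve", "one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine", "ten", "eleven"]
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def minutes_to_words(m):
    """Spell out a minute count between 1 and 29."""
    if m < 20:
        return ONES[m]
    tens, ones = divmod(m, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def expand_time(hour, minute):
    """Turn a clock time into spoken words, e.g. (3, 15) -> 'a quarter past three'."""
    this_hour = HOURS[hour % 12]
    next_hour = HOURS[(hour + 1) % 12]
    if minute == 0:
        return f"{this_hour} o'clock"
    if minute == 15:
        return f"a quarter past {this_hour}"
    if minute == 30:
        return f"half past {this_hour}"
    if minute == 45:
        return f"a quarter to {next_hour}"
    if minute < 30:
        return f"{minutes_to_words(minute)} past {this_hour}"
    return f"{minutes_to_words(60 - minute)} to {next_hour}"


# Matches times such as 3:15 or 23:45 (hours 0-23, minutes 00-59).
TIME_PATTERN = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def normalize_times(text):
    """Replace every H:MM token in the text with its spoken form."""
    return TIME_PATTERN.sub(
        lambda m: expand_time(int(m.group(1)), int(m.group(2))), text
    )


print(normalize_times("Meet at 3:15."))  # Meet at a quarter past three.
```

A complete normalizer would need comparable rules for dates, currency, ordinals, and acronyms, each with its own pattern and expansion table.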

At first glance, this task seems very easy. However, one important problem is frequently encountered during this conversion: semantic ambiguity. A common example is the expansion of 'Dr', which may stand for 'Doctor' or 'Drive' depending on its context. The conversion from text to pronunciation is central to the whole text-to-speech system. This module converts the pre-processed text into a phonetic transcription together with prosodic information (such as intonation and rhythm). It is a fairly complex process, and to a large extent it determines the final quality of the output speech.
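The 'Dr' ambiguity can be illustrated with a deliberately naive context rule. The heuristic below is an assumption made for this sketch (it is not taken from any particular TTS system): 'Dr' followed by a capitalized word is read as a title, and 'Dr' preceded by a capitalized word is read as a street suffix. The function name `expand_dr` is hypothetical.

```python
def expand_dr(text):
    """Expand 'Dr' / 'Dr.' using neighbouring-word capitalization (a naive heuristic)."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        core = tok.rstrip(".,;")
        if core != "Dr":
            out.append(tok)
            continue
        # Drop the abbreviation period but keep other trailing punctuation.
        # Note: a period that also ends the sentence is dropped too.
        trailing = tok[len(core):].lstrip(".")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        prev = tokens[i - 1] if i > 0 else ""
        if nxt[:1].isupper():
            word = "Doctor"   # e.g. "Dr. Smith"
        elif prev[:1].isupper():
            word = "Drive"    # e.g. "Oak Dr."
        else:
            word = "Doctor"   # fallback guess when context gives no hint
        out.append(word + trailing)
    return " ".join(out)


print(expand_dr("Dr. Smith lives at 12 Oak Dr."))
```

The rule fails easily, for instance when a street name is followed by a capitalized sentence start, which is why practical systems tend to rely on richer context than a single neighbouring word.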

Generally speaking, digital speech synthesis is an integrated technology for simulating the human process of speech production, going from a symbolic representation of an utterance to acoustic waveforms. With the rapid progress in text-to-speech systems in recent years, the scope for speech synthesis has widened dramatically, because text prepared in ordinary written form can be described by a phonological representation that is not too difficult to derive. Today there are many text-to-speech systems on the commercial market, and some of them are even multilingual.

In the following sections, two well-established speech synthesis techniques will be introduced. Formant synthesis is traditionally also called source-filter synthesis. It describes speech by a set of parameters, most of which relate to formant or anti-formant frequencies and bandwidths, as well as to glottal waveforms. These formant and anti-formant frequencies closely resemble the frequency response characteristics of the vocal tract. It is therefore essential to have some basic knowledge of human speech production before discussing formant synthesis further.
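To give a feel for how formant frequencies and bandwidths can act as synthesis parameters, here is a minimal source-filter sketch. It makes several simplifying assumptions that are not in the text above: the glottal source is replaced by a plain impulse train, each formant is modelled by a two-pole digital resonator, the resonators are connected in cascade, and anti-formants are left out. The formant values and all function names are illustrative only.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of a two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a signal through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude stand-in for the glottal source: one unit pulse per pitch period."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=100.0, duration_s=0.5, sample_rate=16000):
    """Pass the source through a cascade of formant resonators, then peak-normalize."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]


# Assumed (frequency, bandwidth) pairs in Hz, chosen only for illustration.
samples = synthesize_vowel([(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)])
```

The resulting `samples` list holds floating-point values in the range -1 to 1; Python's standard `wave` module could be used to write them to an audio file after conversion to 16-bit integers. Changing the formant pairs changes the perceived vowel quality, which is the core idea behind describing speech with a small set of parameters.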

Figure 2 illustrates the human speech production system. It consists of the lungs, windpipe, pharyngeal cavity (including the larynx), oral cavity, and nasal cavity. In this discussion, we generally treat the oral and nasal cavities together and call them the vocal tract. The larynx is the organ that generates the sound. It contains two folds of tissue, called the vocal folds, that repeatedly open and close as the air expelled from the lungs is pushed through the opening between them. Another important organ is the velum at the back of the nasal cavity.