2010 (EN)
Παρεμβολή ομιλητών και μετατροπή φωνής δια παράλληλα κείμενα
On speaker interpolation and speech conversion for parallel corpora

Γρέκας, Γεώργιος Αντωνίου
Grekas, Giorgios

Στυλιανού, Γιάννης

In daily speech, linguistic information plays a major role in communication between people. However, voice quality and individuality are also important in speech recognition and understanding. For instance, it is essential to be able to discriminate between two or more speakers in a radio or television program. Beyond these advantages in communication, voice individuality enriches our daily life with variety. A number of modern applications, for example gaming, text-to-speech synthesis, and cartoon movies, require creating and maintaining databases for different speakers. This may be time-consuming and expensive, depending on the requirements of the application. Speaker interpolation (SI) is the process of producing an intermediate voice between two or more speakers, while voice conversion (VC) is the technique of processing the voice of one person, the source speaker, so that it resembles the voice of another person, the target speaker. Moreover, the converted or interpolated speech should sound natural and intelligible. Despite extensive research in VC, high-quality voice conversion has not yet been achieved. The main reasons for this shortcoming are (a) the oversmoothing effect caused by statistical modeling, (b) inaccurate estimation of speaker-dependent features, and (c) the inadequacy of the synthesis methods used. Voice conversion methods are based on spectral envelope information, which represents the vocal tract, since it plays an important role in speech individuality. In conventional VC, the excitation signal of the source speaker is first extracted by inverse filtering. This excitation signal is then filtered by the vocal tract of the target speaker. In speaker interpolation, the excitation signal is filtered by an interpolated vocal tract of the given speakers.
The scope of this thesis is to address this research gap and achieve high-quality speaker interpolation and voice conversion on parallel corpora, using accurate methods for spectral envelope estimation (true envelope), time and frequency alignment (piecewise linear time and frequency warping), and speech synthesis (interpolated lattice filter or overlap-and-add). Using precise methods in each processing step was expected to reduce the artifacts currently encountered in voice conversion. In speaker interpolation, the produced vocal tract is not merely an interpolation between the given speakers; the vocal tract length can also be altered, producing a broad range of voices. Hence, given a limited database, a substantially larger one containing distinct speakers for every use can be created. (EN)
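
The conventional pipeline summarized in the abstract (inverse filtering to obtain the source excitation, then filtering it through a target or interpolated vocal tract) can be illustrated with a minimal sketch. This is not the thesis method: it assumes a plain LPC model in place of the true envelope, a single time-aligned frame pair, and no frequency warping. All function names and the parameters `order` and `alpha` are hypothetical. Interpolation is done on reflection coefficients, the parameters of a lattice filter, because a convex combination of coefficients with magnitude below one stays below one and so keeps the synthesis filter stable.

```python
import math

def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, k): a is the predictor polynomial A(z) = 1 + a[1] z^-1 + ...,
    k is the list of reflection coefficients.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        km = -acc / err
        k.append(km)
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + km * a[m - j]
        new_a[m] = km
        a = new_a
        err *= 1.0 - km * km
    return a, k

def step_up(k):
    """Convert reflection coefficients back to the predictor polynomial A(z)."""
    a = [1.0]
    for m, km in enumerate(k, start=1):
        prev = a + [0.0]
        a = [prev[j] + km * prev[m - j] for j in range(m + 1)]
    return a

def inverse_filter(x, a):
    """Excitation e[n] = sum_j a[j] * x[n - j] (FIR filtering with A(z))."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole filtering with 1 / A(z): y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j]
                            for j in range(1, len(a)) if n - j >= 0))
    return y

def convert_frame(src, tgt, order=6, alpha=1.0):
    """Filter the source excitation through an interpolated vocal tract.

    alpha = 0 reconstructs the source frame, alpha = 1 uses the target
    envelope (conversion), and values in between give an intermediate voice.
    """
    a_src, k_src = levinson(autocorr(src, order), order)
    _, k_tgt = levinson(autocorr(tgt, order), order)
    excitation = inverse_filter(src, a_src)
    k_mix = [(1.0 - alpha) * ks + alpha * kt for ks, kt in zip(k_src, k_tgt)]
    return synthesis_filter(excitation, step_up(k_mix))
```

Calling `convert_frame(src, tgt, alpha=0.5)` on two aligned frames of equal length yields a frame whose envelope lies between the two speakers. A full system would additionally need the time and frequency alignment and the frame-by-frame synthesis (interpolated lattice filter or overlap-and-add) that the abstract names.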

Type of Work--Postgraduate specialization theses

Μετατροπή φωνής
Speaker interpolation
Παρεμβολή ομιλητών
Speech conversion

Πανεπιστήμιο Κρήτης (EL)
University of Crete (EN)



School/Department--School of Sciences and Engineering--Department of Computer Science--Postgraduate specialization theses

