King's Speech: Pronounce a Foreign Language with Style

Georgios Athanasopoulos, Céline Lucas, Alessandro Cierro, Robin Guérit, Kaori Hagihara, Julie Chatelain, Sébastien Lugan, Benoît Macq



Computer-assisted pronunciation training requires strategies that capture learners' attention and guide them along the learning pathway. In this paper, we introduce an immersive storytelling scenario that creates appropriate learning conditions. The proposed learning interaction is orchestrated by a spoken karaoke. We motivate the concept of the spoken karaoke and describe our design. Driven by the requirements of the proposed scenario, we suggest a modular architecture designed for immersive learning applications. We present our prototype system and our approach to processing the spoken and visual interaction modalities. Finally, we discuss how technological challenges can be addressed to enable learners' self-evaluation.


Immersive Language Learning; L2 Pronunciation; Computer Assisted Pronunciation Training; Gamification; Audiovisual Speech Technology








Journal of Science and Technology of the Arts
Revista de Ciência e Tecnologia das Artes
ISSN: 1646-9798
e-ISSN: 2183-0088
Portuguese Catholic University | Porto


This scientific journal is funded by National Funds through FCT – Fundação para a Ciência e a Tecnologia
