Learning L2 pronunciation with a mobile speech recognizer: French /y/
Issue: Vol 32 No. 1 (2015)
Journal: CALICO Journal
This study investigates the acquisition of the L2 French vowel /y/ in a mobile-assisted learning environment through the use of automatic speech recognition (ASR). Specifically, it asks whether ASR-based pronunciation instruction on a mobile device can improve the production and perception of French /y/. Forty-two elementary French students participated in an experimental study in which they were assigned to one of three groups: (1) the ASR Group, which used an ASR application on their mobile devices to complete weekly pronunciation activities, with immediate written feedback provided by the software and no human interaction; (2) the Non-ASR Group, which completed the same pronunciation activities in individual weekly sessions with a teacher who provided immediate oral feedback using recasts and repetitions; and (3) the Control Group, which participated in weekly individual meetings ‘to practice their conversation skills’ with a teacher who provided no pronunciation feedback. The study followed a pretest/posttest design. According to the results of the dependent samples t-tests, only the ASR Group improved significantly in production from pretest to posttest (p < 0.001), and none of the groups improved in perception. The overall success of the ASR Group on the production measures suggests that this type of learning environment is propitious for the development of segmental features such as /y/ in L2 French.
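As a rough illustration of the pretest/posttest comparison mentioned above, the following minimal sketch computes a dependent samples (paired) t statistic for one group. The function name `paired_t` and the scores are hypothetical placeholders invented for this example; they are not the study's data, scoring scheme, or analysis code, and the sketch stops at the t statistic and degrees of freedom without computing a p-value.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a dependent samples t-test on paired scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    # Each learner contributes one difference: posttest minus pretest.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Placeholder scores, invented for illustration only (not study data).
pre = [1, 2, 3, 4]
post = [2, 4, 3, 6]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.3f}")
```

A positive t indicates higher posttest than pretest scores on average; in practice the statistic would be compared against the t distribution with `df` degrees of freedom to obtain a p-value.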
Authors: Denis Liakin, Walcir Cardoso, Natallia Liakina