Between-speaker rhythmic variability is not dependent on language rhythm, as evidence from Persian reveals
Issue: Vol 25 No. 2 (2018)
Journal: International Journal of Speech, Language and the Law
Subject Areas: Linguistics
DOI: 10.1558/ijsll.37110
Abstract:
Acoustic measures of speech rhythm based on the durational characteristics of consonantal and vocalic intervals (henceforth C- and V-intervals), as well as on syllabic intensity, reveal between-speaker variability. The evidence obtained so far comes from speakers of stress-timed languages, which are assumed to have complex consonant clusters and a higher degree of vowel reduction. Speakers of stress-timed languages might operate their articulatory organs in different ways because of this syllable complexity and vowel reduction: complex consonant clusters are released differently, and vowels are reduced more or less strongly, depending on the speaker. When a language lacks such features, rhythmic variation between its speakers might decrease. In the present study, we explored between- and within-speaker rhythmic variability in Persian, an Indo-European language categorised as syllable-timed. Acoustic correlates of speech rhythm (%V, ΔV[ln], ΔC[ln], n-PVI-V) and articulation rate were obtained from two Persian corpora with different sources of within-speaker variability. In the first corpus, within-speaker variability comes mainly from non-contemporaneous recording sessions; in the second, from different speech rates. Results revealed significant differences between speakers in all investigated speech rhythm measures in Persian, with %V discriminating best between speakers. This indicates that the lack of typical stress-timed features does not affect between-speaker variability in speech rhythm.
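For readers unfamiliar with the rhythm measures named in the abstract, the following is a minimal sketch of how they are commonly defined in the literature, not the authors' own analysis scripts. It assumes that interval durations are already available in seconds, that ΔV[ln] and ΔC[ln] are the standard deviation of natural-log-transformed durations (the population form is used here as an assumption), and that articulation rate is syllables per second of pause-free speech. All function names and the sample durations are hypothetical.

```python
import math
from statistics import pstdev


def percent_v(v_durs, c_durs):
    """%V: share of total interval duration that is vocalic, in percent."""
    return 100.0 * sum(v_durs) / (sum(v_durs) + sum(c_durs))


def delta_ln(durs):
    """Delta-X[ln]: standard deviation of ln-transformed interval durations.

    Assumption: population standard deviation; a sample SD is equally plausible.
    """
    return pstdev([math.log(d) for d in durs])


def npvi(durs):
    """nPVI: mean normalised difference between successive intervals, times 100."""
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durs, durs[1:])]
    return 100.0 * sum(terms) / len(terms)


def articulation_rate(n_syllables, speech_time_s):
    """Syllables per second, with pauses assumed to be excluded from speech_time_s."""
    return n_syllables / speech_time_s


if __name__ == "__main__":
    # Hypothetical interval durations (seconds) for one utterance.
    v = [0.08, 0.12, 0.06, 0.15]
    c = [0.07, 0.11, 0.09, 0.10]
    print(percent_v(v, c), delta_ln(v), delta_ln(c), npvi(v))
    print(articulation_rate(4, sum(v) + sum(c)))
```

Computing each measure per speaker and per recording session or speech rate would yield the kind of values whose between- and within-speaker spread the study compares.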
Authors: Homa Asadi, Mandana Nourbakhsh, Lei He, Elisa Pellegrino, Volker Dellwo