Item Details

Speaker-individuality in Fujisaki model f0 features: implications for forensic voice comparison

Issue: Vol 21 No. 2 (2014)

Journal: International Journal of Speech Language and the Law

Subject Areas: Linguistics

Abstract:

Fundamental frequency (f0) is a highly speaker-specific feature. Consequently, practitioners often use f0 information in forensic casework. Current research principally examines the use of long-term f0 statistics, such as f0 means and standard deviations, for forensic voice comparison. The present study investigates how short-term f0 features, such as those measured by the Fujisaki intonation model, capture speaker-individuality. Using data from a homogeneous group of Zurich German speakers, we conducted an experiment on a large corpus of read speech and on a subset of sentences that included speaking style variability (spontaneous vs. read). Such variability is characteristic of forensic casework. Speakers demonstrated high between-speaker variability and low within-speaker variability across the two speaking styles for a number of f0 features. Given this evidence of speaker-individuality, we discuss the potential of Fujisaki f0 features for forensic voice comparison.
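The abstract names the Fujisaki intonation model without stating it. As a rough illustration only, the following sketch generates an f0 contour from the model's standard formulation: a baseline frequency plus phrase and accent components summed in the log domain. The function names, the example commands, and the default constants (alpha = 2.0/s, beta = 20.0/s, gamma = 0.9, values commonly used in the literature) are assumptions made for this example and are not taken from the study.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase component: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component, capped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return the modelled f0 in Hz at time t (seconds).

    fb      -- baseline frequency in Hz
    phrases -- list of (onset_time, amplitude) phrase commands
    accents -- list of (onset_time, offset_time, amplitude) accent commands
    """
    log_f0 = math.log(fb)
    for onset, amplitude in phrases:
        log_f0 += amplitude * phrase_response(t - onset, alpha)
    for onset, offset, amplitude in accents:
        log_f0 += amplitude * (
            accent_response(t - onset, beta, gamma)
            - accent_response(t - offset, beta, gamma)
        )
    return math.exp(log_f0)


# Hypothetical utterance: one phrase command at 0 s, one accent from 0.3 s to 0.6 s.
example_phrases = [(0.0, 0.5)]
example_accents = [(0.3, 0.6, 0.4)]
contour = [
    fujisaki_f0(i * 0.01, 100.0, example_phrases, example_accents)
    for i in range(200)
]
```

In this framing, the short-term features the abstract refers to would be quantities such as the baseline frequency and the amplitudes and timings of the phrase and accent commands, estimated per utterance rather than averaged over a whole recording.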

Author: Adrian Leemann, Hansjörg Mixdorff, Maria O'Reilly, Marie-José Kolly, Volker Dellwo

