
A phonetic case study on prosodic variability in suicidal emergency calls


Journal: International Journal of Speech, Language and the Law

Subject Areas: Linguistics

DOI: 10.1558/ijsll.39667


Speech prosody has been applied in numerous speech emotion recognition tasks. Yet, especially in forensic speech science, acoustic-phonetic analyses with human evaluation are still needed, because many current speech emotion models are trained on speech data in which emotions are treated as constant states and the dynamic effects of the interlocutor are disregarded. During an emergency call, for instance, the caller’s emotional prosody varies in response to the exchange with the emergency operator, which causes problems for existing speech emotion models when individual emergency recordings are analysed. In this phonetic case study, prosodic variation was investigated in two suicidal emergency calls: eight prosodic features from two adult male callers were analysed before and after the callers heard the emergency operators’ offer of help. In addition, a possible linear association between the emergency operator’s and the caller’s prosodic features was evaluated. The results show that caller and operator pitch are negatively correlated (−0.33), and that half of the callers’ prosodic features vary significantly (p < 0.05) after the callers hear the offer of help.
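The abstract does not name the statistical procedures, so the following is only a minimal sketch of the two kinds of analysis it describes, under stated assumptions. It assumes that "linear association" means Pearson's correlation coefficient computed over paired per-turn pitch values (the pairing of each operator turn with the following caller turn is a hypothetical choice), and it uses a two-sided permutation test on the difference in means as a stand-in for whatever significance test was applied to the before/after comparison. The function names and all numeric values are illustrative, not the study's data.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(before: Sequence[float], after: Sequence[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in means of two segments.

    Stand-in for the unspecified test in the abstract: segment labels
    ("before"/"after" the offer of help) are shuffled to build a null
    distribution of absolute mean differences.
    """
    rng = random.Random(seed)
    k = len(before)
    pooled = list(before) + list(after)
    observed = abs(sum(after) / len(after) - sum(before) / k)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[k:]) / (len(pooled) - k) - sum(pooled[:k]) / k)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-turn mean F0 (Hz); made-up values for illustration only.
    operator_f0 = [180.0, 195.0, 170.0, 210.0, 185.0]
    caller_f0 = [120.0, 112.0, 125.0, 105.0, 118.0]
    print("r =", round(pearson_r(operator_f0, caller_f0), 3))

    # Hypothetical caller feature values before and after the offer of help.
    before = [118.0, 122.0, 125.0, 119.0, 121.0, 124.0]
    after = [108.0, 111.0, 106.0, 110.0, 109.0, 107.0]
    print("p =", permutation_p_value(before, after))
```

A permutation test is used here because it makes no normality assumption, which suits the small number of turns available in a single call; with real data the choice of test, the unit of pairing and any correction for testing eight features would need to follow the study's own procedure.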

Authors: Lauri Tavi, Stefan Werner

