Item Details

More is better: likelihood ratio-based forensic voice comparison with vocalic segmental cepstra frontends

Issue: Vol 20 No. 1 (2013)

Journal: International Journal of Speech Language and the Law

Subject Areas: Linguistics

DOI: 10.1558/ijsll.v20i1.77

Abstract:

The suitability of vowel cepstral spectra for forensic voice comparison is explored within a likelihood ratio-based framework, and non-technical explanations are provided for some basic concepts of cepstral analysis and forensic voice comparison. Non-contemporaneous landline telephone recordings of 297 male Japanese speakers are compared using only two replicates per recording of each of their five read-out vowels. Fourteen cepstrally-mean-subtracted LPC cepstral coefficients modelling the spectral shape to 5 kHz are used as features. When evaluated intrinsically with kernel density multivariate likelihood ratios, all 297 same-speaker comparisons are correctly discriminated as coming from the same speaker, and only 173 of the 43,956 different-speaker comparisons (0.4%) are incorrectly evaluated as coming from the same speaker. The log-likelihood ratio cost for this comparison is very low at 0.013. Fusion with a speaker’s long-term spectral data marginally improves the different-speaker error rate to 0.27% and the log-likelihood ratio cost to 0.009. It is concluded that the approach warrants further examination.
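The figures in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not the author's code: it assumes each pair of the 297 speakers is compared once, and it uses the usual definition of the log-likelihood ratio cost (Cllr) from Brümmer and du Preez (2006). The function names are hypothetical.

```python
import math


def different_speaker_pairs(n_speakers):
    """Number of unordered speaker pairs (assumes each pair is compared once)."""
    return math.comb(n_speakers, 2)


def error_rate_percent(errors, trials):
    """Percentage of comparisons evaluated in favour of the wrong hypothesis."""
    return 100.0 * errors / trials


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood ratio cost for two lists of likelihood ratios (not logs).

    Same-speaker LRs are penalised when small, different-speaker LRs when
    large. A system that always outputs LR = 1 scores exactly 1.
    """
    ss_cost = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    ds_cost = sum(math.log2(1 + lr) for lr in different_speaker_lrs)
    return 0.5 * (ss_cost / len(same_speaker_lrs)
                  + ds_cost / len(different_speaker_lrs))


# Counts reported in the abstract.
n_pairs = different_speaker_pairs(297)
ds_error = error_rate_percent(173, n_pairs)
print(n_pairs, round(ds_error, 2))
```

With 297 speakers this gives the 43,956 different-speaker comparisons quoted in the abstract, and 173 errors correspond to roughly 0.4%. Reproducing the reported Cllr of 0.013 would require the individual likelihood ratios, which this record does not provide.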

Author: Phil Rose


References:

Aitken, C.G.G. and Lucy, D. (2004) Evaluation of trace evidence in the form of multivariate data. Applied Statistics 53(1): 109–122. http://dx.doi.org/10.1046/j.0035-9254.2003.05271.x
Brümmer, N. (2011) Tutorial for Bayesian forensic likelihood ratio. Ms.: 1–14. Available at: https://sites.google.com/sites/nikobrummer.
Brümmer, N. and du Preez, J. (2006) Application independent evaluation of speaker detection. Computer Speech and Language 20(2–3): 230–275. http://dx.doi.org/10.1016/j.csl.2005.08.001
Campbell, W.M., Brady, K.G., Campbell, J.P., Granville, R. and Reynolds, D.A. (2006) Understanding scores in forensic speaker recognition. Proceedings IEEE Odyssey Speaker and Language Recognition Workshop. http://dx.doi.org/10.1109/ODYSSEY.2006.248091
Clermont, F. and Itahashi, S. (2000) Static and dynamic vowels in a ‘cepstro-phonetic’ sub-space. Journal of the Acoustical Society of Japan 21(4): 221–223. http://dx.doi.org/10.1250/ast.21.221
Clermont, F. and Mokhtari, P. (1994) Frequency-band specification in cepstral distance computation. In R. Togneri (ed.) Proceedings of the 5th Australian International Conference on Speech Science and Technology: 354–359.
Doherty, E.T. (1976) An evaluation of selected acoustic parameters for use in speaker identification. Journal of Phonetics 4: 321–326.
Evett, I.W., Scranage, J. and Pinchin, R. (1993) An illustration of the advantages of efficient statistical methods for RFLP analysis in forensic science. American Journal of Human Genetics 52: 498–505.
Furui, S. (1981) Cepstral analysis technique for automatic speaker verification. IEEE Transactions on Acoustics Speech and Signal Processing 29(2): 254–272. http://dx.doi.org/10.1109/TASSP.1981.1163530
Furui, S. and Akagi, M. (1985) Perception of voice individuality and physical correlates. Acoustical Society of Japan Technical Report H85-18: 1–8.
Furui, S., Itakura, F. and Saito, S. (1972) Talker recognition by longtime averaged speech spectrum. Electronics and Communications in Japan 55-A(10): 54–61.
Furui, S. and Matsui, T. (1994) Phoneme-level voice individuality used in speaker recognition. Proceedings of ICSLP: 1463–1466.
Garcia, A.A. and Mammone, R.J. (1999) Channel-robust speaker identification using modified-mean cepstral mean normalisation with frequency warping. Proceedings of ICASSP 1: 325–328.
Gonzalez-Rodriguez, J. (2011) Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories. Proceedings of Interspeech 2011: 133–135.
Gonzalez-Rodriguez, J., Drygajlo, A., Ramos-Castro, D., Garcia-Gomar, M. and Ortega-Garcia, J. (2006) Robust estimation, interpretation and assessment of likelihood ratios in forensic speaker recognition. Computer Speech and Language 20(2–3): 331–355. http://dx.doi.org/10.1016/j.csl.2005.08.005
Gonzalez-Rodriguez, J., Rose, P., Ramos, D., Toledano, D.T. and Ortega-García, J. (2007) Emulating DNA: rigorous quantification of evidential weight in transparent and testable forensic speaker recognition. IEEE Transactions on Audio, Speech and Language Processing 15(7): 2104–2115. http://dx.doi.org/10.1109/TASL.2007.902747
Guillemin, B. and Watson, C. (2008) The impact of the GSM mobile phone network on the speech signal. International Journal of Speech Language and the Law 15(2): 193–218.
Hepler, A.B., Saunders, C.P., Davis, L.J. and Buscaglia, J. (2012) Score-based likelihood ratios for handwriting evidence. Forensic Science International 219(1–3): 129–140. http://dx.doi.org/10.1016/j.forsciint.2011.12.009
Hollien, H. and Majewski, W. (1977) Speaker identification by long-term spectra under normal and distorted speech conditions. Journal of the Acoustical Society of America 62: 975–979. http://dx.doi.org/10.1121/1.381592
Holmes, J.N., Holmes, W.J. and Garner, P.N. (1997) Using formant frequencies in speech recognition. Eurospeech 97: 2083–2086.
Ishihara, S. (2012) A forensic text comparison in SMS messages: a likelihood ratio approach with lexical features. In N. Clarke, T. Tryfonas and R. Dodge (eds) Proceedings of the 7th International Workshop on Digital Forensics and Incident Analysis: 55–65.
Khodai-Joopari, M. (2006) Forensic speaker analysis and identification by computer. A Bayesian approach anchored in the cepstral domain. Unpublished PhD thesis, University of New South Wales.
Khodai-Joopari, M., Clermont, F. and Barlow, M. (2004) Speaker variability on a continuum of spectral sub-bands from 297-speakers’ non-contemporaneous cepstra of Japanese vowels. In S. Cassidy (ed.) Proceedings of the 10th Australian International Conference on Speech Science and Technology: 504–509.
Morrison, G.S. (2010) Forensic voice comparison. In I. Freckelton and H. Selby (eds) Expert Evidence. Ch 99. Sydney: Thomson Reuters.
Morrison, G.S. (2011a) A comparison of procedures for the calculation of forensic likelihood ratios from acoustic-phonetic data: multivariate kernel density (MVKD) versus Gaussian mixture model-universal background model (GMM-UBM). Speech Communication 53: 242–256. http://dx.doi.org/10.1016/j.specom.2010.09.005
Morrison, G.S. (2011b) Measuring the validity and reliability of forensic likelihood-ratio systems. Science and Justice 51: 91–98. http://dx.doi.org/10.1016/j.scijus.2011.03.002
Morrison, G.S. (2012) Tutorial on logistic regression calibration and fusion: converting a score to a likelihood ratio. Australian Journal of Forensic Sciences: 1–25.
Nair, B.B.T., Alzqhoul, E.A.S. and Guillemin, B. (2012) A new approach to computing likelihood ratios based on principal component analysis. Paper given at UNSW Forensic Speech Science Conference, 3 December 2012, Sydney, Australia. Abstract available at: http://sydney2012.forensic-voice-comparison.net/
Neumann, C., Evett, I.W. and Skerrett, J. (2012) Quantifying the weight of evidence from a forensic fingerprint comparison: a new paradigm. Journal of the Royal Statistical Society 175(2): 371–415. http://dx.doi.org/10.1111/j.1467-985X.2011.01027.x
Osanai, T., Tanimoto, M., Kido, H. and Suzuki, T. (1995) Text-dependent speaker verification using isolated word utterances based on dynamic programming. [In Japanese]. National Research Institute for Police Science Report 48(1): 15–19.
Pigeon, S., Druyts, P. and Verlinde, P. (2000) Applying logistic regression to the fusion of the NIST’99 1-speaker submissions. Digital Signal Processing 10(1–3): 237–248. http://dx.doi.org/10.1006/dspr.1999.0358
Rabiner, L. and Juang, B.-H.J. (1993) Fundamentals of Speech Recognition. Englewood Cliffs: Prentice-Hall.
Ramos-Castro, D., Gonzalez-Rodriguez, J. and Ortega-Garcia, J. (2006) Likelihood ratio calibration in a transparent and testable forensic speaker recognition framework. Proceedings of IEEE Odyssey.
Rose, P. (2003) The technical comparison of forensic voice samples. In I. Freckelton and H. Selby (eds) Expert Evidence 1051–6102. Sydney: Thomson Reuters.
Rose, P. (2006) Technical forensic speaker recognition: evaluation, types and testing of evidence. Computer Speech and Language 20(2–3): 159–191. http://dx.doi.org/10.1016/j.csl.2005.07.003
Rose, P. (2010a) The effect of correlation on strength of evidence estimates in forensic voice comparison: uni- and multivariate likelihood ratio-based discrimination with Australian English vowel acoustics. International Journal of Biometrics 2(4): 316–329. http://dx.doi.org/10.1504/IJBM.2010.035447
Rose, P. (2010b) Bernard’s 18-vowel inventory size and strength of forensic voice comparison evidence. In M. Tabain, J. Fletcher, B. Grayden, J. Hajek and A. Butcher (eds) Proceedings of the 13th Australasian International Conference on Speech Science and Technology: 30–33.
Rose, P. (2011a) Forensic voice comparison with secular shibboleths – a hybrid fused GMM-multivariate likelihood ratio-based approach using alveolo-palatal fricative cepstral spectra. Proceedings of the International Conference on Acoustics Speech and Signal Processing: 5900–5903.
Rose, P. (2011b) Forensic voice comparison with Japanese vowels – a likelihood ratio-based approach using segmental cepstra. In W. Lee and E. Zee (eds) Proceedings of the 17th International Congress of Phonetic Sciences: 1718–1721.
Rose, P. and Clermont, F. (2001) A comparison of two acoustic methods for forensic speaker discrimination. Acoustics Australia 29(1): 31–35.
Rose, P., Osanai, T. and Kinoshita, Y. (2003) Strength of forensic speaker identification evidence – multispeaker formant and cepstrum based segmental discrimination with a Bayesian likelihood ratio as threshold. International Journal of Speech Language and the Law 10(2): 179–202.
Shibatani, M. (1990) The Languages of Japan. Cambridge: Cambridge University Press.
Wells, J.C. (1982) Accents of English. Cambridge: Cambridge University Press.
Yim, A.C.S. and Rose, P. (2012) Are nasals better? Likelihood ratio-based forensic voice comparison with segmental cepstra from Cantonese and Japanese syllabic/mora nasals. In F. Cox, K. Demuth, S. Lin, K. Miles, S. Palethorpe, J. Shaw and I. Yuen (eds) Proceedings of the 14th Australasian International Conference on Speech Science and Technology: 5–8.
Zahorian, S.A. and Jagharghi, A.J. (1993) Spectral-shape features versus formants as acoustic correlates for vowels. Journal of the Acoustical Society of America 94(4): 1966–1982. http://dx.doi.org/10.1121/1.407520