Effect of covert recordings from vehicles on the performance of forensic automatic speaker recognition
Issue: Vol 24 No. 1 (2017)
Journal: International Journal of Speech, Language and the Law
Subject Areas: Linguistics
DOI: 10.1558/ijsll.30985
Abstract:
Speech recordings made in vehicles with hidden recording devices (“bugs”) differ in a number of ways from high-quality direct microphone recordings and from ordinary mobile phone recordings. Prompted by a recent court case, the present investigation quantitatively assessed the effect of three types of covert recording, including data transmission, on a widely used forensic automatic speaker recognition (FASR) system (Batvox 3.1). Using simultaneous high-quality re-recording of speech samples from 50 speakers played back inside four automobiles and one 5-ton RV as a benchmark, equal-error rates (EERs) between zero and 2 % were found. The latter figure was obtained for GSM-transmitted voice data. When the acoustic data were not transmitted but stored inside the covert recording device, or recorded and stored on a smartphone placed inside the vehicles, EERs between zero and 0.45 % were obtained. In general, these EERs are similar to those obtained with the same FASR system in studies with non-covert recordings (direct recordings, landline and mobile telephone). No effect of the type or model of vehicle on EERs was observed.
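For readers unfamiliar with the metric, the following is a minimal sketch of how an equal-error rate can be estimated from two sets of comparison scores. It is not the procedure used by Batvox or in the study, whose internals are not described here. The function name equal_error_rate and the example scores are hypothetical, and the sketch assumes that higher scores indicate stronger support for a same-speaker decision.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction) from two lists of scores.

    Assumption: a higher score means stronger same-speaker support.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above the maximum.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a same-speaker score falls below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker score reaches the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores for illustration only; they are not data from the study.
same_speaker = [0.2, 0.6, 0.8, 0.9]
different_speaker = [0.1, 0.3, 0.4, 0.7]
print(equal_error_rate(same_speaker, different_speaker))  # 0.25
```

An EER of zero, as reported for some conditions, corresponds to score sets that a single threshold separates completely.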
Author: Hermann J. Künzel
References:
Agnitio Voice Biometrics (2009) Batvox 3.0 Basic User Manual. Available at www.agnitio.es/ingles/contacto. See also www.agnitio-corp.com/sites/default/files/BATVOX-Datasheet.pdf
Bautista-Tapias, R. (2005) Sistemas Forenses de Reconocimiento de Locutor (Proyecto Fin de Carrera, Universidad Politécnica de Madrid, Madrid, Spain).
Doddington, G., Liggett, W., Martin, A., Przybocki, M. and Reynolds, D.A. (1998) Sheep, goats, lambs and wolves: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. Proc. International Conference on Spoken Language Processing, Sydney, 37-40.
Künzel, H.J. (2007) Non-contemporary speech samples: Auditory detectability of an 11 year delay and its effect on automatic speaker identification. International Journal of Speech, Language and the Law 14(1), 109-136.
Künzel, H.J. (2010) Automatic speaker identification of identical twins. International Journal of Speech, Language and the Law 17(1), 251-277.
Künzel, H.J. (2013) Automatic speaker recognition with cross-language speech material. International Journal of Speech, Language and the Law 20(1), 21-44.
Künzel, H.J. and Alexander, P. (2014) Automatic speaker recognition with degraded and enhanced speech. J. Audio Eng. Soc. 62(4), 244-252.
Ramos-Castro, D. (2007) Forensic evaluation of the evidence using automatic speaker recognition systems (PhD diss., Universidad Autónoma de Madrid, Madrid, Spain).