Strength of forensic text comparison evidence from stylometric features: a multivariate likelihood ratio-based analysis
Issue: Vol 24 No. 1 (2017)
Subject Areas: Linguistics
This article describes an experiment in forensic text comparison (FTC) within the likelihood ratio (LR) framework, in which authorship attribution was modelled with word- and character-based stylometric features. Chatlog messages from 115 authors were selected from an archive containing real pieces of chatlog evidence used to prosecute paedophiles. Four text lengths (500, 1000, 1500 and 2500 words) were used for modelling in order to investigate how sample size influences system performance. The strength of authorship attribution evidence, expressed as an LR, was estimated with the Multivariate Kernel Density formula. Performance was primarily assessed with the log-likelihood ratio cost (Cllr), but assessments with other metrics, e.g. credible interval and equal error rate, are also given. Given the small number of features used for modelling authorship attribution, the results are promising. Even with a small sample size of 500 words, the system achieved a discrimination accuracy of c. 76% (Cllr = 0.68258). With a sample size of 2500 words, a discrimination accuracy of c. 94% (Cllr = 0.21707) was obtained. A larger sample size is beneficial to FTC: it improves discriminability, increases the magnitude of the consistent-with-fact LRs and decreases the magnitude of the contrary-to-fact LRs. ‘Average character number per word token’, ‘Punctuation character ratio’ and the vocabulary richness features were found to be robust, working well regardless of sample size. The results demonstrate the efficacy of the LR framework for analysing authorship attribution evidence.
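For readers unfamiliar with the main metric, the following is a minimal sketch of how a log-likelihood ratio cost can be computed from a set of LRs, assuming the standard definition of Cllr (the mean of log2(1 + 1/LR) over same-author comparisons and of log2(1 + LR) over different-author comparisons, halved). The function name `cllr` and the toy LR values are illustrative assumptions; they are not the study's code or data.

```python
import math


def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood ratio cost under the standard definition (a sketch).

    same_author_lrs: LRs from comparisons where both texts share an author.
    diff_author_lrs: LRs from comparisons where the authors differ.
    All LRs must be strictly positive.
    """
    # Penalty for same-author comparisons: large when the LR is small.
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_cost /= len(same_author_lrs)
    # Penalty for different-author comparisons: large when the LR is big.
    diff_cost = sum(math.log2(1 + lr) for lr in diff_author_lrs)
    diff_cost /= len(diff_author_lrs)
    return 0.5 * (same_cost + diff_cost)


# Hypothetical toy values, chosen only to show the behaviour of the metric.
uninformative = cllr([1.0], [1.0])   # every LR equals 1: no information
well_behaved = cllr([100.0], [0.01])  # LRs point the right way
misleading = cllr([0.01], [100.0])    # LRs point the wrong way
```

Under this definition a system that always returns LR = 1 scores exactly 1, values below 1 indicate that the LRs carry useful information, and values above 1 indicate LRs that are misleading on balance. This is the scale against which the reported values of 0.68258 and 0.21707 can be read.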
Author: Shunichi Ishihara