Predicting American Movie Genre Categories from Linguistic Characteristics
Issue: Vol 2 No. 1 (2015)
Subject Areas: Linguistics
This study explores whether movie transcripts can be correctly classified into movie genres by means of a Discriminant Function Analysis (DFA) based on a previous comprehensive multidimensional (MD) analysis of American cinema. MD analysis is a framework for describing the salient characteristics of text varieties by means of multivariate statistical techniques, notably factor analysis. Traditionally, MD analysis has been restricted to the study of register variation and has been largely ignored in text classification research. In the MD analysis reported, a large genre-diversified movie corpus was tagged for lexico-grammatical features with the Biber tagger, and the resulting factor scores were used as input for the DFA. The results showed that particular movie genres could be successfully predicted from the MD analysis, which lends credence to movie genre distinctions and underscores the robustness of MD factor scores as predictors of those distinctions.
Authors: Tony Berber Sardinha, Marcia Veirano Pinto
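
The classification step described in the abstract, using per-text factor scores as predictors in a discriminant analysis, can be illustrated with a minimal sketch. It is not the authors' procedure or software: it is a plain linear discriminant classifier with a pooled covariance matrix, written for two dimensions only, and the scores and the labels `genre_a` and `genre_b` are invented purely for illustration.

```python
from collections import defaultdict
from math import log


def fit_lda(X, y):
    """Fit a two-feature linear discriminant model with a pooled covariance."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n, k = len(X), len(groups)
    # Mean dimension scores per genre.
    means = {g: [sum(r[j] for r in rows) / len(rows) for j in range(2)]
             for g, rows in groups.items()}
    # Pooled within-class covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g, rows in groups.items():
        for r in rows:
            d = [r[0] - means[g][0], r[1] - means[g][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j] / (n - k)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    # Prior probability of each genre, estimated from group sizes.
    priors = {g: len(rows) / n for g, rows in groups.items()}
    return means, inv, priors


def predict(model, x):
    """Return the genre whose linear discriminant score is highest for x."""
    means, inv, priors = model

    def score(g):
        m = means[g]
        w = [inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1]]
        return (x[0] * w[0] + x[1] * w[1]
                - 0.5 * (m[0] * w[0] + m[1] * w[1])
                + log(priors[g]))

    return max(means, key=score)


# Invented scores on two hypothetical dimensions (NOT data from the study).
X = [(2.1, -0.5), (1.8, -0.2), (2.5, -0.9), (1.6, 0.1),
     (-1.9, 0.8), (-2.2, 0.4), (-1.5, 1.1), (-2.4, 0.2)]
y = ["genre_a"] * 4 + ["genre_b"] * 4

model = fit_lda(X, y)
print(predict(model, (1.9, -0.3)))
print(predict(model, (-2.0, 0.5)))
```

In a real analysis each text would contribute one score per MD dimension, and classification accuracy would be estimated on held-out texts rather than on the texts used to fit the model.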