Measuring and interpreting lexical dispersion in corpus linguistics

Issue: Vol. 3, No. 2 (2016)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.33066

Abstract:

The frequency of occurrence and the dispersion of a word are both measures of the word's importance in a collection of texts, or corpus. Lexical dispersion in particular is a corpus-linguistic statistic that measures how homogeneously a word is distributed across the parts of a corpus. There are different ways to measure dispersion, and the authors compare three approaches. Both formula-related and interpretive issues pertaining to dispersion are discussed in terms of the frequency of a word in the corpus parts and the variability of the word across the corpus. A simulation study and an application to words from the British National Corpus indicate that the preferred index is the one constructed from the differences between every possible pair of the word's frequencies in the corpus parts.
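The pairwise-difference idea can be illustrated with a minimal sketch. It assumes the index takes the form 1 − (mean absolute difference between all pairs of the word's normalized part frequencies) / (2 × mean normalized frequency), which gives 1 for a perfectly even spread and 0 when the word occurs in only one part; the exact formula in the article may differ. For contrast, the sketch also computes Juilland's D and Gries's DP, two measures discussed in works listed under References; whether these are the other two approaches compared in the article is an assumption. All function names and the example counts are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def normalized_freqs(freqs, part_sizes):
    """Relative frequency of the word in each corpus part (hits per token)."""
    if len(freqs) != len(part_sizes) or len(freqs) < 2:
        raise ValueError("need matching frequency and size lists for at least 2 parts")
    if sum(freqs) == 0:
        raise ValueError("the word does not occur in the corpus")
    return [f / s for f, s in zip(freqs, part_sizes)]


def pairwise_difference_index(freqs, part_sizes):
    """ASSUMED form of the pairwise-difference index:
    1 - (mean |v_i - v_j| over all pairs i < j) / (2 * mean v).
    1 = perfectly even spread; 0 = the word occurs in only one part."""
    v = normalized_freqs(freqs, part_sizes)
    diffs = [abs(a - b) for a, b in combinations(v, 2)]
    return 1 - mean(diffs) / (2 * mean(v))


def juilland_d(freqs, part_sizes):
    """Juilland's D = 1 - CV / sqrt(n - 1), with CV = population SD / mean."""
    v = normalized_freqs(freqs, part_sizes)
    cv = pstdev(v) / mean(v)
    return 1 - cv / math.sqrt(len(v) - 1)


def gries_dp(freqs, part_sizes):
    """Gries's DP: 0.5 * sum |observed share - expected share|; 0 = perfectly even."""
    normalized_freqs(freqs, part_sizes)  # reuse the input checks
    total_f, total_s = sum(freqs), sum(part_sizes)
    return 0.5 * sum(abs(f / total_f - s / total_s) for f, s in zip(freqs, part_sizes))


if __name__ == "__main__":
    sizes = [1000] * 4  # four equally sized parts (made-up numbers)
    for label, freqs in [("even", [10, 10, 10, 10]),
                         ("mixed", [20, 10, 10, 0]),
                         ("clumped", [40, 0, 0, 0])]:
        print(label,
              round(pairwise_difference_index(freqs, sizes), 3),
              round(juilland_d(freqs, sizes), 3),
              round(gries_dp(freqs, sizes), 3))
```

Note the opposite orientation of the scales: the pairwise index and Juilland's D approach 1 for an even spread, whereas DP approaches 0. Averaging over every pair of parts, rather than measuring spread around the mean, is what distinguishes the pairwise index from Juilland's coefficient-of-variation approach.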

Authors: Brent Burch, Jesse Egbert, Douglas Biber

References:

Biber, D., Reppen, R., Schnur, E. and Ghanem, R. (2016). On the (non)utility of Juilland's D to measure lexical dispersion in large corpora. International Journal of Corpus Linguistics, 21 (4), 439–464.


Brezina, V. and Gablasova, D. (2015). Is there a core general vocabulary? Introducing the new general service list. Applied Linguistics, 36 (1), 1–22. https://doi.org/10.1093/applin/amt018


Carroll, J. B. (1970). An Alternative to Juilland’s Usage Coefficient for Lexical Frequencies. ETS Research Bulletin Series, 1970: i–15. https://doi.org/10.1002/j.2333-8504.1970.tb00778.x


Carroll, J. B. (1970). An alternative to Juilland’s usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behavior, 3 (2), 61–65.


Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34 (2), 213–238. https://doi.org/10.2307/3587951


Davies, M. and Gardner, D. (2010). A Frequency Dictionary of Contemporary American English: Word Sketches, Collocates, and Thematic Lists. London: Routledge.


Gardner, D. and Davies, M. (2013). A new academic vocabulary list. Applied Linguistics, Advance Access. https://doi.org/10.1093/applin/amt015. First published online: 2 August 2013.


Gries, St. Th. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13 (4), 403–437. https://doi.org/10.1075/ijcl.13.4.02gri


Gries, St. Th. (2010). Dispersions and adjusted frequencies in corpora: Further explorations. In St. Th. Gries, S. Wulff, and M. Davies (Eds.), Corpus Linguistic Applications: Current Studies, New Directions, 197–212. Amsterdam: Rodopi. https://doi.org/10.1163/9789042028012_014


Gries, St. Th. and Lijffijt, J. (2012). Correction to ‘Dispersions and adjusted frequencies in corpora’. International Journal of Corpus Linguistics, 17 (1), 147–149. https://doi.org/10.1075/ijcl.17.1.08lij


Juilland, A. G. and Chang-Rodriguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton & Co.


Juilland, A. G., Brodin, D. R. and Davidovitch, C. (1970). Frequency Dictionary of French Words. The Hague: Mouton de Gruyter.


Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written and Spoken English: Based on the British National Corpus. London: Longman.


Stuart, A. and Ord, K. (1994). Kendall’s Advanced Theory of Statistics, Volume 1: Distribution Theory, sixth edition. London: Arnold.


Wilcox, A. R. (1967). Indices of Qualitative Variation. Oak Ridge, TN: Oak Ridge National Laboratory, ORNL-TM-1919. http://web.ornl.gov/info/reports/1967/3445605133753.pdf


Wilcox, A. R. (1973). Indices of qualitative variation and political measurement. The Western Political Quarterly, 26 (2), 325–343. https://doi.org/10.2307/446831