Item Details

Possible measures of asymmetry and redundancy in collocations

Issue: Vol. 1 No. 2 (2014)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.v1i2.20304


It has long been recognized that developing measures of the internal structure of collocations is an important goal (Sinclair, 1991). Recently, Gries (2013) presented a measure that captures the asymmetric nature of conditional probabilities in collocations. This paper contributes to that discussion by introducing measures of asymmetry and redundancy that may meet the needs of some researchers. Two asymmetry measures are described: the first captures only frequency asymmetry, while the second is an asymmetric version of the mutual information measure. A measure of semantic redundancy is also described. This measure takes a higher value when the fact that two words co-occur contains more information than the uncertainty introduced by the occurrence of the individual words.
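As a rough illustration of the asymmetry the abstract refers to, the Python sketch below computes the two conditional probabilities for an adjacent word pair, P(w2 | w1) and P(w1 | w2), alongside the standard pointwise mutual information, which is symmetric in the two words. It does not implement the measures proposed in the paper, which are not defined in this abstract. The function name `pair_stats` and the toy token list are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def pair_stats(tokens, w1, w2):
    """Return (P(w2|w1), P(w1|w2), PMI) for the adjacent pair (w1, w2).

    Illustrative only: plain relative frequencies over adjacent bigrams,
    with no smoothing and no significance testing.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # counts of words in first position
    second = Counter(tokens[1:])   # counts of words in second position
    n = len(tokens) - 1            # number of adjacent bigrams

    joint = bigrams[(w1, w2)]
    if joint == 0:
        raise ValueError("pair does not occur in the token list")

    forward = joint / first[w1]    # how well w1 predicts a following w2
    backward = joint / second[w2]  # how well w2 predicts a preceding w1
    # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ); swapping the roles of the
    # two marginals leaves the value unchanged, so it cannot show direction.
    pmi = math.log2(joint * n / (first[w1] * second[w2]))
    return forward, backward, pmi


# Hypothetical toy data: "course" is always preceded by "of",
# but "of" is followed by "course" only some of the time.
tokens = "of course of the of a of course of it".split()
forward, backward, pmi = pair_stats(tokens, "of", "course")
print(forward, backward, pmi)
```

In this toy list the two conditional probabilities differ while the pointwise mutual information yields a single value for the pair, which is the kind of directional information a symmetric association score cannot express.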

Author: Robert Nelson

References:

Bird, S., Klein, E. and Loper, E. (2009). Natural Language Processing with Python. Sebastopol, CA: O’Reilly Media.

Bybee, J. L. (2010). Language, Usage and Cognition (Vol. 98). Cambridge: Cambridge University Press.

Dirven, R. and Verspoor, M. (Eds) (2004). Cognitive Exploration of Language and Linguistics (Vol. 1). New York: John Benjamins Publishing.

Ellis, N. C. (2006). Language acquisition as rational contingency learning. Applied Linguistics, 27 (1): 1–24.

Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.) Corpora and Language Teaching, 13–32. New York: John Benjamins.

Gries, S. T. (2010). Useful statistics for corpus linguistics. In A. Sánchez and M. Almela (Eds) A Mosaic of Corpus Linguistics: Selected Approaches, 269–291. Frankfurt am Main: Peter Lang.

Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next. International Journal of Corpus Linguistics, 18 (1): 137–166.

Justeson, J. S. and Katz, S. M. (1995). Technical terminology: Some linguistic properties and an algorithm for identification in text. Natural Language Engineering, 1 (1): 9–27.

Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22 (1): 79–86.

Liu, D. (2013). Salience and construal in the use of synonymy: A study of two sets of near-synonymous nouns. Cognitive Linguistics, 24 (1): 67–113.

Michelbacher, L., Evert, S. and Schütze, H. (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7 (2): 245–276.

Ramscar, M., Dye, M. and McCauley, S. M. (2013). Error and expectation in language learning: The curious absence of mouses in adult speech. Language, 89 (4): 760–793.

Renouf, A. and Banerjee, J. (2007). Lexical repulsion between sense-related pairs. International Journal of Corpus Linguistics, 12 (3): 415–444.

Rescorla, R. A. and Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy (Eds) Classical Conditioning II: Current Research and Theory, 64–99. New York: Appleton-Century-Crofts.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27 (3): 379–423.

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Spivey, M. J. and Richardson, D. C. (2008). Language embedded in the environment. In P. Robbins and M. Aydede (Eds) The Cambridge Handbook of Situated Cognition, 382–400. Cambridge: Cambridge University Press.

Theil, H. (1970). On the estimation of relationships involving qualitative variables. American Journal of Sociology, 76 (1): 341–357.

Watanabe, S. (1960). Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4 (1): 66–82.

Wolfram, S. (2014). Launching Mathematica 10 – with 700+ New Functions and a Crazy Amount of R&D.