Polysemy and word frequency: A replication

Issue: Vol 4 No. 2 (2017)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.33751


One piece of evidence adduced by George Kingsley Zipf for his eponymous law (Zipf, 1935) and for its explanation in terms of the principle of least effort (Zipf, 1949) is the hypothesis that a word's polysemy is proportional to the square root of its frequency (Levelt, 2013). Pawley (2006), following Zipf, also proposes that 'there is a strong general correlation between frequency and the extent of polysemy'. This paper replicates Zipf's approach but with data drawn from sources different from those available to Zipf: for word frequency, the Kilgarriff list of the most frequent words in the British National Corpus (BNC) (Kilgarriff, 1995), and, as a measure of polysemy, the number of senses WordNet records for each word in Kilgarriff's list. It also takes account of the syntactic category of lexemes and uses more advanced statistical modelling than Zipf did. Zipf's observations are confirmed, with some provisos, and their utility is examined. Explanations for this relationship remain to be established.
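Zipf's hypothesis can be written as m ∝ f^(1/2), where m is a word's number of senses and f its frequency. Taking logarithms gives log m = c + 0.5 log f, so the exponent can be estimated as the slope of a straight line on the log-log scale. The following is a minimal Python sketch under that assumption, using plain least squares on logged values; it is not the statistical model used in the paper. The function name `fit_power_law` and the toy data are hypothetical and are built to follow the square-root law exactly, so they illustrate the procedure rather than report any result.

```python
import math


def fit_power_law(freqs, senses):
    """Estimate a and b in senses ≈ a * freqs**b by least squares on the log-log scale."""
    if len(freqs) != len(senses) or len(freqs) < 2:
        raise ValueError("need at least two paired observations")
    if any(f <= 0 for f in freqs) or any(s <= 0 for s in senses):
        raise ValueError("frequencies and sense counts must be positive to take logs")
    xs = [math.log(f) for f in freqs]
    ys = [math.log(s) for s in senses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequencies must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log scale = estimated exponent
    a = math.exp(mean_y - b * mean_x)  # intercept, back-transformed to the original scale
    return a, b


# Hypothetical toy data constructed to obey the square-root law exactly;
# these are NOT counts from the BNC or WordNet.
toy_freqs = [1, 4, 9, 16, 100, 400]
toy_senses = [math.sqrt(f) for f in toy_freqs]
a, b = fit_power_law(toy_freqs, toy_senses)
print(f"a = {a:.3f}, b = {b:.3f}")  # prints a = 1.000, b = 0.500
```

With real data, the frequencies would come from a list such as Kilgarriff's and the sense counts from WordNet, and an estimated exponent near 0.5 would be consistent with Zipf's claim. Least squares on logged counts is only the simplest choice; count-data models such as a Poisson generalized linear model (cf. McCullagh and Nelder, 1989) are a common alternative for integer sense counts.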

Authors: Koenraad Kuiper, Robert Fromont, Daniel Gerhard

References:

Amir, Y. and Sharon, I. (1990). Replication research: A ‘must’ for the scientific advancement of psychology. Journal of Social Behavior and Personality 5 (4): 51–69.

Baayen, R. H., Shaoul, C., Willits, J., and Ramscar, M. (2015). Comprehension without segmentation: A proof of concept with naive discrimination learning. Language, Cognition and Neuroscience 31 (1): 106–128.

Baker, M. C. (2003). Lexical Categories: Verbs, Nouns and Adjectives. Cambridge: Cambridge University Press.

Barque, L. and Chaumartin, F.-R. (2006). Regular polysemy in WordNet. LDV-Forum 21 (1): 1–14.

Chaplot, D. S., Bhattacharyya, P., and Paranjape, A. (2015). Unsupervised word sense disambiguation using Markov random field and dependency parser. Paper presented at the 29th AAAI Conference on Artificial Intelligence (AAAI-15), Austin, Texas.

Crossley, S., Salsbury, T., and McNamara, D. (2010). The development of polysemy and frequency use in English second language speakers. Language Learning: A Journal of Research in Language Studies 60 (3): 573–605.

Everaert, M. and Bolhuis, J. (2017). The biology of language. Neuroscience and Biobehavioral Reviews 81: 99–102.

Grimshaw, J. (1990). Argument Structure. Cambridge, MA: MIT Press.

Hanks, P. (2013). Lexical Analysis: Norms and Exploitations. Cambridge, MA: MIT Press.

Hernández-Fernández, A., Casas, B., Ferrer-i-Cancho, R., and Baixeries, J. (2016). Testing the robustness of laws of polysemy and brevity versus frequency. In P. Král and C. Martín-Vide (Eds.) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol. 9918. Cham: Springer.

Katz, J. J. and Fodor, J. A. (1963). The structure of a semantic theory. Language 39 (2): 170–210.

Kearns, K. (1998). Light verbs in English. Linguistics 34: 53–72.

Kilgarriff, A. (1995). BNC database and word frequency lists. Retrieved on 24 February 2014 from

Klepousniotou, E. (2002). The processing of lexical ambiguity: Homonymy and polysemy in the mental lexicon. Brain and Language 81: 205–223.

Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.

Levelt, W. J. M. (2013). A History of Psycholinguistics: The pre-Chomskian Era. Oxford: Oxford University Press.

Levelt, W. J. M., Roelofs, A., and Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral & Brain Sciences 22 (1): 1–75.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Boca Raton, FL: Chapman & Hall.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. (1992). WordNet: A lexical database for English. Communications of the ACM 38: 39–41.

Nation, I. S. P. (2008). Teaching Vocabulary: Strategies and Techniques. Boston, MA: Cengage Learning.

Pawley, A. (2006). Where have all the verbs gone? Remarks on the organisation of language with small, closed verb classes. Paper presented at the 11th Biennial Rice University Linguistics Symposium. Austin, Texas.

R Core Team. (2016). R: A language and environment for statistical computing. Retrieved on 28 August 2015 from

Simons, D. J. (2014). The value of direct replication. Perspectives on Psychological Science 9 (1): 76–80.

Taylor, J. R. (2003). Polysemy’s paradoxes. Language Sciences 25 (6): 637–655.

Taylor, J. R. (2012). The Mental Corpus: How Language is Represented in the Mind. Oxford: Oxford University Press.

Tengi, R. I. (1998). Design and implementation of the WordNet lexical database and searching software. In C. Fellbaum (Ed.) WordNet: An Electronic Lexical Database, 105–127. Cambridge, MA: MIT Press.

Wittgenstein, L. (1965). Philosophical Investigations. New York: The Macmillan Company.

Yang, C. (2013). Ontogeny and phylogeny of language. Proceedings of the National Academy of Sciences 110 (16): 6324–6327.

Yang, C. (2016). The Price of Linguistic Productivity: How Children Learn to Break the Rules of Language. Cambridge, MA: MIT Press.

Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.