What do we get from extracting collocations? Linguistic analysis of automatically obtained Russian MWEs

Issue: Vol. 1 No. 2 (2014)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.v1i2.26946


This paper applies linguistic analysis to the results of automatic multiword expression extraction in order to assess whether the extracted units are reliable from a theoretical point of view. The nature of these units is discussed and illustrated with examples of Russian prepositions, which are first classified according to I. Mel’čuk’s theory (1995) and then re-analysed using the notion of constructions. The corpus-driven approach reveals shortcomings in the prevalent way of describing multiword expressions in terms of strict classes, and the paper can be read as providing a theoretical basis for developing a new approach to their description.

Author: Daria Kormacheva

References:

Baker, P., Hardie, A. and McEnery, T. (2006). A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.

Calzolari, N., Fillmore, C. J., Grishman, R., Ide, N., Lenci, A., MacLeod, C. and Zampolli, A. (2002). Towards best practice for multiword expressions in computational lexicons. Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Canary Islands – Spain. European Language Resources Association (ELRA).

Church, K. W. and Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics 16 (1): 22–29.

Church, K. W., Gale, W., Hanks, P. and Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (ed.) Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, 115–164. Hillsdale, NJ: Lawrence Erlbaum.

Čermák, F. (2001). Substance of idioms: Perennial problems, lack of data or theory? International Journal of Lexicography 14 (1): 1–20.

Daudaravicius, V. (2010). Automatic identification of lexical units. Informatica 34 (1).

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19 (1): 61–74.

Fillmore, Ch. J., Kay, P. and O'Connor, M. C. (1988). Regularity and idiomaticity in grammatical constructions: The case of let alone. Language 64 (3): 501–538.

Frank, S. L., Bod, R. and Christiansen, M. H. (2012). How hierarchical is language use? Proceedings of the Royal Society B: Biological Sciences 279 (1747): 4522–4531.

Gries, S. Th. (2013). 50-something years of work on collocations: What is or should be next. International Journal of Corpus Linguistics 18 (1): 137–166.

Iordanskaja, L. and Mel’čuk, I. (2007). Smysl i sochetaemost’ v slovare. Moskva: Jazyki slavjanskih kul’tur. [In Russian]

Jackendoff, R. (1997). The Architecture of the Language Faculty. No. 28. Cambridge, MA: MIT Press.

Kormacheva, D., Pivovarova, L. and Kopotev, M. (2014). Automatic collocation extraction and classification of automatically obtained bigrams. Workshop on Computational, Cognitive, and Linguistic Approaches to the Analysis of Complex Words and Collocations (CCLCC 2014): 27–33.

Levontina, I. (1995). Slovarnye stat’i predlogov DLJA i RADI: k probleme leksikograficheskoj interpretacii mnogoznachnosti u služebnyh slov. Teoreticheskaja lingvistika i leksikografija: opyty sistemnogo opisanija leksiki. [In Russian]

Manning, Ch. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.

Mel’čuk, I. and Žolkovsky, A. (1984). Explanatory Combinatorial Dictionary of Modern Russian. Wiener Slawistischer Almanach. Inst. für Slawistik d. Univ. Wien.

Mel’čuk, I. (1995). Phrasemes in language and phraseology in linguistics. Idioms: Structural and Psychological Perspectives: 167–232.

Mel’čuk, I. (1998). Collocations and lexical functions. In A. P. Cowie (ed.) Phraseology. Theory, Analysis, and Applications, 23–53. Oxford: Clarendon Press.

Mel’čuk, I. (2006). Explanatory combinatorial dictionary. Open Problems in Linguistics and Lexicography: 225–355.

Moon, R. (1998). Fixed Expressions and Idioms in English: A Corpus-based Approach. Oxford: Clarendon Press.

Nunberg, G., Sag, I. A. and Wasow, T. (1994). Idioms. Language 70 (3): 491–538.

Rogožnikova, R. (2003). Tolkovyj slovar’ sochetanij, ekvivalentnih slovu. Moskva: Astrel’. [In Russian]

Sag, I. A., Baldwin, T., Bond, F., Copestake, A. and Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. Computational Linguistics and Intelligent Text Processing, 1–15. Berlin and Heidelberg: Springer.

Sinclair, J. (1991). Corpus, Concordance, Collocation. Vol. 1. Oxford: Oxford University Press.

Stubbs, M. (2001). Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell Publishers.

Swinney, D. A. and Cutler, A. (1979). The access and processing of idiomatic expressions. Journal of Verbal Learning and Verbal Behavior 18 (5): 523–534.