Item Details

Mathematical modeling of the frequencies of words of different lengths in written Hindi language corpora and examination of the role of texts’ stylistic factor in model’s parameters

Issue: Vol 4 No. 1 (2017)

Journal: Journal of Research Design and Statistics in Linguistics and Communication Science

Subject Areas: Linguistics

DOI: 10.1558/jrds.33107

Abstract:

In quantitative research on language and linguistics, linguistic features are first specified and counted, and statistical models are then constructed to explain the observed facts. In the present paper, the pattern of occurrence of words of different lengths in various corpora of the Hindi language is represented in the form of a mathematical model. We then examine whether the parameters of the fitted model for a particular text depend on the type of text, using texts selected from two categories, media/essay and creative writing. In other words, we test whether the model's parameters can be applied in the text classification process.
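The abstract does not name the model, so the procedure it describes (count word lengths, fit a model, compare the fitted parameters across text categories) is sketched below under stated assumptions. The 1-displaced Poisson distribution is a swapped-in stand-in commonly used in word-length studies, not necessarily the model fitted in the paper. Word length is assumed to be the number of Unicode code points in a whitespace-separated token, which for Devanagari differs from a letter, akshara or syllable count. All function names are hypothetical, and the sample strings are toy placeholders, not data from the paper's corpora.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; not the model fitted in the paper.


def word_length_counts(text: str) -> Counter:
    """Count words by length.

    Assumption: words are whitespace-separated tokens and length is the
    number of Unicode code points. In Devanagari a vowel sign (matra) is its
    own code point, so this differs from a syllable or akshara count.
    """
    return Counter(len(word) for word in text.split())


def fit_displaced_poisson(counts: Counter) -> float:
    """Maximum-likelihood estimate of a in the 1-displaced Poisson model
    P(X = x) = exp(-a) * a**(x - 1) / (x - 1)!,  x = 1, 2, ...
    The MLE is the sample mean word length minus one."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("text contains no words")
    mean_length = sum(length * freq for length, freq in counts.items()) / n
    return mean_length - 1.0


def expected_frequencies(a: float, n: int, max_len: int) -> dict:
    """Expected number of words of each length 1..max_len under the model."""
    return {
        x: n * math.exp(-a) * a ** (x - 1) / math.factorial(x - 1)
        for x in range(1, max_len + 1)
    }


def chi_square(counts: Counter, a: float) -> float:
    """Pearson chi-square between observed and expected frequencies.

    Simplification: lengths above the longest observed word are ignored,
    and length classes with zero expected frequency are skipped."""
    n = sum(counts.values())
    expected = expected_frequencies(a, n, max(counts))
    return sum(
        (counts.get(x, 0) - e) ** 2 / e for x, e in expected.items() if e > 0
    )


def parameters_by_category(texts_by_category: dict) -> dict:
    """Fit the model to each text and group the estimates by category,
    so the spread of a within and between categories can be compared."""
    return {
        category: [fit_displaced_poisson(word_length_counts(t)) for t in texts]
        for category, texts in texts_by_category.items()
    }


if __name__ == "__main__":
    # Toy placeholder strings, not data from the paper's corpora.
    toy = {
        "media/essay": ["सरकार ने आज नई नीति की घोषणा की"],
        "creative writing": ["वह धीरे धीरे घर की ओर चल पड़ा"],
    }
    for category, estimates in parameters_by_category(toy).items():
        print(category, [round(a, 3) for a in estimates])
```

Under these assumptions, a systematic difference in the estimated a between media/essay and creative-writing texts would be the kind of signal a classification test looks for; with real corpora, goodness of fit (for example via the chi-square value) would need checking before parameters are compared.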

Authors: Hemlata Pande, Hoshiyar S. Dhami


References :

Abbe, S. (2000). Word length distribution in Arabic letters. Journal of Quantitative Linguistics, 7 (2), 121–127. https://doi.org/10.1076/0929-6174(200008)07:02;1-Z;FT121


Alekseev, P. M. (1998). Graphemic and syllabic length of words in text and vocabulary. Journal of Quantitative Linguistics, 5 (1–2), 5–12. https://doi.org/10.1080/09296179808590107


Antić, G., Kelih, E., and Grzybek, P. (2006). Zero syllable words in determining word length. In P. Grzybek (Ed.) Contributions to the Science of Text and Language: Word Length Studies and Related Issues, 117–156. Springer, Netherlands. https://doi.org/10.1007/1-4020-4068-7_4


Antić, G., Stadlober, E., Grzybek, P., and Kelih, E. (2006). Word Length and Frequency Distributions in Different Text Genres. From Data and Information Analysis to Knowledge Engineering, 310–317. Springer, Berlin Heidelberg. https://doi.org/10.1007/3-540-31314-1_37


Aoyama, H. and Constable, J. (1999). Word length frequency and distribution in English: Part I. Prose. Literary and Linguistic Computing 14 (3), 339–358. https://doi.org/10.1093/llc/14.3.339


Bharati, A., Rao K, P., Sangal, R., and Bendre, S. M. (2002). Basic statistical analysis of corpus and cross comparison among corpora. In Proceedings of 2002 International Conference on Natural Language Processing, Mumbai, India. Available at: http://ltrc.iiit.ac.in/MachineTrans/publications/technicalReports/tr022/camera-187.pdf


Barbaro, S. (2000). Word length distribution in Italian letters by Pier Paolo Pasolini. Journal of Quantitative Linguistics 7 (2), 115–120. https://doi.org/10.1076/0929-6174(200008)07:02;1-Z;FT115


Best, K.-H. (1996). Word length in Old Icelandic songs and prose texts. Journal of Quantitative Linguistics 3 (2), 97–105. https://doi.org/10.1080/09296179608599619


Dittrich, H. (1996). Word length frequency in the letters of G. E. Lessing. Journal of Quantitative Linguistics 3 (3), 260–264. https://doi.org/10.1080/09296179608599633


Frischen, J. (1996). Word length analysis of Jane Austen’s letters. Journal of Quantitative Linguistics 3 (1), 80–84. https://doi.org/10.1080/09296179608590066


Gómez, P. C. (2013). Statistical Methods in Language and Linguistic Research. Sheffield: Equinox Publishing Ltd.


Gries, S. T. (2009). Statistics for Linguistics. Berlin: R. De Gruyter Mouton. https://doi.org/10.1515/9783110216042


Grzybek, P. (Ed.) (2006). Contributions to the Science of Text and Language: Word Length Studies and Related Issues. Rotterdam: Springer. https://doi.org/10.1007/1-4020-4068-7


Grzybek, P., Stadlober, E., Kelih, E., and Antić, G. (2005). Quantitative text typology: The impact of word length. In: C. Weihs and W. Gaul (Eds), Classification – The Ubiquitous Challenge, 53–64. Heidelberg, Springer. https://doi.org/10.1007/3-540-28084-7_5


Hatzigeorgiu, N., Mikros, G., and Carayannis, G. (2001). Word length, word frequencies and Zipf’s Law in the Greek language. Journal of Quantitative Linguistics 8 (3), 175–185. https://doi.org/10.1076/jqul.8.3.175.4096


Jayaram, B. D. and Vidya, M. N. (2006). Word length distribution in Indian languages. Glottometrics 12, 16–38.


Kelih, E., Antić, G., Grzybek, P., and Stadlober, E. (2005). Classification of author and/or genre? The impact of word length. In C. Weihs and W. Gaul (Eds) Classification, the Ubiquitous Challenge, 498–505. Springer Berlin-Heidelberg. https://doi.org/10.1007/3-540-28084-7_58


Kromer, V. (2001). Word length model based on one displaced Poisson uniform distribution. Glottometrics 1, 87–96.


Krott, A. (1996). Some remarks on the relation between word length and morpheme length. Journal of Quantitative Linguistics 3 (1), 29–37. https://doi.org/10.1080/09296179608590061


Krylov, J. K. (2002). Synergetic models and methods in quantitative linguistics. Journal of Quantitative Linguistics 9 (2), 125–185. https://doi.org/10.1076/jqul.9.2.125.8487


Leopold, E. (1998). Frequency spectra within word‐length classes. Journal of Quantitative Linguistics 5 (3), 224–231. https://doi.org/10.1080/09296179808590130


Lupsa, D. A. and Lupsa, R. (2005). The law of word length in a vocabulary. Studia Univ. Babes-Bolyai, Informatica, Vol. L, No. 2.


Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.


Meyer, P. (1999). Relating word length to morphemic structure: A morphologically motivated class of discrete probability distributions. Journal of Quantitative Linguistics 6 (1), 66–69. https://doi.org/10.1076/jqul.6.1.66.4143


Rottmann, O. A. (1999). Word and syllable lengths in East Slavonic. Journal of Quantitative Linguistics 6 (3), 235–238. https://doi.org/10.1076/jqul.6.3.235.6162


Pande, H. and Dhami, H. S. (2010). Mathematical modelling of occurrence of letters and word’s initials in texts of Hindi Language. SKASE Journal of Theoretical Linguistics 7 (2), 19–38.


Pande, H. and Dhami, H. S. (2012). Model generation for word length frequencies in texts with the application of Zipf’s order approach. Journal of Quantitative Linguistics 19 (4), 249–261. https://doi.org/10.1080/09296174.2012.714531


Pande, H. and Dhami, H. S. (2013a). Mathematical modelling of the pattern of occurrence of words in different corpora of the Hindi language. Journal of Quantitative Linguistics 20 (1), 1–12. https://doi.org/10.1080/09296174.2012.754596


Pande, H. and Dhami, H. S. (2013b). Analysis for the significance of statistical word-length features in genre discrimination of Hindi texts. IOSR Journal of Mathematics 8 (1), 5–10. https://doi.org/10.9790/5728-0810510


Popescu, I.-I., Naumann, S., Kelih, E., Rovenchak, A., Overbeck, A., Sanada, H., Smith, R., Čech, R., Mohanty, P., Wilson, A., and Altmann, G. (2013). Word length: Aspects and languages. In G. Altmann and R. Köhler (Eds), Issues in Quantitative Linguistics Vol. 3, 224–281. Studies in Quantitative Linguistics, vol. 13, Lüdenscheid: RAM-Verlag.


Renkui, H. and Minghu, J. (2012). Discrimination of Chinese Quantitative Style Features Based on Text Clustering. 11th International Conference on Signal Processing (ICSP), 2012 IEEE, 21–25 October 2012, Beijing.


Röttger, W. (1996). Distribution of word length in Ciceronian letters. Journal of Quantitative Linguistics 3 (1), 68–72. https://doi.org/10.1080/09296179608590064


Rottmann, O. (2003). Word length in the Baltic languages – are they of the same type as the word lengths in the Slavic languages? Glottometrics 6, 52–60.


Rottmann, O. A. (1997). Word‐length counting in Old Church Slavonic. Journal of Quantitative Linguistics, 4 (1–3), 252–256. https://doi.org/10.1080/09296179708590101


Sigurd, B., Eeg-Olofsson, M., and van de Weijer, J. (2004). Word length, sentence length and frequency – Zipf revisited. Studia Linguistica 58 (1), 37–52. https://doi.org/10.1111/j.0039-3193.2004.00109.x


Těšitelová, M. (1992). Quantitative Linguistics. Amsterdam/Philadelphia: John Benjamins Publishing Company. https://doi.org/10.1075/llsee.37


Uhlírová, L. (1995). On the generality of statistical laws and individuality of texts. A case of syllables, word forms, their length and frequencies. Journal of Quantitative Linguistics 2 (3), 238–247. https://doi.org/10.1080/09296179508590052


Uhlírová, L. (1999). Word length modelling: Intertextuality as a relevant factor? Journal of Quantitative Linguistics 6 (3), 252–256. https://doi.org/10.1076/jqul.6.3.252.6165


Wilson, A. (2003). Word length distribution in modern Welsh prose texts. Glottometrics 6, 35–39.


Wilson, A. (2006). Word-length distribution in present-day lower Sorbian newspaper texts. In P. Grzybek (Ed.), Contributions to the Science of Text and Language: Word Length Studies and Related Issues, 319–327. Rotterdam: Springer.


Ziegler, A. (1996). Word length distribution in Brazilian‐Portuguese texts. Journal of Quantitative Linguistics 3 (1), 73–79. https://doi.org/10.1080/09296179608590065


Ziegler, A. (2000). Word length in Romance languages. A complemental contribution. Journal of Quantitative Linguistics 7 (1), 65–68. https://doi.org/10.1076/0929-6174(200004)07:01;1-3;FT065