A statistical test for the Zipf's law by deviations from the Heaps' law

Research output: Contribution to journal › Article

Abstract

We explore a probabilistic model of a literary text: the words of the text are chosen independently of each other according to a discrete probability distribution on an infinite dictionary. The words are enumerated 1, 2, ..., and the probability of the i-th word appearing is asymptotically a power function of i. Bahadur proved that in this case the number of different words, as a function of the length of the text, also behaves asymptotically like a power function. On the other hand, in the applied statistics community there are statements known as Zipf's and Heaps' laws that are supported by empirical observations. We highlight the links between Bahadur's results and Zipf's/Heaps' laws, and introduce and analyse a corresponding statistical test.
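The model in the abstract can be illustrated with a small simulation. The sketch below is not the statistical test proposed in the article. It only draws independent words with probabilities proportional to a power of the rank (a Zipf-type distribution) and tracks the number of distinct words, so that the Heaps-type growth can be inspected on a log-log scale. The function names, the truncation of the infinite dictionary to `dict_size` words, and the choice of exponent are all assumptions made for illustration. For rank probabilities proportional to i^(-a) with a > 1, one would expect the fitted slope to be roughly 1/a.

```python
import math
import random
from itertools import accumulate


def simulate_distinct_counts(n_words, exponent, dict_size=100_000, seed=0):
    """Draw n_words i.i.d. words with P(word i) proportional to i**(-exponent).

    Returns a list whose entry k-1 is the number of distinct words among
    the first k draws. The infinite dictionary is truncated to dict_size
    words (an assumption of this sketch).
    """
    rng = random.Random(seed)
    weights = [i ** (-exponent) for i in range(1, dict_size + 1)]
    cum_weights = list(accumulate(weights))
    draws = rng.choices(range(dict_size), cum_weights=cum_weights, k=n_words)
    seen = set()
    counts = []
    for word in draws:
        seen.add(word)
        counts.append(len(seen))
    return counts


def loglog_slope(counts, start=100):
    """Least-squares slope of log(distinct words) against log(text length).

    Uses geometrically spaced text lengths start, 2*start, 4*start, ...
    so that the points are evenly spread on the log scale.
    """
    lengths = []
    k = start
    while k <= len(counts):
        lengths.append(k)
        k *= 2
    xs = [math.log(k) for k in lengths]
    ys = [math.log(counts[k - 1]) for k in lengths]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    counts = simulate_distinct_counts(n_words=20_000, exponent=2.0)
    print("distinct words:", counts[-1])
    print("fitted Heaps exponent:", loglog_slope(counts))
```

A single simulated text gives only a noisy slope estimate. Quantifying such deviations is what a formal test is for.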

Original language: English
Pages (from-to): 1822-1832
Number of pages: 11
Journal: Siberian Electronic Mathematical Reports
Volume: 16
Publication status: Published - 1 Jan 2019

Keywords

  • Heaps' law
  • Weak convergence
  • Zipf's law
