Information-Theoretic method for classification of texts

B. Ya. Ryabko, A. E. Gus’kov, I. V. Selivanova

Research output: Contribution to journal > Article > peer-review

Abstract

We consider a method for automatic text classification (i.e., classification that requires no human involvement) based on universal source coding (“data compression”). We show that, under certain restrictions, the proposed method is consistent, i.e., the classification error tends to zero as the text length increases. As an example of practical use, we consider the classification of scientific texts (research papers, books, etc.). Experiments show the proposed method to be highly efficient.
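
The compression-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name a compressor or a decision rule, so zlib (DEFLATE) stands in here for a universal code, and the function names `compressed_size` and `classify`, as well as the "smallest increase in compressed length" rule, are assumptions made purely for illustration.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def classify(text: str, corpora: Mapping[str, str]) -> str:
    """Return the label whose reference corpus 'explains' the text best.

    Assumed rule (illustrative only): for each class, measure how many extra
    compressed bytes are needed when the text is appended to that class's
    reference corpus, and pick the class with the smallest increase.
    """
    sample = text.encode("utf-8")

    def extra_bytes(label: str) -> int:
        corpus = corpora[label].encode("utf-8")
        return compressed_size(corpus + sample) - compressed_size(corpus)

    return min(corpora, key=extra_bytes)


# Hypothetical toy corpora; real reference corpora would be far larger.
corpora = {
    "physics": (
        "The electron moves through the magnetic field and emits radiation. "
        "Quantum states of the particle are described by a wave function. "
    ),
    "biology": (
        "The cell membrane regulates transport of proteins and nutrients. "
        "Enzymes in the mitochondria catalyze metabolic reactions. "
    ),
}

label = classify(
    "Quantum states of the particle are described by a wave function.", corpora
)
print(label)
```

Note that zlib only looks back over a 32 KB window, so with large reference corpora a compressor with a longer context would be a more reasonable stand-in; the consistency result mentioned in the abstract concerns the authors' method, not this sketch.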

Original language: English
Pages (from-to): 294-304
Number of pages: 11
Journal: Problems of Information Transmission
Volume: 53
Issue number: 3
Publication status: Published - 1 Jul 2017
