Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts

I. V. Selivanova, B. Ya. Ryabko, A. E. Guskov

A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and evaluated on data from an archive of scientific texts and from the CyberLeninka scientific electronic library. Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%, and that its accuracy depends on the quality of the original data.
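The general idea of classification by compression can be illustrated with a short sketch. It is not the authors' implementation: the abstract does not say which compressor or decision rule they used. This sketch assumes Python's standard `zlib` as a stand-in compressor and a simple rule: append the unknown text to each theme's training corpus and choose the theme whose corpus compresses the new text most cheaply, that is, with the smallest increase in compressed size. The function names `compressed_size` and `classify` and the toy corpora are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the maximum level.

    zlib is an assumed stand-in; any lossless compressor could be used.
    """
    return len(zlib.compress(data, 9))


def classify(text: str, corpora: dict[str, str]) -> str:
    """Return the theme whose training corpus best 'explains' `text`.

    For each theme, measure how many extra compressed bytes `text` costs
    when appended to that theme's corpus. A text that shares vocabulary and
    phrasing with a corpus reuses its repeated substrings and so costs fewer
    extra bytes. The theme with the smallest increase wins.
    """
    if not corpora:
        raise ValueError("at least one theme corpus is required")
    target = text.encode("utf-8")
    best_theme, best_cost = None, None
    for theme, corpus in corpora.items():
        base = corpus.encode("utf-8")
        # Extra cost of the new text given this theme's corpus as context.
        cost = compressed_size(base + b"\n" + target) - compressed_size(base)
        if best_cost is None or cost < best_cost:
            best_theme, best_cost = theme, cost
    return best_theme


if __name__ == "__main__":
    # Toy training corpora (hypothetical data, for illustration only).
    corpora = {
        "math": "We prove the lemma and the theorem using an integral over "
                "the matrix eigenvalue polynomial. " * 20,
        "biology": "The gene encodes a protein in the cell membrane of the "
                   "organism and its enzyme. " * 20,
    }
    print(classify("We prove the theorem using an integral and a matrix "
                   "eigenvalue bound.", corpora))
```

One design caveat: `zlib` looks back over a window of only 32 KB, so with large training corpora only the tail of each corpus influences the result. A compressor with a larger context, such as `lzma` or a PPM-style coder, would make fuller use of the training data, at the cost of speed.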

Pages: 120-126
Number of pages: 7
Journal: Automatic Documentation and Mathematical Linguistics
Issue: 3
Status: Published - 1 Jun 2017