Parallel text document clustering based on genetic algorithm

Madina Mansurova, Vladimir Barakhnin, Sanzhar Aubakirov, Yerzhan Khibatkhanuly, Aigerim Mussina

Research output: Publications in periodicals › Article

Abstract

This work describes a parallel implementation of a text document clustering algorithm. The algorithm evaluates the similarity between objects in a competitive situation, which leads to the notion of a function of rival similarity. Attributes of the bibliographic descriptions of scientific articles serve as the scales for determining the similarity measure. A genetic algorithm is developed to find the weighting coefficients used in the similarity measure formula. Parallel computing is used to speed up the algorithm, with parallelization applied at two stages: the genetic algorithm and the clustering itself. The parallel genetic algorithm is implemented with the MPJ Express library, and the parallel clustering algorithm with the Java 8 Streams library. Results of computational experiments showing the benefits of the parallel implementation are presented.
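The abstract does not give the formulas, so the following is only a minimal sketch of the ideas it names, under stated assumptions. It assumes the similarity measure is a weighted sum of per-attribute similarities (the weights being what a genetic algorithm would tune), and it uses the function of rival similarity in the form commonly given in the literature, F = (r(z,b) - r(z,a)) / (r(z,b) + r(z,a)), where r is a distance to a candidate object a and to its rival b. The class name, method names, and data layout are hypothetical and are not taken from the paper.

```java
import java.util.stream.IntStream;

// Hypothetical sketch; names and data layout are assumptions, not the authors' code.
final class WeightedSimilaritySketch {

    // Weighted sum of per-attribute similarities (assumed form of the measure).
    // weights[k] is the coefficient for attribute k, e.g. authors, keywords, abstract.
    static double similarity(double[] weights, double[] perAttribute) {
        double sum = 0.0;
        for (int k = 0; k < weights.length; k++) {
            sum += weights[k] * perAttribute[k];
        }
        return sum;
    }

    // Function of rival similarity in its commonly cited form:
    // close to +1 when the object is much nearer to a than to the rival b,
    // close to -1 in the opposite case, and 0 when both distances are equal.
    static double rivalSimilarity(double distToA, double distToRival) {
        double total = distToA + distToRival;
        return total == 0.0 ? 0.0 : (distToRival - distToA) / total;
    }

    // Pairwise similarity matrix computed with a Java 8 parallel stream.
    // attrSim[i][j][k] is the similarity of documents i and j on attribute k.
    // Each parallel task writes only its own row, so no synchronization is needed.
    static double[][] pairwise(double[][][] attrSim, double[] weights) {
        int n = attrSim.length;
        double[][] matrix = new double[n][n];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = 0; j < n; j++) {
                matrix[i][j] = similarity(weights, attrSim[i][j]);
            }
        });
        return matrix;
    }
}
```

In this sketch a genetic algorithm would search over the `weights` vector, scoring each candidate by the quality of the clustering it produces; that search loop and the MPJ Express distribution of it are omitted because the abstract does not describe them.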

Original language: English
Pages (from-to): 218-232
Number of pages: 15
Journal: CEUR Workshop Proceedings
Volume: 1839
Status: Published - 2017

Cite this

Mansurova, M., Barakhnin, V., Aubakirov, S., Khibatkhanuly, Y., & Mussina, A. (2017). Parallel text document clustering based on genetic algorithm. CEUR Workshop Proceedings, 1839, 218-232.