A method for automatic text summarization based on rhetorical analysis and topic modeling

Tatiana Batura, Aigerim Bakiyeva, Maria Charintseva

Research output: Publication in a periodical › Article › Peer-reviewed

4 Citations (Scopus)

Abstract

This article describes an original method for the automatic summarization of scientific and technical texts based on rhetorical analysis and topic modeling. The method combines a linguistic knowledge base with machine learning. Topic modeling is used to detect key terms. First, unigram topic models containing only single-word terms are constructed. These models are then extended with multiword terms. The most significant fragments of the original document are identified during rhetorical analysis with the help of discourse markers. When evaluating the importance of text fragments, the method also takes into account keywords, multiword terms, and the scientific lexicon characteristic of scientific and technical texts. A linguistic knowledge base was created to store information about the markers and the scientific lexicon. The experiments showed that the method is effective and requires a comparatively small amount of training data. They also showed that it can be adapted to texts from other subject fields and in other languages.
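
The fragment-scoring idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions:
- The names `KnowledgeBase`, `score_sentence`, and `summarize` are hypothetical.
- The marker weights and the per-term weights are illustrative.
- The keywords and multiword terms are assumed to come from an already-built topic model, which is not reproduced here.

Each sentence is scored by the discourse markers, keywords, multiword terms, and scientific-lexicon words it contains. The highest-scoring sentences are returned in their original order.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for the linguistic knowledge base (illustrative entries only)."""
    markers: dict[str, float] = field(default_factory=dict)  # discourse marker -> assumed weight
    scientific_lexicon: set[str] = field(default_factory=set)  # single-word scientific terms


def contains_phrase(text: str, phrase: str) -> bool:
    # Whole-word match, so a short marker does not match inside a longer word.
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def split_sentences(text: str) -> list[str]:
    # Naive splitter: breaks after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence: str, kb: KnowledgeBase,
                   keywords: set[str], multiword_terms: set[str]) -> float:
    lower = sentence.lower()
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", lower)
    score = 0.0
    # Rhetorical cues: each discourse marker present adds its weight once.
    score += sum(w for m, w in kb.markers.items() if contains_phrase(lower, m))
    # Assumed weights: 1.0 per keyword token, 2.0 per multiword term, 0.5 per lexicon token.
    score += 1.0 * sum(1 for t in tokens if t in keywords)
    score += 2.0 * sum(1 for term in multiword_terms if contains_phrase(lower, term))
    score += 0.5 * sum(1 for t in tokens if t in kb.scientific_lexicon)
    return score


def summarize(text: str, kb: KnowledgeBase, keywords: set[str],
              multiword_terms: set[str], k: int = 3) -> list[str]:
    sentences = split_sentences(text)
    scores = [score_sentence(s, kb, keywords, multiword_terms) for s in sentences]
    # Take the k highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    kb = KnowledgeBase(markers={"in conclusion": 2.0},
                       scientific_lexicon={"method", "proposed"})
    text = ("Topic models are popular. In conclusion, the proposed method "
            "improves text summarization. The weather was nice.")
    print(summarize(text, kb, {"topic", "summarization"},
                    {"topic models", "text summarization"}, k=1))
    # ['In conclusion, the proposed method improves text summarization.']
```

A real system would derive the weights from the knowledge base and training data rather than fixing them by hand. The additive score is simply the most transparent way to show how the different cues combine.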

Original language: English
Pages (from-to): 118-127
Number of pages: 10
Journal: International Journal of Computing
Volume: 19
Issue number: 1
Status: Published - 1 Jan 2020
