A method for automatic text summarization based on rhetorical analysis and topic modeling

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

This article describes an original method for the automatic summarization of scientific and technical texts that combines rhetorical analysis with topic modeling. The method draws on both a linguistic knowledge base and machine learning. Key terms are detected with topic modeling: first, unigram topic models containing only one-word terms are constructed; these models are then extended with multiword terms. The most significant fragments of the original document are identified during rhetorical analysis with the help of discourse markers. The importance score of a text fragment also takes into account keywords, multiword terms, and the scientific lexicon characteristic of scientific and technical texts. A linguistic knowledge base was created to store information about the markers and the scientific lexicon. The experiments showed that the method is effective, needs a comparatively small amount of training data, and can be adapted to texts from other subject fields and in other languages.
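The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the marker list, the weights, the simple additive score, and the function names `split_sentences`, `score_sentence`, and `summarize` are all assumptions made for illustration, and the key terms are passed in directly rather than produced by a topic model.

```python
import re

# Hypothetical discourse markers and weights (assumed for illustration;
# the linguistic knowledge base from the article is not reproduced here).
MARKER_WEIGHTS = {
    "in conclusion": 2.0,
    "we propose": 1.5,
    "the results show": 1.5,
    "for example": -0.5,  # elaborations are assumed to matter less
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, key_terms, marker_weights=MARKER_WEIGHTS):
    """Sum the weights of markers present plus 1.0 per key term present."""
    lowered = sentence.lower()
    marker_score = sum(w for m, w in marker_weights.items() if m in lowered)
    term_score = sum(1.0 for t in key_terms if t.lower() in lowered)
    return marker_score + term_score


def summarize(text, key_terms, n_sentences=2):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], key_terms),
        reverse=True,
    )
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)
```

In this sketch, `key_terms` stands in for the one-word and multiword terms that the topic models would supply, so the example shows only how marker evidence and term evidence might be combined into a single fragment score.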

Original language: English
Pages (from-to): 118-127
Number of pages: 10
Journal: International Journal of Computing
Volume: 19
Issue number: 1
Publication status: Published - 1 Jan 2020

Keywords

  • Additive regularization
  • Automatic summarization
  • Discourse markers
  • Natural language processing
  • Rhetorical structure theory
  • Topic modeling
