Automatic text summarization based on syntactic links

A. S. Yerimbetova, T. V. Batura, F. A. Murzin, S. K. Sagnayeva

Research output: Publications in periodicals › Conference article

Abstract

The task of information retrieval is to find, within a given collection, the documents relevant to a query. A document is a text designated by its author as a single fragment. A query is usually a meaningful phrase or a set of words describing the information needed. Instead of searching through the whole document, it is often sufficient to search by the topic or summary of the document. By the term "topic" we refer to a set of small reference texts. One of the interesting tasks in information retrieval systems is therefore the classification of texts by topic. The classification process is carried out in four stages: preprocessing the text, weighting the terms, weighting the sentences, and extracting meaningful sentences. When topics are selected, fragments of the text (for example, paragraphs) are examined and compared with the chosen reference. Different fragments can be attributed to different topics, and the fragments selected for a topic can be combined into a summary on that topic. This paper considers the automatic summarization of text documents, taking into account the syntactic relations between words and word forms in sentences, which can be obtained from the output of the Link Grammar Parser (LGP) system for the Kazakh and Turkish languages. The authors draw on the results of studies on customizing the LGP for agglutinative languages.
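The four stages named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: it assumes plain relative-frequency term weights and an averaged sentence score, and it does not use the syntactic links produced by the LGP. The stop-word list and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "to", "is", "in", "and", "by", "for", "on", "as"}

def split_sentences(text):
    # Stage 1a: naive sentence splitting after '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Stage 1b: lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]

def term_weights(sentences):
    # Stage 2: weight each term by its relative frequency in the text.
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}

def sentence_weights(sentences, weights):
    # Stage 3: score each sentence by the mean weight of its terms.
    scores = []
    for s in sentences:
        tokens = tokenize(s)
        scores.append(sum(weights[w] for w in tokens) / len(tokens) if tokens else 0.0)
    return scores

def summarize(text, k=2):
    # Stage 4: keep the k highest-scoring sentences in their original order.
    sentences = split_sentences(text)
    scores = sentence_weights(sentences, term_weights(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

if __name__ == "__main__":
    sample = "Cats chase mice. Dogs bark loudly. Cats and mice play."
    print(summarize(sample, k=2))
```

A syntax-aware variant, in the spirit of the paper, would presumably adjust the term or sentence weights using the link types reported by the parser; that step is left out here because the abstract does not specify it.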

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2570
Status: Published - 1 Jan 2020
Event: 1st International Conference of Information Systems and Design, ICID 2019 - Moscow, Russian Federation
Duration: 5 Dec 2019 → …
