Abstract
The task of information retrieval is to find documents relevant to a query in a given collection of documents. A document is a text that its author has presented as a single unit. A query is usually a meaningful phrase or a set of words describing the information needed. Instead of searching through the whole document, it is sufficient to organize the search by topic or by a summary of the document. By "topic" we mean a set of short reference texts. One of the interesting tasks in information retrieval systems is therefore classifying texts by topic. The classification process is carried out in four stages: text preprocessing, term weighting, sentence weighting, and extraction of meaningful sentences. When topics are selected, fragments of the text (for example, paragraphs) are examined and compared with the chosen reference. Different fragments can be attributed to different topics, and the selected fragments can be combined into a summary on that topic. This paper considers automatic summarization of text documents that takes into account the syntactic relations between words and word forms in sentences, as produced by the Link Grammar Parser (LGP) for the Kazakh and Turkish languages. The authors build on the results of studies on adapting LGP to agglutinative languages.
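The topic-selection step described above compares text fragments with reference texts and groups matching fragments into a per-topic summary. The sketch below illustrates only that matching step. It assumes plain bag-of-words cosine similarity as the comparison measure, which is a stand-in: the paper works with syntactic relations from LGP, and this sketch does not model them. The function names (`bag_of_words`, `cosine`, `assign_topics`), the topic texts, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts (a crude stand-in for real preprocessing).
    In Python 3, \\w matches Unicode letters, so Kazakh or Turkish text tokenizes too."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assign_topics(fragments, topics, threshold=0.1):
    """Attach each fragment to its most similar topic (hypothetical threshold);
    fragments below the threshold are dropped, the rest are joined per topic."""
    refs = {name: bag_of_words(" ".join(texts)) for name, texts in topics.items()}
    summaries = {name: [] for name in topics}
    for frag in fragments:
        vec = bag_of_words(frag)
        best = max(refs, key=lambda name: cosine(vec, refs[name]))
        if cosine(vec, refs[best]) >= threshold:
            summaries[best].append(frag)
    return {name: " ".join(parts) for name, parts in summaries.items()}


# Hypothetical reference texts ("topics") and fragments (e.g. paragraphs).
TOPICS = {
    "retrieval": ["search engines find relevant documents for a query"],
    "parsing": ["a parser links words in a sentence by syntactic relations"],
}
FRAGMENTS = [
    "The query is matched against documents to find relevant ones.",
    "Each sentence is parsed and words are linked by relations.",
    "Weather was pleasant yesterday.",
]
print(assign_topics(FRAGMENTS, TOPICS))
```

With these inputs the first fragment goes to "retrieval", the second to "parsing", and the third shares no words with either reference text, so it is left out of both summaries.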
Field | Value |
---|---|
Original language | English |
Journal | CEUR Workshop Proceedings |
Volume | 2570 |
Publication status | Published - 1 Jan 2020 |
Event | 1st International Conference of Information Systems and Design, ICID 2019 - Moscow, Russian Federation. Duration: 5 Dec 2019 → … |
Keywords
- Closeness centrality
- Directed graph
- Information retrieval
- LGP
- Summarization
- Text topics
- Word weight
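The keywords (directed graph, closeness centrality, word weight) and the four stages listed in the abstract suggest a graph-based summarization pipeline. The sketch below is an assumption about how such stages could fit together, not the authors' method. The (head, dependent) word links are hand-written stand-ins for LGP output. Each word's weight is its closeness centrality in a directed word graph, computed over outgoing shortest paths with Wasserman-Faust scaling. A sentence's weight is the mean weight of its words, and the top-k sentences are extracted. All names (`build_graph`, `closeness`, `summarize`, `DOC`) are hypothetical.

```python
from collections import deque


def build_graph(linked_sentences):
    """Stage 1 (minimal preprocessing): lowercase words and merge every
    sentence's (head, dependent) links into one directed word graph."""
    graph = {}
    for links in linked_sentences.values():
        for head, dep in links:
            head, dep = head.lower(), dep.lower()
            graph.setdefault(head, set()).add(dep)
            graph.setdefault(dep, set())
    return graph


def closeness(graph, source):
    """Stage 2 (term weighting): closeness of `source` over outgoing
    shortest paths, found by BFS with unit edge lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    reachable = len(dist) - 1
    total = sum(dist.values())
    if reachable == 0 or total == 0:
        return 0.0
    # Wasserman-Faust scaling: words that reach few others are not over-rated.
    return (reachable / total) * (reachable / (len(graph) - 1))


def summarize(linked_sentences, k=1):
    """Stages 3 and 4: weight sentences by mean word weight, keep the top k."""
    graph = build_graph(linked_sentences)
    word_weight = {w: closeness(graph, w) for w in graph}
    sentence_weight = {}
    for sid, links in linked_sentences.items():
        words = {w.lower() for pair in links for w in pair}
        sentence_weight[sid] = (
            sum(word_weight[w] for w in words) / len(words) if words else 0.0
        )
    ranked = sorted(sentence_weight, key=lambda s: (-sentence_weight[s], s))
    return ranked[:k], sentence_weight


# Hypothetical (head, dependent) links per sentence; NOT real LGP output.
DOC = {
    "s1": [("finds", "system"), ("finds", "documents"), ("documents", "relevant")],
    "s2": [("describes", "query"), ("describes", "need")],
    "s3": [("system", "retrieval"), ("retrieval", "documents")],
}
top, weights = summarize(DOC, k=2)
print(top)  # → ['s1', 's3']
```

Closeness is only one possible word weight. It favours words that sit at the head of many short link chains, which is why "finds" in `s1` outweighs the leaf words of `s2`.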