On geographical binding of the content of text documents (О географической привязке контента текстовых документов)

Oleg L. Zhizhimov, Yulia V. Leonova

Research output: Scientific publications in periodicals; conference article; peer-reviewed

Abstract

Extracting geographical names from arbitrary text documents is important when processing large document collections and linking their content to a specific geographic region. In its simplest form, a model for extracting geographical names from text is a sequence of processing stages, each of which solves its own task. These tasks include parsing the text, analyzing text elements, processing synonyms and abbreviations, reducing text elements to their normal form (accounting for possible word forms and grammar rules), comparing text elements with entries in dictionaries of geographical names, and adding special tags to the text so that geographical names are identified unambiguously. This work describes a technology that implements these tasks on the basis of the freely distributed PostgreSQL DBMS. The standard configuration is used, and all server-side settings are made through documented procedures. The GeoNames Gazetteer database, OpenStreetMap (OSM) databases, and the OKATO and KLADR (КЛАДР) classifiers serve as authoritative sources of geographical names.
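The stages listed in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' PostgreSQL-based implementation: plain Python dictionaries stand in for the database, the gazetteer entries and their identifiers are made-up placeholders rather than real GeoNames or OSM records, the `<geo>` tag format is an assumption, and normalization is reduced to lowercasing and punctuation stripping (no morphological analysis of word forms).

```python
import re

# Hypothetical toy gazetteer: normalized name -> placeholder identifier.
# A real system would query GeoNames, OSM, OKATO or KLADR data instead.
GAZETTEER = {
    "novosibirsk": "geo:1",
    "berdsk": "geo:2",
    "saint petersburg": "geo:3",
}

# Hypothetical synonym/abbreviation table, keyed by normalized form.
SYNONYMS = {
    "st petersburg": "saint petersburg",
    "spb": "saint petersburg",
}

# Longest name (in words) that the matcher has to consider.
MAX_NAME_TOKENS = max(len(k.split()) for k in list(GAZETTEER) + list(SYNONYMS))

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware


def normalize(fragment):
    """Lowercase, drop punctuation, collapse spaces, resolve synonyms."""
    key = re.sub(r"[^\w\s]", "", fragment.lower())
    key = " ".join(key.split())
    return SYNONYMS.get(key, key)


def tag_geonames(text):
    """Wrap every gazetteer match in a <geo id="..."> tag (longest match first)."""
    tokens = [(m.start(), m.end()) for m in WORD_RE.finditer(text)]
    out = []
    pos = 0  # end of the text already copied to the output
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_NAME_TOKENS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = normalize(text[start:end])
            if key in GAZETTEER:
                match = (start, end, GAZETTEER[key], n)
                break
        if match:
            start, end, geo_id, n = match
            out.append(text[pos:start])
            out.append(f'<geo id="{geo_id}">{text[start:end]}</geo>')
            pos = end
            i += n
        else:
            i += 1
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(tag_geonames("Held in Berdsk, near Novosibirsk; guests came from St. Petersburg."))
```

Greedy longest-match lookup keeps multi-word names such as "St. Petersburg" together. Because punctuation is stripped before the lookup, this sketch would also match a name whose words are separated by a comma, which a production matcher would need to rule out.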

Translated title: On geographical binding of the content of text documents
Original language: Russian
Pages (from-to): 241-247
Number of pages: 7
Journal: CEUR Workshop Proceedings
Volume: 2534
Status: Published - 12 Jan 2020
Published externally: Yes
Event: 2019 All-Russian Conference "Spatial Data Processing for Monitoring of Natural and Anthropogenic Processes", SDM 2019 - Berdsk, Russian Federation
Duration: 26 Aug 2019 to 30 Aug 2019

Keywords

  • Full-text search
  • Geographical names
  • Geographical search
  • Model of extraction of names
  • PostgreSQL
  • Text processing
