This paper considers an approach to information extraction based on lexico-syntactic patterns (LSPs). The LSPs are built from knowledge about the scientific subject domain, as represented in an ontology, and from a corpus of scientific publications in different areas of knowledge. The LSPs must solve two key tasks: extracting object names, and constructing objects in accordance with the structure of the ontology classes. In line with these tasks, terminological and informational LSPs are distinguished. Terminological patterns extract object names and properties on the basis of indicators, that is, marker words and phrases. Informational patterns identify ontology objects by their key attributes, describe the actant structure of predicates that express attributive relations and relations between ontology objects, and match language constructions to the values of attributes of ontology objects and their relations. The research is based on a corpus of scientific publications comprising 100 articles from various fields of knowledge. The paper investigates the ways of expressing information about the research method, the central concept of the ontology of scientific activity.
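The idea of a terminological pattern driven by marker words can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' pattern language: the marker list `MARKERS`, the regular expression `TERM_PATTERN`, the `MethodMention` class, and the function `extract_method_mentions` are hypothetical. A plain regular expression over surface text stands in for real lexico-syntactic analysis, which would normally rely on morphological and syntactic information.

```python
import re
from dataclasses import dataclass

# Hypothetical marker words (indicators); the abstract does not list the real ones.
MARKERS = ("method", "algorithm", "approach", "technique")

# Simplified terminological pattern: an article, then one to three modifier
# words, then a marker word. The modifiers plus the marker form the object name.
TERM_PATTERN = re.compile(
    r"\b(?:the|a|an)\s+((?:[\w-]+\s+){1,3}?)(" + "|".join(MARKERS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class MethodMention:
    """Hypothetical stand-in for an ontology object of a 'research method' class."""
    name: str    # extracted object name, e.g. "finite element method"
    marker: str  # marker word that triggered the pattern


def extract_method_mentions(sentence: str) -> list[MethodMention]:
    """Return one MethodMention per pattern match in the sentence."""
    mentions = []
    for match in TERM_PATTERN.finditer(sentence):
        modifiers = match.group(1).strip()
        marker = match.group(2).lower()
        mentions.append(MethodMention(name=f"{modifiers} {marker}", marker=marker))
    return mentions


if __name__ == "__main__":
    for mention in extract_method_mentions(
        "We apply the finite element method to the plate model."
    ):
        print(mention)
```

In this sketch the first step (finding a name next to a marker word) plays the role of a terminological pattern, while filling the fields of `MethodMention` only hints at what an informational pattern does when it builds an object according to an ontology class.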
|Journal||CEUR Workshop Proceedings|
|Status||Published - 2021|
|Event||Supplementary 23rd International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2021 - Virtual, Moscow, Russian Federation|
Duration: 26 Oct 2021 → 29 Oct 2021
OECD FOS+WOS subject areas
- 1.02 COMPUTER AND INFORMATION SCIENCES