RuREBus: A Case Study of Joint Named Entity Recognition and Relation Extraction from E-Government Domain

Vitaly Ivanin, Ekaterina Artemova, Tatiana Batura, Vladimir Ivanov, Veronika Sarkisyan, Elena Tutubalina, Ivan Smurov

Research output: Publications in books, reports, collections, conference proceedings > Article in conference proceedings > Research > Peer-reviewed

Abstract

We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether the tasks are approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on a large unlabeled corpus does not automatically yield a significant improvement, so we conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the entire pipeline we developed: text annotation, baseline development, and the design of a shared task intended to improve on the baseline. Ultimately, we find that current NER and RE technologies are far from mature and do not yet overcome challenges like ours.

Original language: English
Title of host publication: Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Selected Papers
Editors: Wil M. van der Aalst, Vladimir Batagelj, Dmitry I. Ignatov, Michael Khachay, Olessia Koltsova, Andrey Kutuzov, Sergei O. Kuznetsov, Irina A. Lomazova, Natalia Loukachevitch, Amedeo Napoli, Alexander Panchenko, Panos M. Pardalos, Marcello Pelillo, Andrey V. Savchenko, Elena Tutubalina
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 19-27
Number of pages: 9
ISBN (Print): 9783030726096
DOI
Status: Published - 2021
Event: 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020 - Moscow, Russian Federation
Duration: 15 Oct 2020 to 16 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12602 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020
Country: Russian Federation
City: Moscow
Period: 15.10.2020 to 16.10.2020

OECD FOS+WOS subject areas

  • 1.02 COMPUTER AND INFORMATION SCIENCES
