RUREBUS-2020: СОРЕВНОВАНИЕ ПО ИЗВЛЕЧЕНИЮ ОТНОШЕНИЙ В БИЗНЕС-ПОСТАНОВКЕ

Translated title of the contribution: Rurebus-2020 shared task: Russian relation extraction for business

V. A. Ivanin, E. L. Artemova, T. V. Batura, V. V. Ivanov, V. V. Sarkisyan, E. V. Tutubalina, I. M. Smurov

Research output: Contribution to journal › Conference article › peer-review

7 Citations (Scopus)

Abstract

In this paper, we present a shared task on two core information extraction problems: named entity recognition and relation extraction. In contrast to popular shared tasks on related problems, we try to move away from strictly academic rigor and instead model a business case. As a source of textual data we chose a corpus of Russian strategic documents, which we annotated according to our own annotation scheme. To speed up the annotation process, we exploited various active learning techniques. In total we ended up with more than two hundred annotated documents. Thus we managed to create a high-quality data set in a short time. The shared task consisted of three tracks, devoted to 1) named entity recognition, 2) relation extraction and 3) joint named entity recognition and relation extraction. We provided participants with the annotated texts as well as a set of unannotated texts, which could have been used in any way to improve solutions. In the paper we overview and compare the solutions submitted by the shared task participants. We release both raw and annotated corpora along with annotation guidelines, evaluation scripts and results at https://github.com/dialogue-evaluation/RuREBus.

Original language: Russian
Pages (from-to): 416-431
Number of pages: 16
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Volume: 2020-June
Issue number: 19
Publication status: Published - 2020
Event: 2020 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2020 - Moscow, Russian Federation
Duration: 17 Jun 2020 to 20 Jun 2020

OECD FOS+WOS

  • 6.02 LANGUAGES AND LITERATURE
