RUREBUS-2020: СОРЕВНОВАНИЕ ПО ИЗВЛЕЧЕНИЮ ОТНОШЕНИЙ В БИЗНЕС-ПОСТАНОВКЕ

V. A. Ivanin, E. L. Artemova, T. V. Batura, V. V. Ivanov, V. V. Sarkisyan, E. V. Tutubalina, I. M. Smurov

Research output: Publications in periodicals › Conference article › Peer-reviewed

5 Citations (Scopus)

Abstract

In this paper, we present a shared task on two core information extraction problems: named entity recognition and relation extraction. In contrast to popular shared tasks on related problems, we try to move away from strictly academic rigor and instead model a business case. As a source of textual data, we chose a corpus of Russian strategic documents, which we annotated according to our own annotation scheme. To speed up the annotation process, we exploited various active learning techniques. In total, we ended up with more than two hundred annotated documents, and thus managed to create a high-quality data set in a short time. The shared task consisted of three tracks, devoted to 1) named entity recognition, 2) relation extraction, and 3) joint named entity recognition and relation extraction. We provided participants with the annotated texts as well as a set of unannotated texts, which could have been used in any way to improve solutions. In the paper, we overview and compare the solutions submitted by the shared task participants. We release both raw and annotated corpora along with annotation guidelines, evaluation scripts and results at https://github.com/dialogue-evaluation/RuREBus.

Translated title: Rurebus-2020 shared task: Russian relation extraction for business
Original language: Russian
Pages (from-to): 416-431
Number of pages: 16
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Volume: 2020-June
Issue number: 19
DOI
Status: Published - 2020
Event: 2020 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2020 - Moscow, Russian Federation
Duration: 17 Jun 2020 to 20 Jun 2020

Keywords

  • BERT
  • Named entity recognition
  • Relation extraction
  • Russian fine-tuning
  • Shared task

Subject areas OECD FOS+WOS

  • 6.02 LANGUAGES AND LITERATURE
