In this paper, we present a shared task on two core information extraction problems: named entity recognition and relation extraction. Unlike popular shared tasks on related problems, we move away from a strictly academic setting and instead model a business case. As the source of textual data, we chose a corpus of Russian strategic documents, which we annotated according to our own annotation scheme. To speed up annotation, we used several active learning techniques. In total, we annotated more than two hundred documents, which allowed us to create a high-quality data set in a short time. The shared task consisted of three tracks: 1) named entity recognition, 2) relation extraction, and 3) joint named entity recognition and relation extraction. We provided participants with the annotated texts and a set of unannotated texts, which could be used in any way to improve their solutions. In this paper, we review and compare the solutions submitted by the shared task participants. We release the raw and annotated corpora, together with the annotation guidelines, evaluation scripts, and results, at https://github.com/dialogue-evaluation/RuREBus.
|Translated title of the contribution||Rurebus-2020 shared task: Russian relation extraction for business|
|Number of pages||16|
|Journal||Komp'juternaja Lingvistika i Intellektual'nye Tehnologii|
|Publication status||Published - 2020|
|Event||2020 Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2020 - Moscow, Russian Federation|
Duration: 17 Jun 2020 → 20 Jun 2020
- 6.02 LANGUAGES AND LITERATURE