Identification of argumentative sentences in Russian scientific and popular science texts

N. V. Salomatina, I. S. Pimenov, E. A. Sidorova

Research output: Publication in a periodical; conference article; peer-reviewed

Abstract

In this study we analyze how well specific machine learning algorithms detect sentences containing argumentation in Russian text. We use a collection of scientific and popular science texts with manually annotated argumentation to evaluate the identification of argumentative sentences in terms of precision, recall, and F-measure. The experiment involves three algorithms: multinomial naive Bayes (MNB), support vector machine (SVM), and multilayer perceptron (MLP). Texts are represented with the bag-of-words model, and the lemmas of words in the analyzed sentences serve as classification features. Informative features are selected automatically using the variance and χ² criteria, combined with weight-based filtering of lemmas (via TF*IDF and EMI). The training set includes around 800 sentences, and the test set contains 180. The MNB algorithm achieves the highest F-measure and recall on almost all feature sets (with maximum values of 68.7% and 89%, respectively), while the MLP algorithm shows the best precision for about half of the feature selection variants (with a maximum of 72.5%).
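As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words lemma counts, evaluated with precision, recall, and F-measure. It is not the authors' implementation: the function names, the Laplace smoothing parameter `alpha`, and the toy sentences (English placeholder tokens standing in for Russian lemmas) are all assumptions made for this example, and the feature selection step (variance, χ², TF*IDF, EMI) is omitted.

```python
import math
from collections import Counter


def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on lemma counts with Laplace smoothing (alpha is assumed)."""
    vocab = {lemma for doc in docs for lemma in doc}
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict(model, doc):
    """Return the class with the highest log-posterior; unseen lemmas are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall, and F-measure for the positive (argumentative) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: each sentence is a list of lemmas; 1 = argumentative, 0 = not.
train_docs = [
    ["therefore", "result", "confirm", "hypothesis"],
    ["because", "method", "fail", "therefore", "reject"],
    ["experiment", "use", "corpus", "text"],
    ["section", "describe", "corpus", "structure"],
]
train_labels = [1, 1, 0, 0]
test_docs = [["therefore", "hypothesis", "reject"], ["section", "describe", "experiment"]]
test_labels = [1, 0]

model = train_mnb(train_docs, train_labels)
predicted = [predict(model, doc) for doc in test_docs]
print(predicted, precision_recall_f1(test_labels, predicted))
```

The toy data are far too small to say anything about real performance; they only show how the bag-of-words counts feed the classifier and how the three reported metrics are computed from its predictions.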

Original language: English
Article number: 012025
Journal: Journal of Physics: Conference Series
Volume: 2099
Issue number: 1
DOI
Status: Published - 13 Dec 2021
Event: International Conference on Marchuk Scientific Readings 2021, MSR 2021 - Novosibirsk, Virtual, Russian Federation
Duration: 4 Oct 2021 to 8 Oct 2021

OECD FOS+WOS subject areas

  • 1.03 PHYSICAL SCIENCES AND ASTRONOMY
