A study of machine learning algorithms applied to GIS queries spelling correction

V. V. Fomin, I. Yu. Bondarenko

Research output: Scientific publication in a periodical / Article / Peer-reviewed

Abstract

The problem of spelling correction is crucial for search engines, as misspellings degrade their performance. It becomes even harder when search queries relate to a specific domain not well covered by standard spell-checkers, such as geographic information systems (GIS). Moreover, standard spell-checkers are interactive, i.e., they can detect a misspelled word and suggest candidate corrections, but choosing among them is left to the user. This is why we decided to develop an automatic spelling correction unit for 2GIS, a cartographic search company. To do this, we extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, fitted a logistic regression using the approach suggested in SpellRuEval, and then applied it iteratively to improve the result. We then measured the resulting performance by means of cross-validation, compared it against a baseline, and observed a substantial increase. We also interpret the result by calculating and discussing the importance of specific features and by analyzing the output of the model.
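To make the pipeline described above more concrete, here is a minimal sketch of candidate ranking with a logistic scorer applied iteratively to a query. It is not the authors' implementation and does not reproduce the SpellRuEval feature set: the toy vocabulary and counts (`VOCAB_COUNTS`), the three features, the hand-set weights (`WEIGHTS`, `BIAS`, chosen for illustration rather than fitted on annotated data), and the helper names `edits1`, `correct_word` and `correct_query` are all hypothetical. Candidates are generated within edit distance 1 (a common stand-in technique), and a smoothed unigram model stands in for the trained language model.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy unigram "language model": word -> corpus count.
VOCAB_COUNTS: Dict[str, int] = {"cafe": 120, "pizza": 80, "pharmacy": 60, "street": 200}
TOTAL = sum(VOCAB_COUNTS.values())
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical weights, hand-set for illustration (a real system would fit them).
WEIGHTS: List[float] = [1.0, -2.0, 3.0]
BIAS = 0.0


def edits1(word: str) -> Set[str]:
    """All strings within one deletion, transposition, replacement or insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def log_prob(word: str) -> float:
    # Add-one (Laplace) smoothing, reserving one extra slot for unseen words.
    return math.log((VOCAB_COUNTS.get(word, 0) + 1) / (TOTAL + len(VOCAB_COUNTS) + 1))


def features(original: str, candidate: str) -> List[float]:
    # [LM log-probability, "is a change" flag, "is a known word" flag]
    return [
        log_prob(candidate),
        0.0 if candidate == original else 1.0,
        1.0 if candidate in VOCAB_COUNTS else 0.0,
    ]


def score(feats: List[float]) -> float:
    """Logistic (sigmoid) score of a linear combination of features."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, feats))
    return 1.0 / (1.0 + math.exp(-z))


def correct_word(word: str) -> str:
    # Keep the original word as a candidate so "no correction" can win.
    candidates = {word} | {c for c in edits1(word) if c in VOCAB_COUNTS}
    return max(candidates, key=lambda c: (score(features(word, c)), c))


def correct_query(query: str, max_passes: int = 3) -> str:
    """Re-apply the word-level corrector until the query stops changing."""
    tokens = query.lower().split()
    for _ in range(max_passes):
        new_tokens = [correct_word(t) for t in tokens]
        if new_tokens == tokens:
            break
        tokens = new_tokens
    return " ".join(tokens)


print(correct_query("piza strete"))  # pizza street
```

With this context-free toy model the loop settles after one corrective pass; the iterative scheme matters more when features depend on neighbouring words, so that fixing one token can change the best choice for another on the next pass.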

Original language: English
Pages (from-to): 200-210
Number of pages: 11
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Issue number: 17
Status: Published - 2018

OECD FOS+WOS subject areas

  • 6.02 LANGUAGES AND LITERATURE

