A study of machine learning algorithms applied to GIS queries spelling correction

V. V. Fomin, I. Yu Bondarenko

Research output: Scientific publications in periodicals; conference article; peer-reviewed


The problem of spelling correction is crucial for search engines, as misspellings have a negative effect on their performance. The task gets even harder when search queries relate to a specific area that standard spell checkers do not cover well, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e. they can notice a misspelled word and suggest candidate corrections, but picking one of them is up to the user. This is why we decided to develop a spelling correction unit for 2GIS, a cartographic search company. To do this, we extracted and manually annotated a corpus of GIS lookup queries, trained a language model, performed various experiments to find the best feature extractor, fitted a logistic regression using an approach suggested in SpellRuEval, and then applied it iteratively to get a better result. We then measured the resulting performance by means of cross-validation, compared it against a baseline, and observed a substantial increase. We also present an interpretation of the result by calculating and discussing the importance of specific features and by analyzing the output of the model.
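The pipeline outlined in the abstract (generate candidates, score them with a logistic model, apply the corrector repeatedly) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy vocabulary `VOCAB`, the two features, the hand-set `WEIGHTS` (a real system would fit them on an annotated corpus), and the helper names `correct_once` and `correct`. The abstract does not specify the features, the language model, or how the iteration is organised.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Hypothetical unigram counts standing in for a trained language model.
VOCAB = {"pharmacy": 50, "coffee": 80, "central": 40, "park": 60}
TOTAL = sum(VOCAB.values())

# Hypothetical logistic-regression weights; assumed, not learned here.
WEIGHTS = {"bias": 2.0, "edit_distance": -1.5, "log_prob": 1.0}


def score(token: str, candidate: str) -> float:
    """Logistic score that `candidate` is the right correction of `token`."""
    feats = {
        "bias": 1.0,
        "edit_distance": float(edit_distance(token, candidate)),
        "log_prob": math.log(VOCAB[candidate] / TOTAL),
    }
    z = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def correct_once(query: str) -> str:
    """One non-interactive pass: replace each unknown token by its best candidate."""
    out = []
    for token in query.split():
        if token in VOCAB:
            out.append(token)
            continue
        candidates = [w for w in VOCAB if edit_distance(token, w) <= 2]
        out.append(max(candidates, key=lambda w: score(token, w)) if candidates else token)
    return " ".join(out)


def correct(query: str, max_rounds: int = 3) -> str:
    """Apply the corrector repeatedly until the query stops changing."""
    for _ in range(max_rounds):
        new_query = correct_once(query)
        if new_query == query:
            break
        query = new_query
    return query
```

Calling `correct("cofee parck")` picks the highest-scoring candidate for each unknown token without asking the user, which is the non-interactive behaviour the abstract contrasts with standard spell checkers.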

Original language: English
Pages (from-to): 185-199
Number of pages: 15
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Issue number: 17
Status: Published - 1 Jan 2018
Event: 2018 International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2018 - Moscow, Russian Federation
Duration: 30 May 2018 to 2 Jun 2018

