Abstract
Spelling correction is crucial for search engines, as misspellings degrade their performance. The task becomes even harder when search queries relate to a specialized domain that standard spell checkers cover poorly, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e., they can detect a misspelled word and suggest candidate corrections, but choosing among them is left to the user. For these reasons, we developed a spelling correction unit for 2GIS, a cartographic search company. To do this, we extracted and manually annotated a corpus of GIS lookup queries, trained a language model, ran a series of experiments to find the best feature extractor, fitted a logistic regression following the approach proposed in SpellRuEval, and then applied it iteratively to improve the result. We measured the resulting performance with cross-validation and compared it against a baseline, observing a substantial improvement. We also interpret the results by calculating and discussing the importance of specific features and by analyzing the model's output.
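The abstract only names the pipeline, so the following is a minimal Python sketch of the general idea: score each candidate correction with a logistic model and apply the single most confident one per pass, repeating until no candidate clears a threshold. It is not the authors' implementation and does not reproduce the SpellRuEval feature set. The toy unigram language model `LM`, the `OOV_LOGPROB` penalty, the two features (edit distance and language-model log-probability gain), the hand-set `WEIGHTS` standing in for a fitted logistic regression, and the function names `edit_distance`, `score` and `correct` are all hypothetical assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy unigram "language model": word -> log-probability.
# A real system would use an LM trained on the annotated GIS query corpus.
LM: Dict[str, float] = {
    "cafe": -4.0, "pizza": -4.5, "pharmacy": -5.0, "near": -3.5, "metro": -4.2,
}
OOV_LOGPROB = -12.0  # assumed log-probability for out-of-vocabulary words

# Hypothetical hand-set weights standing in for a fitted logistic regression:
# (bias, weight for edit distance, weight for LM log-probability gain)
WEIGHTS = (-1.0, -1.5, 0.8)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


def logprob(word: str) -> float:
    return LM.get(word, OOV_LOGPROB)


def score(original: str, candidate: str) -> float:
    """Logistic estimate that replacing `original` with `candidate` is correct."""
    bias, w_dist, w_lm = WEIGHTS
    z = (bias
         + w_dist * edit_distance(original, candidate)
         + w_lm * (logprob(candidate) - logprob(original)))
    return 1.0 / (1.0 + math.exp(-z))


def correct(query: List[str], threshold: float = 0.5, max_iter: int = 5) -> List[str]:
    """Iteratively apply the most confident correction, one token per pass."""
    tokens = list(query)
    for _ in range(max_iter):
        best: Tuple[float, int, str] = (threshold, -1, "")
        for pos, word in enumerate(tokens):
            # Candidates: vocabulary words within edit distance 2.
            for cand in LM:
                if cand != word and edit_distance(word, cand) <= 2:
                    p = score(word, cand)
                    if p > best[0]:
                        best = (p, pos, cand)
        if best[1] < 0:
            break  # no candidate clears the threshold; stop iterating
        tokens[best[1]] = best[2]
    return tokens


if __name__ == "__main__":
    print(correct(["piza", "near", "metor"]))  # ['pizza', 'near', 'metro']
```

Applying one correction per pass and then re-scoring lets later decisions see the already-corrected query, which is the point of iterating; with the context-free unigram model used here that benefit is only nominal, and a context-aware language model would be needed for it to matter.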
Original language | English |
---|---|
Pages | 185-199 |
Number of pages | 15 |
Journal | Komp'juternaja Lingvistika i Intellektual'nye Tehnologii |
Volume | 2018-May |
Issue number | 17 |
Status | Published, 1 Jan 2018 |
Event | 2018 International Conference on Computational Linguistics and Intellectual Technologies, Dialogue 2018, Moscow, Russian Federation. Duration: 30 May 2018 → 2 Jun 2018 |