A study of machine learning algorithms applied to GIS queries spelling correction

V. V. Fomin, I. Yu Bondarenko

Research output: Contribution to journal › Article › peer-review

Abstract

Spelling correction is crucial for search engines, because misspellings degrade their performance. The problem becomes harder when search queries belong to a specific domain that standard spell checkers cover poorly, such as geographic information systems (GIS). Moreover, standard spell checkers are interactive, i.e., they can flag a misspelled word and suggest candidate corrections, but choosing among them is left to the user. For these reasons we developed a spelling correction unit for 2GIS, a cartographic search company. To do this, we extracted and manually annotated a corpus of GIS lookup queries, trained a language model, ran experiments to find the best feature extractor, fitted a logistic regression using an approach suggested in SpellRuEval, and then applied it iteratively to improve the result. We then measured the resulting performance by cross-validation, compared it against a baseline, and observed a substantial increase. We also interpret the result by calculating and discussing the importance of specific features and by analyzing the output of the model.
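
The pipeline outlined in the abstract (candidate corrections, features drawn from a language model, a logistic regression that picks the best candidate) can be illustrated with a minimal sketch. This is not the authors' implementation: the two features (edit distance and unigram log-probability), the toy vocabulary, the log-probabilities, the training pairs and all function names are assumptions made up for illustration, and the iterative reapplication mentioned in the abstract is not shown.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def features(word, candidate, logprob):
    # Hypothetical feature set: bias, edit distance, unigram log-probability.
    return [1.0, float(edit_distance(word, candidate)),
            logprob.get(candidate, -20.0)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, lr=0.01, epochs=2000):
    """Plain stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w

def correct(word, vocab, logprob, w, max_dist=2):
    """Return the highest-scoring candidate, or the word itself if none is close."""
    cands = [c for c in vocab if edit_distance(word, c) <= max_dist]
    if not cands:
        return word
    def score(c):
        x = features(word, c, logprob)
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return max(cands, key=score)

# Toy data, invented for this sketch (not taken from the 2GIS corpus).
LOGPROB = {"street": -2.0, "strait": -8.0, "avenue": -3.0, "venue": -6.0,
           "prospect": -3.0, "prosper": -7.0, "project": -4.0}
VOCAB = list(LOGPROB)
TRAIN = [("stret", "street", 1), ("stret", "strait", 0),
         ("avenu", "avenue", 1), ("avenu", "venue", 0)]

X = [features(word, cand, LOGPROB) for word, cand, _ in TRAIN]
y = [label for _, _, label in TRAIN]
weights = fit_logreg(X, y)
print(correct("prospet", VOCAB, LOGPROB, weights))
```

Unlike an interactive spell checker, `correct` returns a single choice rather than a list of suggestions. In this toy setup "prospect" and "prosper" are equally close to the misspelling, so the language-model feature is what separates them.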

Original language: English
Pages (from-to): 200-210
Number of pages: 11
Journal: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii
Issue number: 17
Publication status: Published - 2018

Keywords

  • Geographic information system
  • Language model
  • Local search
  • Spell checker
  • Text corpus
