MALDI-TOF mass spectrometry identifies microorganisms with high throughput, speed, and accuracy. However, the method is significantly limited by the absence of a universal database of reference mass spectra. This problem can be solved by creating an Internet platform for open databases of microbial protein spectra, and the choice of the optimal mathematical apparatus is the pivotal issue for this task. In our previous study, we proposed a geometric approach to processing mass spectrometry data that represents a mass spectrum as a vector in a multidimensional Euclidean space. This algorithm was implemented in the stand-alone Jacob4 package, and we demonstrated its efficiency in delimiting two closely related species of the Bacillus pumilus group. In this study, the geometric approach was implemented as R scripts, which allowed us to design a Web-based application. We also studied the possibility of using full spectra analysis (FSA), which omits the peak-picking step (PPA) and is a logical development of the method. We used 74 microbial strains from the collections of ICiG SB RAS, UNIQEM, IEGM, KMM, and VGM as models. We demonstrated that the algorithms based on peak picking and on analysis of the complete data are at least as accurate as the Biotyper 3.1 software. We also proposed a method for calculating cut-off thresholds based on averaged intraspecific distances. The resulting database, raw data, and the set of R scripts are available online at https://icg-test.mydisk.nsc.ru/s/qj6cfZg57g6qwzN.
- Geometric approach
- Microorganism identification
- MS data processing
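
The geometric idea summarized in the abstract (a mass spectrum treated as a vector in a multidimensional Euclidean space, with cut-off thresholds derived from averaged intraspecific distances) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R implementation. The binning scheme, the m/z range of 2000 to 20000, the 10-unit bin width, the unit-length normalization, the use of the plain mean pairwise distance as the cut-off, and all function names are assumptions introduced here.

```python
import math
from itertools import combinations


def spectrum_to_vector(mz, intensity, mz_min=2000.0, mz_max=20000.0, bin_width=10.0):
    """Bin a spectrum into a fixed-length, unit-norm vector.

    The m/z range and bin width are illustrative assumptions.
    """
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    vec = [0.0] * n_bins
    for m, i in zip(mz, intensity):
        if mz_min <= m < mz_max:
            # Clamp the index to guard against floating-point edge cases.
            idx = min(int((m - mz_min) // bin_width), n_bins - 1)
            vec[idx] += i
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def distance(u, v):
    """Euclidean distance between two spectrum vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_intraspecific_distance(vectors):
    """Average pairwise distance among spectra of one species.

    Used here directly as the cut-off, which is an assumption of this sketch.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("at least two spectra are needed")
    return sum(distance(u, v) for u, v in pairs) / len(pairs)


def identify(query, references, cutoffs):
    """Return (species, distance) for the nearest reference spectrum.

    references maps a species name to a list of vectors; cutoffs maps a
    species name to its threshold. The species is None when the nearest
    reference lies beyond that species' cut-off.
    """
    best_species, best_d = None, math.inf
    for species, vectors in references.items():
        d = min(distance(query, v) for v in vectors)
        if d < best_d:
            best_species, best_d = species, d
    if best_species is not None and best_d <= cutoffs[best_species]:
        return best_species, best_d
    return None, best_d
```

In this sketch a full-spectrum variant would pass every recorded (m/z, intensity) point to `spectrum_to_vector`, whereas a peak-picking variant would pass only the detected peaks; the distance and threshold steps stay the same in both cases.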