Rapid development of high-throughput genomic, transcriptomic, proteomic and metabolomic technologies has led to an information explosion in plant biology and agrobiology. To date, the number of scientific publications on Solanum tuberosum L. (potato) alone, one of the most important agricultural crops, has exceeded 1.5 million. Effective access to knowledge distributed across so many unformalized natural-language texts requires dedicated computer-assisted intelligent methods of data mining (text mining). However, the literature contains no reports on the application of intelligent methods of automatic knowledge extraction to publications on agricultural crops such as potato. We previously developed a pilot version of the SOLANUM TUBEROSUM knowledge base. SOLANUM TUBEROSUM is a computer platform for comprehensive intelligent processing of large bodies of data, including (1) automatic analysis of scientific publications and databases to extract information on genetics, markers, breeding, diagnostics, protection and storage technologies for potato, (2) formalized representation of the extracted information in the knowledge base, (3) user access to these data, and (4) analysis and visualization of query results. The ontology of the SOLANUM TUBEROSUM knowledge base contains dictionaries of molecular genetic objects (proteins, genes, metabolites, microRNAs, biomarkers); phenotypic characteristics of potato varieties; potato diseases and pests; biotic and abiotic environmental factors; and potato agrobiotechnologies. This article describes the current version of the SOLANUM TUBEROSUM knowledge base, developed from an extensive analysis of scientific publications on the molecular genetic regulation of metabolic pathways in potato and in model plant organisms (maize, rice, Arabidopsis thaliana). In total, about 9,000 full-text articles and more than 130,000 PubMed abstracts were analyzed.
Automatic analysis of scientific publications identified more than 59,000 facts on molecular genetic interactions and genetic regulation, and analysis of factual databases revealed more than 380,000 such interactions in the examined organisms. Only about 3% of the extracted facts on molecular genetic interactions and genetic regulation related to Solanum tuberosum L. Including information on well-studied model species when extracting information on the molecular genetic regulation of metabolic processes is therefore important: it allows orthologous genes in potato to be predicted and then identified and analyzed on the basis of homology. An associative network of the genetic regulation of starch biosynthesis in potato was reconstructed. It includes 33 metabolites, 36 proteins, 6 metabolic pathways and 132 interactions between them, of which 86 describe catalytic reactions and the remaining 46 describe regulatory events. The reconstructed network provides a basis for the search for target genes for directed mutagenesis and marker-assisted selection of potato varieties with specified starch properties. The trial version of the SOLANUM TUBEROSUM knowledge base is available at http://www-bionet.sysbio.cytogen.ru/and/plant/.
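As a rough illustration of what an associative network of this kind might look like as a data structure, the following minimal Python sketch stores typed nodes and typed interactions and tallies them by category. It is not the knowledge base's actual schema or API: the `Interaction` class, the `summarize` function, the type labels and the toy entries are all assumptions made for illustration, and `regulator_X` is a purely hypothetical regulator.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One edge of the network (illustrative structure, not the real schema)."""
    source: str
    target: str
    kind: str  # "catalysis" or "regulation"


def summarize(node_types, interactions):
    """Count nodes by type and interactions by kind."""
    node_counts = Counter(node_types.values())
    edge_counts = Counter(i.kind for i in interactions)
    return node_counts, edge_counts


# Toy data only; the reconstructed network itself is far larger.
node_types = {
    "glucose-1-phosphate": "metabolite",
    "ADP-glucose": "metabolite",
    "AGPase": "protein",        # ADP-glucose pyrophosphorylase
    "regulator_X": "protein",   # hypothetical regulator, for illustration
    "starch biosynthesis": "pathway",
}
interactions = [
    Interaction("AGPase", "ADP-glucose", "catalysis"),
    Interaction("regulator_X", "AGPase", "regulation"),  # hypothetical edge
]

node_counts, edge_counts = summarize(node_types, interactions)
print(dict(node_counts))
print(dict(edge_counts))
```

Applied to the full network, the same tally would correspond to the figures reported above: 132 interactions in total, 86 of them catalytic and 132 - 86 = 46 regulatory.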
Subject areas OECD FOS+WOS
- 4 AGRICULTURAL SCIENCES
- 3 MEDICAL AND HEALTH SCIENCES
- 1.06 BIOLOGICAL SCIENCES