12.09.2018

Researchers at the University of Warsaw develop a tool that accelerates the description of bacterial genetic sequences

Photo: Fotolia

PhD students from the University of Warsaw are working on an online system that will shorten the stage of identifying and describing the functions of bacterial genes from several dozen days to several hours. Every scientist using genetic analyses of bacteria will be able to use the tool.

The online system for quick and accurate description of DNA sequences of bacteria is being developed by Mikołaj Dziurzyński, Przemysław Decewicz and Adrian Górecki, doctoral students from the Bacterial Genetics Department of the Faculty of Biology, University of Warsaw.

The main advantage of the system is that it delivers high-quality results in just a few hours, many times faster than the tedious, days-long manual analysis of bacterial genomic sequences, the project's representatives said in a release.

Currently, almost all research on vaccines, drugs or enzymes, including so-called reverse vaccinology, begins with bacterial DNA sequencing. Today, high-throughput sequencing is widely used in science. As a result, researchers now have a variety of databases at their disposal, containing a huge number of already identified DNA sequences. But they are not able to quickly and reliably analyse the available information in terms of the functions of individual genes, especially since the descriptions in the databases often contain errors.

"So far, there are no automatic tools that would provide fully reliable sequencing results in the form of the described gene functions. Manual annotation (assigning functions - ed.) is very time-consuming. Description of one bacterial genome takes 8-10 hours a day for 20 to 30 days. And remember that many scientists analyse the genomes of several dozen to several hundred bacteria in a single project. Our online system will shorten the required time to just a few, a maximum of a dozen or so hours. Preliminary results will be available within a few hours" - says Przemysław Decewicz from the Faculty of Biology of the University of Warsaw, quoted in the release sent to PAP.
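As a rough check on the figures quoted above, the manual workload per genome can be worked out directly. The short Python sketch below is illustrative only: the values 3 and 12 hours are assumed readings of "a few" and "a dozen or so" hours, which the release does not state exactly.

```python
# Figures quoted in the article for manual annotation of one bacterial genome.
HOURS_PER_DAY = (8, 10)   # working hours per day (min, max)
DAYS = (20, 30)           # working days per genome (min, max)

# Assumed readings of "a few" and "a dozen or so" hours; not exact source values.
SYSTEM_HOURS = (3, 12)


def manual_hours_range():
    """Return (min, max) total hours of manual annotation per genome."""
    return HOURS_PER_DAY[0] * DAYS[0], HOURS_PER_DAY[1] * DAYS[1]


def speedup_range():
    """Return (conservative, optimistic) speed-up factors under the assumptions above."""
    low, high = manual_hours_range()
    # Conservative: shortest manual effort against the slowest assumed system run.
    # Optimistic: longest manual effort against the fastest assumed system run.
    return low / SYSTEM_HOURS[1], high / SYSTEM_HOURS[0]


if __name__ == "__main__":
    print("Manual hours per genome:", manual_hours_range())
    print("Speed-up factor range:", speedup_range())
```

Under these assumptions, one genome takes 160 to 300 hours of manual work, so a project covering several dozen genomes quickly runs into thousands of hours.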

"Our assumption for the system was to combine the advantages of the available methods for analysing data obtained during sequencing. We built a semi-automated expert system and added a very intuitive interface, shortening the time needed for analysis from a few weeks to just a few hours. As a result, we can offer the scientific world an inexpensive, very fast and intuitive space to reliably determine the functions of the analysed DNA sequences" - adds Mikołaj Dziurzyński.

Huge databases containing described DNA sequences are available online. The system draws from these databases and integrates the data, and in addition, each result returned from a database is reviewed by an expert. The participation of an expert makes it possible to eliminate the problem that currently affects fully automatic systems: a high error rate. Quite often, individual sequences are described incorrectly in databases, and such information is then repeated in subsequent studies. Human participation in producing the results will not only ensure that the description of the gene is correct, it will also make it possible to correct erroneous information in the database.
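The article does not describe how the system integrates database results internally. Purely as an illustration of the general idea, the Python sketch below merges function descriptions returned by several databases for one gene and marks cases where the databases disagree, so that an expert reviewer can see them first. The database names, the `integrate_hits` function and the 0.75 agreement threshold are all hypothetical and are not taken from the authors' system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Annotation:
    function: Optional[str]   # most common description, or None if no database returned one
    conflict: bool            # True when the databases do not agree strongly enough
    votes: Dict[str, int]     # how many databases gave each description


def integrate_hits(hits: Dict[str, str], min_agreement: float = 0.75) -> Annotation:
    """Combine per-database descriptions for one gene (hypothetical scheme).

    hits maps a database name to the function description it returned.
    """
    # Normalise case and whitespace so trivially different labels are counted together.
    labels = [desc.strip().lower() for desc in hits.values() if desc and desc.strip()]
    if not labels:
        return Annotation(function=None, conflict=True, votes={})

    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(labels)
    return Annotation(
        function=top_label,
        conflict=agreement < min_agreement,
        votes=dict(counts),
    )


if __name__ == "__main__":
    # Hypothetical database names and descriptions, for illustration only.
    example = {
        "db_a": "Arsenate reductase",
        "db_b": "arsenate reductase",
        "db_c": "hypothetical protein",
    }
    print(integrate_hits(example))
```

In this sketch the `conflict` flag only orders the expert's work; it does not replace the expert review of each result that the article describes.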

According to the authors of the system, the basic version of the solution will be free. The full version will be commercial, using a micropayment model.

The system is intuitive, the authors say. No additional training is needed to use it. Users simply send a sequence via the system's website and receive the result after a few hours.

Tools offering similar functions are available. But they are not available online, their interfaces are complicated, and they are quite expensive to use. Scientists who opt for manual annotation, in turn, must use a dozen different tools and learn each of them in advance.

The system is currently in the testing phase. In the coming months, thanks to the financial support obtained from the University Technology Transfer Centre, the solution will be developed and made available to the scientific world.

"Soon the commercialisation of the new service will be possible. Its scalability is practically unlimited. The computing power will be increased to match the demand. The propagation of services in the scientific world, which analyses bacterial DNA sequences in its research projects, will be the key issue at the commercialisation stage" - says Robert Dwiliński, head of the University Technology Transfer Centre, quoted in the press release.

Why exactly do scientists analyse bacterial DNA? They search for bacteria that will help solve important problems, for example environmental pollution. There are more and more areas on Earth that cannot be used due to pollution. They are often restored to their original state with bacteria, through bioremediation. Researchers search for genes that code for proteins capable of breaking down certain toxic compounds. Once the right DNA sequence is found, it can be transferred to the bacteria that work best in a given environment. What's more, genes can be accumulated in bacteria, which makes it possible to create a "superbacterium" for a given environment, one that will quickly and effectively clean it and then die because there is no more food for it.

"It is similar with antigens. When we are dealing with bacteria that threaten human life, we search for the most characteristic proteins of these bacteria. When we find genes associated with a given protein, we can design a targeted therapy. The system developed at the University of Warsaw will significantly accelerate the work of scientists looking for specific genes in bacteria" - reads the release.

Sequencing was developed in the 1960s. A decade after the discovery of the structure of DNA, the first techniques for reading it were developed. With time, new, more precise and faster DNA sequencing techniques appeared. In 2000, a working draft of the human genome was announced, and the sequencing project was declared complete in 2003. It cost several billion dollars and took more than a decade. Today, thanks to the development of high-throughput technologies, the cost of sequencing the human genome does not exceed one thousand dollars.

DNA sequencing is currently a relatively cheap and fast technology. We live in an era in which a huge number of sequences can be obtained in a short time thanks to high-throughput sequencing. The method is widely available. But the sequences on their own do not provide valuable information. The bottleneck is analysing the sequences and describing the functions of the genes they contain. The correctness of the results at this stage determines the success of further research. If the assumptions are wrong, the conclusions will be wrong too.

PAP - Science in Poland

ekr/ zan/ kap/

tr. RL

Copyright © Foundation PAP 2018