Genome analysis: understanding the book from a typing error

Flower colours, fragrances or substances that can be used therapeutically - often, only tiny variants in the genetic material determine which characteristics an organism displays. The search for such variants is often like looking for a needle in a haystack. Scientists at the Max Planck Institute for Developmental Biology have now published a method in Nature Genetics that can be used to detect such gene variants even in species whose genome has not yet been completely deciphered. Instead of searching through the entire "text" of the genome, they detect the differences directly and use them to identify the genes and their characteristics. The new method is not only faster than the traditional way - it also makes it possible to identify genes in plants that have hardly been studied so far, genes that play a fundamental role in the biosynthesis of medically valuable ingredients.


Plants can differ widely in various characteristics - from growth height, flower shape and colour to resistance to pests or chemical ingredients that can be used for therapeutic purposes. Often it is only small variations in the genome that are responsible for a particular characteristic. These can be exchanges of individual nucleotides, i.e. DNA building blocks (single nucleotide polymorphisms, SNPs), or structural variants, which can result from the loss, insertion or exchange of larger or smaller DNA segments.

Traditionally, scientists use so-called reference genomes as a starting point for the search for genetic variants. As the creation of such reference genomes is very demanding, there is usually only one per species. The genomes of other individuals are then analysed using a much simpler method and compared with the reference genome. "This is like selecting a book from the library of life, breaking it down into sentences and finally into words and then trying to spot the small typing errors in other editions," explains Detlef Weigel, Managing Director at the Max Planck Institute for Developmental Biology. The disadvantage of this approach: it is enormously complex and time-consuming, and it requires that a completely decoded reference genome already exists.

Yoav Voichek, first author of the study and member of Weigel's team, has now developed a new method that virtually turns the tables. "Instead of first deciphering the entire genetic text of a plant, we immediately start looking for genetic variants and associate them with specific traits," said Voichek. "Based on this we can then identify affected genes." This approach succeeds with the help of bioinformatic analysis of so-called k-mers, which can be compared with word fragments or syllables in relation to the entire text of the genome. These are short pieces of DNA whose presence or absence allows statements to be made about gene variants such as SNPs, but also about larger differences.
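The k-mer idea can be illustrated with a minimal Python sketch. It is not the authors' software: the two sequences, the value of k and the function name `kmers` are hypothetical and chosen only to show how a single exchanged nucleotide changes which k-mers are present in an individual.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k found in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


# Two hypothetical sequences that differ by one nucleotide (C -> T at index 6).
individual_a = "ACGTTGCATCGA"
individual_b = "ACGTTGTATCGA"

k = 4  # illustrative only; real analyses would use much longer k-mers
kmers_a = kmers(individual_a, k)
kmers_b = kmers(individual_b, k)

# k-mers present in one individual but absent from the other.
only_in_a = sorted(kmers_a - kmers_b)
only_in_b = sorted(kmers_b - kmers_a)

print("only in A:", only_in_a)
print("only in B:", only_in_b)
```

In this toy case the single difference affects exactly the k-mers that overlap the changed position, so each individual carries four k-mers the other lacks. No reference genome is consulted at any point; only the presence or absence of short words is compared.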

Voichek and Weigel tested the method against the conventional approach in three plant species - thale cress, maize and tomato. All three examples showed that the method is in no way inferior to other approaches when it comes to detecting SNPs, and that it is also very efficient in detecting major deviations from the reference genome. "A reference genome exists for the plants on which we tested the method," said Weigel. "But what is decisive for us is that this method would also work if no or only fragmentary information about the genome was available." This is of particular interest when it comes to analysing newly discovered plants or searching for certain ingredients that might be of medical use.
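How k-mers might be linked to a trait without any reference genome can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure from the publication: the plants, sequences and trait values are invented, and a plain difference in mean trait value stands in for a proper statistical association test, which this text does not describe.

```python
from statistics import mean


def kmers(sequence, k):
    """Return the set of all substrings of length k found in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


# Hypothetical data: name -> (sequence, measured trait value).
samples = {
    "plant_1": ("ACGTTGCATCGA", 12.0),
    "plant_2": ("ACGTTGCATCGA", 11.5),
    "plant_3": ("ACGTTGTATCGA", 18.0),
    "plant_4": ("ACGTTGTATCGA", 17.0),
}


def presence_table(samples, k):
    """Map each k-mer to the set of individuals in which it occurs."""
    table = {}
    for name, (sequence, _) in samples.items():
        for kmer in kmers(sequence, k):
            table.setdefault(kmer, set()).add(name)
    return table


def trait_difference(samples, carriers):
    """Mean trait of carriers minus mean trait of non-carriers, or None."""
    with_kmer = [t for name, (_, t) in samples.items() if name in carriers]
    without_kmer = [t for name, (_, t) in samples.items() if name not in carriers]
    if not with_kmer or not without_kmer:
        return None  # k-mer is present in all or in no individuals
    return mean(with_kmer) - mean(without_kmer)


def rank_kmers(samples, k):
    """Rank variable k-mers by the absolute trait difference they mark."""
    scores = {}
    for kmer, carriers in presence_table(samples, k).items():
        difference = trait_difference(samples, carriers)
        if difference is not None:
            scores[kmer] = difference
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_kmers(samples, 4)
for kmer, difference in ranking:
    print(kmer, difference)
```

Only k-mers that are present in some individuals and absent in others are scored. In a real study, the k-mers that stand out would then be used to locate the affected genes.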

Together with chemist Sarah O'Connor from the Max Planck Institute for Chemical Ecology in Jena, the scientists want to put their method to the acid test as soon as possible. O'Connor researches the chemical ingredients of a wide range of plants. "Many of the plants we are interested in cannot even be grown in the laboratory," says O'Connor. Studying such species scientifically is correspondingly complicated. "I look forward to applying the method to plants that grow in the Amazon rainforest, for example." Together, the Max Planck researchers hope to unlock one or two secrets in species that have so far been poorly researched.

Original publication:

Voichek Y and Weigel D. Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nature Genetics 2020
https://doi.org/10.1038/s41588-020-0612-7

Scientific contact:

Max Planck Institute for Developmental Biology
Prof. Dr. Detlef Weigel
Tel.: +49 7071 601-1410
eMail: detlef.weigel(at)tuebingen.mpg.de