Sep

2025

'Biotech Is Booming Worldwide'

For more than five years, the International Laboratory of Bioinformatics at the HSE Faculty of Computer Science has been advancing cutting-edge research. During this time, its scientists have achieved major breakthroughs, including the development of CARDIOLIFE—a unique genetic test unmatched worldwide that predicts the likelihood of cardiovascular disease. With the active participation of HSE students, including doctoral students, the team is also working on a new generation of medicines. In this interview with the HSE News Service, Laboratory Head Maria Poptsova shares insights into their work.

— When was the laboratory established?

— We first formed a research and study group in 2018. About six months later, it was upgraded to a full laboratory. During the pandemic, we gained international status, which made it possible to invite foreign scientists to join without requiring their physical presence in Russia.

At that time, we were actively developing deep learning models to analyse genomic data, and close collaboration with international colleagues was essential for exchanging ideas and data. Gaining the status of an international laboratory enabled us to partner with the experimental lab at the Fox Chase Cancer Center, Temple University, Pennsylvania. This collaboration led to a joint publication in Nature, one of the world’s leading journals. Our colleagues provided experimental data, which we processed as bioinformaticians and data scientists, building deep learning models to predict a highly important genomic element—Z-DNA, a left-handed secondary DNA structure. The model we developed was later used to investigate the mechanism of action of a cancer drug.

— Why did you choose Prof. Alan Herbert as your scientific supervisor, and what role does he play in the laboratory’s work?

— We first encountered Prof. Alan Herbert when we submitted an article to an international journal presenting our initial deep learning model for predicting Z-DNA. In the peer review, we were told we had overlooked several key papers on the subject. When we examined the reviewer’s comments, we discovered that all the cited works were authored by Alan Herbert—a renowned expert in DNA secondary structures and a leading specialist in our field of research. We reached out to him by email and later spoke over Zoom. In the summer of 2020, we organised a summer school on machine learning in bioinformatics, which had to be moved online because of the pandemic. This shift allowed us to attract participants from around the world, including the United States, China, and several European countries. Impressed by the quality of our presentations, Prof. Herbert agreed to become our scientific consultant.

Since then, we have been actively collaborating online. Together, we launched the annual ABZ Meeting—an international conference on Z-DNA—which has been held online each year. There are now plans to host it in person next year in Oxford.

Alan Herbert continues to foster international scientific collaborations, despite today’s challenges. In addition to his university work, he also leads a small startup, which further supports his efforts to build international partnerships.

— How would you explain bioinformatics to a general audience? Which scientific disciplines does it combine?

— Bioinformatics originally emerged as a set of computer-based methods for processing molecular biology data, that is, information about cellular components such as DNA, RNA, proteins, and other macromolecules. Almost as soon as computers became available, scientists began conducting experiments to decode the composition of these molecules. For example, a DNA sequence can be represented as text using an alphabet of four letters, while a protein sequence uses an alphabet of twenty. A specific set of rules, the genetic code, deciphered in the 1960s, governs how one alphabet is translated into the other. At the same time, technological advances made it possible to obtain DNA and protein sequences from different organisms. This created a need for algorithms to assess the similarities and differences between sequences and to estimate the likelihood of processes explaining how one DNA string could transform into another. This marked the emergence of bioinformatics. Over time, its methods and algorithms grew increasingly sophisticated, developing alongside new biotechnologies.
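
The two ideas mentioned here, translating the four-letter DNA alphabet into the twenty-letter protein alphabet and measuring how far apart two sequences are, can be illustrated with a short Python sketch. This is only an editorial illustration, not the laboratory's code: the function names `translate` and `edit_distance` are hypothetical, the table is the standard genetic code, and the distance is the classic edit (Levenshtein) distance, which counts single-letter substitutions, insertions, and deletions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon. The names below are illustrative only.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string into a protein string, stopping at a stop codon."""
    protein = []
    # Read the sequence three letters at a time; ignore a trailing partial codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(edit_distance("GATTACA", "GATTTACA"))
```

Real sequence-comparison tools use weighted scoring schemes and far more efficient algorithms; the sketch only shows the principle of treating sequences as text.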

— Could you give some examples?

— One example is the emergence of genome-wide sequencing technology. This created a need to process vast amounts of genomic data and extract meaningful information. For instance, we can compare the sequences of different genomes or identify changes in an individual’s genome, such as single-letter substitutions, insertions, or deletions of small or large DNA segments.

Next, a new line of experimental technologies emerged, allowing researchers to read not only DNA sequences but also signals from other layers of information encoding—what is known as the epigenetic code.

We launched our laboratory to study DNA secondary structures, which are also encoded in the genome, with the goal of better understanding the algorithms and rules governing genomic function. Addressing this challenge requires comparing all levels of genetic information encoding. With three billion characters in the human genome and hundreds of thousands of genome-wide experiments already available to map epigenetic signals, it has become essential to use deep learning algorithms to uncover the relationships between different layers of genetic encoding. In non-biological fields, deep learning evolves rapidly, so we must adapt these advances to our biological research just as quickly.

— What are the key research areas of your laboratory?

— We focus on developing deep learning methods and architectures tailored to our research needs. In recent years, large language models and foundation AI models, similar in design to those behind ChatGPT, DeepSeek, and other advanced systems, have begun to emerge specifically for applications in genomics and biology.

Large language models in genomics, like those in natural language processing, are extremely large (for example, Evo2 contains 40 billion parameters) and require a supercomputer to run.

One of our objectives has been to test these large models on the genomes of cardiovascular patients and integrate them into genetic testing. We are also developing our own deep learning models to analyse the code of DNA secondary structures and epigenetic information.

Among other research areas, the laboratory is studying the tumour microenvironment using single-cell sequencing data. This approach allows us to determine which genes or programmes are active and which are inactive in individual cells.

For example, a tumour sample contains not only tumour cells but also normal tissue cells and immune cells, including lymphocytes, macrophages, neutrophils, and others. The central question we aim to answer is how and why the tumour evades the immune response. In a healthy organism, immune cells should recognise and eliminate foreign tumour cells. However, tumour cells somehow reprogram the immune cells, preventing them from identifying and destroying the cancerous cells. This behaviour is driven by genetic programmes that either suppress or activate the immune response. We are working to identify and study these programmes. Currently, this research relies on open data, but we plan to establish collaborations with experimental laboratories in Russia.

In parallel, we have been studying the role of non-coding variants, which lie outside protein-coding sequences, in the non-coding regions that make up about 98% of the genome. The effects of these non-coding variants are poorly understood. We are developing deep learning methods and applying large language models to predict how non-coding variants might enhance or, conversely, inhibit protein production, which could be linked to the onset and progression of disease.

— In which areas of medicine and biology are your laboratory’s results most in demand? Where do they contribute to significant advances in disease prevention and treatment?

— One key area is testing in cardiogenetics, which has historically lagged behind oncogenetics. We have been developing this field for about five years. To support this work, we established a cardiogenetic consortium that brings together clinicians, bioinformatics specialists, and genetic laboratories capable of sequencing individual genomes. As part of the 100,000 Russian Genomes project, we are collaborating with the Chazov Cardiology Centre and Bauman City Hospital No. 29. To date, we have sequenced approximately 1,000 complete genomes and are analysing them for variants associated with cardiovascular conditions.

More than 900 genes have so far been identified as involved in the development of cardiovascular diseases. Drawing on all our accumulated experience, we developed a genetic test called CARDIOLIFE, which is now commercially available. This test enables patients to receive information about the presence of pathogenic variants associated with cardiovascular conditions.

— What are some of the most promising areas of your research?

— Our laboratory is preparing to expand into oncogenetics, since the methods used in genetic testing are similar across different diseases. We also have experience analysing large genomic datasets using AI algorithms. Our goal is to identify a small set of markers with strong predictive power. The fewer markers required to predict a disease, the easier it is to scale testing. We anticipate that early cancer diagnosis could be achieved using just 8–10 markers, and this is our near-term objective.

— What are the key features of the CARDIOLIFE test developed by your laboratory?

— CARDIOLIFE is a unique test with no equivalent anywhere in the world, and we are proud to have developed it. Among all currently available tests, it analyses the most comprehensive set of genes and regulatory regions. Standard programmes used by genetic testing companies do not cover this—we conduct research on a much deeper level. A single gene can have numerous variants, and we examine all isoforms, since the same mutation can affect each isoform differently. Additionally, we analyse non-coding variants and assess their impact on gene expression using AI techniques. This is what makes the CARDIOLIFE test unique. Currently, such analyses are not included in standard genetic testing.

— Could you tell us about the main research areas of your mirror laboratory with Surgut University?

— Our focus is on developing predictive systems for medicine using AI methods. This project grew out of the cardiogenetic consortium, as we discovered that electronic medical records have been maintained in the Khanty-Mansiysk Autonomous Okrug—Yugra since 2009. We began analysing the medical records of patients admitted to the Surgut District Cardiology Centre with a diagnosis of myocardial infarction.

These patients were treated, discharged, and subsequently monitored, as many faced a high risk of adverse events such as recurrent heart attacks, strokes, bleeding, or death. We developed a predictive system to assess the risk of such events, using data from 10,000 patients collected since 2009. We are now planning to retrain this model using data from other cardiology centres. We have also developed methods to process medical data for use with both classical machine learning algorithms and those underlying models like ChatGPT. This project is highly scalable and can be replicated in other regions. Clinicians can also define new objectives, such as creating models to predict the likelihood of side effects from specific medications.

In particular, in collaboration with the Russian Medical Academy of Continuing Professional Education, we began developing the first models to predict adverse events associated with anticoagulants and antidepressants. This work is especially important, as these medications can have numerous side effects, particularly in adolescents.

— Is AI used in drug development?

— Another area we have recently begun exploring is the use of AI methods to design protein-based drugs. As a reminder, in 2024, David Baker, Director of the Institute for Protein Design at the University of Washington in Seattle, and Demis Hassabis and John Jumper from Google DeepMind were awarded the Nobel Prize in Chemistry for their work on computational protein design and protein structure prediction. In recent years, drug development has seen a breakthrough through the use of generative models to create peptides of 15–20 amino acids that can specifically bind to targeted protein sites. These protein binders represent a new generation of drugs. Unlike conventional drugs based on small chemical molecules, they are biologically derived. Instead of large antibodies, compact binders can be used to attach to and neutralise harmful proteins. The models developed by Baker and his institute are publicly available. Today, students are defending their theses and term papers on using these models to discover protein–protein interactions. Numerous companies have been founded to develop protein-based drugs, which represent the future of medicine. We are also working in this field and have already seen our first results.

Biotech is booming worldwide, with its growth outpacing even the rapid development of computer technologies.

Ray Kurzweil, a prominent futurist and AI researcher at Google, known for predicting major technological breakthroughs such as driverless cars and AI, believes that by the mid-2030s, the singularity will lead humans to merge with AI, creating a new hybrid form of existence. Key areas of the human brain will be mapped, and the algorithms governing cellular function will be deciphered. Combining this knowledge will make it possible to program cells and grow organs. This seemingly science-fiction future may await us, thanks to the convergence of biotechnology and artificial intelligence.

— Which results of your laboratory’s fundamental and applied research would you consider its key achievements?

— The creation of CARDIOLIFE, our publication in Nature, the discovery of a potential approach to cancer treatment by targeting genetic programmes through DNA secondary structures, and the development of deep learning models for genome-wide annotation of DNA secondary structures—an achievement unmatched anywhere else in the world.

— How are the results of the laboratory’s scientific work incorporated into the university’s educational process?

— I always assign students topics that address current scientific problems for their term papers and theses. If a student successfully solves such a problem, they can become a co-author of a scientific publication. During lectures, I regularly highlight the laboratory’s work, and in the interdisciplinary Bioinformatics minor, I explain how students can engage in research through their term and thesis projects. Starting in September, I will be teaching a specialised course in which we will analyse in detail the scientific methods and breakthrough ideas from the past two to three years.

— How actively are HSE University students and doctoral students involved in the laboratory?

— Their involvement is substantial. We currently have over 20 research assistants, including advanced bachelor’s, master’s, and doctoral students.

Date

12 September

Topics

Research & Expertise

Keywords

bioinformatics, centres of excellence, mirror laboratories

About

Faculty of Computer Science, International Laboratory of Bioinformatics

About persons

Maria Poptsova
