Scientists Present New Solution to Imbalanced Learning Problem


Specialists at the HSE Faculty of Computer Science and Sber AI Lab have developed a geometric oversampling technique known as Simplicial SMOTE. Tests on various datasets have shown that it significantly improves classification performance. This technique is particularly valuable in scenarios where rare cases are crucial, such as fraud detection or the diagnosis of rare diseases. The study's results are available on arXiv.org, an open-access archive, and will be presented at the International Conference on Knowledge Discovery and Data Mining (KDD) in summer 2025 in Toronto, Canada.

The problem of imbalanced learning is becoming increasingly relevant across various fields, including banking and medicine. Conventional methods, such as random oversampling, often generate low-quality samples or fail to accurately model rare class data.

Simplicial SMOTE (Synthetic Minority Oversampling Technique), a novel solution proposed by scientists from HSE University and Sber AI Lab, addresses these issues by enabling more accurate modelling of complex topological data structures and improving classifier performance on imbalanced datasets.

It generates new examples of a rare class by leveraging information from several close instances at once (a 'simplex'), rather than from just two close points, as in the original SMOTE and its well-known modifications. This captures the local structure of the data more faithfully and improves performance. The technique improves training on imbalanced data, where one class (eg, normal transactions) has many examples, while another class (eg, fraud) has few.
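The difference between the two sampling schemes can be illustrated with a short Python sketch. It is a simplified illustration, not the authors' implementation: the function names, the parameters `k` and `p`, and the choice of uniform Dirichlet weights are assumptions made here, and the way the published method selects its simplices may differ.

```python
import numpy as np

def nearest_neighbours(X, i, k):
    """Indices of the k minority points closest to X[i] (assumes no duplicate points)."""
    distances = np.linalg.norm(X - X[i], axis=1)
    return np.argsort(distances)[1:k + 1]  # position 0 is the point itself

def smote_pair_sample(X, i, k, rng):
    """Original SMOTE idea: a random point on the segment between two close instances."""
    j = rng.choice(nearest_neighbours(X, i, k))
    t = rng.uniform()
    return X[i] + t * (X[j] - X[i])

def simplex_sample(X, i, k, p, rng):
    """Simplex-based idea (hypothetical simplification): a random convex
    combination of X[i] and p of its k nearest neighbours."""
    chosen = rng.choice(nearest_neighbours(X, i, k), size=p, replace=False)
    vertices = X[np.concatenate(([i], chosen))]   # shape (p + 1, n_features)
    weights = rng.dirichlet(np.ones(p + 1))       # non-negative, sum to 1
    return weights @ vertices

rng = np.random.default_rng(0)
X_minority = rng.normal(size=(20, 2))             # toy rare-class data
new_pair_point = smote_pair_sample(X_minority, 0, k=5, rng=rng)
new_simplex_point = simplex_sample(X_minority, 0, k=5, p=2, rng=rng)
```

With `p=1` the simplex reduces to a segment, so the second function falls back to pairwise interpolation; larger `p` lets synthetic points fill the area or volume spanned by several neighbours rather than only the edges between them.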

Researchers have experimentally shown on a large number of test datasets that the proposed approach achieves significantly better performance metrics, such as the F1 Score and Matthews Correlation Coefficient, than both the basic SMOTE and its modifications. In particular, an improvement was observed for gradient boosting, a classifier commonly used in practice.
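For readers unfamiliar with these two metrics, the sketch below computes them from the counts of a binary confusion matrix using their standard definitions. The function name and the example counts are illustrative assumptions and do not come from the study.

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """F1 score and Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 for a metric whose denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc

# Made-up example: 10 rare cases among 1,000; the model labels everything as the common class.
# Accuracy is 99%, yet both metrics are 0 because no rare case is found.
print(f1_and_mcc(tp=0, fp=0, fn=10, tn=990))
```

Both metrics stay low when the rare class is missed, which is why they are preferred over plain accuracy on imbalanced data.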

'Our technique is particularly effective for tasks involving imbalanced data, where the rare class holds greater significance. Banks can use Simplicial SMOTE to detect fraud more effectively, and medical centres can apply it to diagnose rare diseases,' says Andrey Savchenko, co-author of the article and Leading Research Fellow at the Laboratories for Theoretical Modelling in AI of the HSE AI and Digital Science Institute.

The new technique can be integrated into existing oversampling algorithms (such as Borderline-SMOTE, Safe-level-SMOTE, and ADASYN), enabling better accuracy without significantly increasing computational complexity. According to the researchers, the developed approach could contribute to the creation of more accurate and reliable machine learning models, thereby improving the quality of analytics.

The study was conducted with support from the HSE Basic Research Programme.
