Santeri Heiskanen

I am a PhD student at the Robot learning lab at Aalto University, under the supervision of Prof. Joni Pajarinen. I am also part of the AI-DOC doctoral education pilot, hosted by FCAI. My current research focuses on developing reinforcement learning and generative modelling techniques for complex combinatorial search spaces, which arise in many practical problems such as neural architecture search, molecule generation, and scheduling. More broadly, my interests lie at the intersection of sequential decision making, multi-objective optimisation, and their application to practical, real-world problems.

Previously, I received an M.Sc. degree in computer science from Tampere University in 2024. My M.Sc. thesis studied the effect of soft-information sharing between policies in multi-objective reinforcement learning, and it was supervised by Prof. Ville Kyrki from the Intelligent Robotics group at Aalto University. During my studies, I was also fortunate to complete two internships at Huawei Finland’s research center and to work at a startup specialising in the green energy transition.

News

May 10, 2026	“Momba: Network Modernization Improves Multi-Objective Reinforcement Learning” was accepted to RLC2026.
Feb 06, 2026	“Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization” was accepted to ICLR2026 as an oral presentation. See you in Brazil.
Nov 12, 2024	Gave a guest lecture on multi-objective reinforcement learning in the course “ELEC-E8125 reinforcement learning” at Aalto University. You can check out the slides here.
Nov 01, 2024	Got accepted into the AI-DOC doctoral education pilot program.
Oct 10, 2024	I graduated with an M.Sc. (Tech.) from Tampere University. My thesis, “Generalizing Pareto optimal policies in multi-objective reinforcement learning: An empirical study of hypernetworks”, studied the use of hypernetworks in multi-objective reinforcement learning. You can check out the slides for the study here.

Selected publications

Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization

Jatan Shrestha^*, Santeri Heiskanen^*, Kari Hepola, and 3 more authors

In The Fourteenth International Conference on Learning Representations, 2026

This work was accepted as an oral presentation.

Multi-objective optimization (MOO) arises in many real-world applications where trade-offs between competing objectives must be carefully balanced. In the offline setting, where only a static dataset is available, the main challenge is generalizing beyond observed data. We introduce Pareto-Conditioned Diffusion (PCD), a novel framework that formulates offline MOO as a conditional sampling problem. By conditioning directly on desired trade-offs, PCD avoids the need for explicit surrogate models. To effectively explore the Pareto front, PCD employs a reweighting strategy that focuses on high-performing samples and a reference-direction mechanism to guide sampling towards novel, promising regions beyond the training data. Experiments on standard offline MOO benchmarks show that PCD achieves highly competitive performance and, importantly, demonstrates greater consistency across diverse tasks than existing offline MOO approaches.
@inproceedings{shrestha2026paretoconditioned,
  title = {Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization},
  author = {Shrestha, Jatan and Heiskanen, Santeri and Hepola, Kari and Rissanen, Severi and Jääskeläinen, Pekka and Pajarinen, Joni},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year = {2026},
}

Momba: Network Modernization Improves Multi-Objective Reinforcement Learning

Adam Štafa, Santeri Heiskanen, Petr Novotný, and 1 more author

Reinforcement Learning Journal, 2026

Recent advances in deep reinforcement learning (RL) have shown that improving neural network architectures can yield substantial gains in sample efficiency and asymptotic performance without altering the underlying algorithms. In contrast, work on multi-objective reinforcement learning (MORL), which aims to discover a set of policies that balance trade-offs among conflicting objectives, has predominantly focused on algorithmic innovations, leaving the area of architectures underexplored. While the optimal policies and value functions can differ significantly depending on the trade-offs, MORL algorithms commonly represent them with simple feedforward networks conditioned on the trade-off. This raises the question of whether the performance of the algorithms could be improved with more expressive function approximators. In this paper, we integrate recent advances in neural network design: (i) observation and feature normalization, (ii) weight normalization, and (iii) modeling of distributional returns with an entropy-regularized MORL algorithm. The empirical results across standard continuous control benchmarks demonstrate that these changes substantially improve the quality of the produced solution sets without requiring major changes to the underlying algorithm.
@article{stafa2026momba,
  title = {Momba: Network Modernization Improves Multi-Objective Reinforcement Learning},
  author = {Štafa, Adam and Heiskanen, Santeri and Novotný, Petr and Pajarinen, Joni},
  journal = {Reinforcement Learning Journal},
  volume = {7},
  year = {2026},
}