Valentin Thomas

I recently completed my PhD at Mila, where I worked on reinforcement learning and deep learning. Note: this site is still under construction; my linked CV may be more up to date.

Email  /  CV  /  Bio  /  Google Scholar  /  Twitter  /  Github

profile photo
Research

I'm interested in reinforcement learning, deep learning and optimization. I have worked on unsupervised RL, planning, generalization in deep learning and optimization aspects of reinforcement learning. Representative papers are highlighted. Stars * indicate first authorship.

On the role of overparameterization in off-policy Temporal Difference learning with linear function approximation
Valentin Thomas*
NeurIPS 2022

We study the role of overparameterization in Temporal Difference (TD) learning and how it affects optimization. To do so, we analyze the spectrum of the Temporal Difference operator when using random features, under some assumptions on the Markov transition kernel.
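
To make the setting concrete, here is a minimal sketch of off-policy TD(0) with linear function approximation on top of fixed random features; the dimensions, step size, and feature map below are illustrative choices, not the ones analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)
d_state, n_features, gamma, lr = 10, 256, 0.99, 0.01

# Fixed random feature map: only the linear head theta is learned.
W = rng.normal(size=(d_state, n_features)) / np.sqrt(d_state)
def phi(s):
    return np.tanh(s @ W)

theta = np.zeros(n_features)

def td0_step(theta, s, r, s_next):
    # Semi-gradient TD(0): bootstrap target r + gamma * V(s'), with V(s) = phi(s) @ theta.
    td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
    return theta + lr * td_error * phi(s)

# Example transition (s, r, s') drawn arbitrarily for illustration.
s, s_next = rng.normal(size=d_state), rng.normal(size=d_state)
theta = td0_step(theta, s, 1.0, s_next)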

The Role of Baselines in Policy Optimization
Jincheng Mei*, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans
NeurIPS 2022

Using value-function baselines in on-policy stochastic natural policy gradient helps achieve convergence toward a globally optimal policy by reducing the aggressiveness of the updates rather than their variance.

Bridging the Gap Between Target Networks and Functional Regularization
Alexandre Piché*, Valentin Thomas*, Joseph Marino, Rafael Pardinas, Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan
TMLR 2023
NeurIPS 2021 DeepRL workshop

We analyze the implicit regularization performed by Target Networks and show that, surprisingly, it can destabilize TD. We propose a theoretically grounded alternative, Functional Regularization, which alleviates these issues and performs well empirically.
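
A rough contrast between the two ideas, written as per-transition losses. Here V, w_target, w_prev and kappa are placeholder names, and this is only a sketch of the general idea, not the exact objective from the paper.

def target_network_loss(V, w, w_target, s, r, s_next, gamma):
    # Classic approach: bootstrap from a frozen, periodically updated copy w_target.
    target = r + gamma * V(w_target, s_next)
    return (V(w, s) - target) ** 2

def functional_regularization_loss(V, w, w_prev, s, r, s_next, gamma, kappa):
    # Sketch of the alternative: bootstrap from the online network, and instead add
    # an explicit penalty keeping V(w, .) close to a previous network's predictions
    # in function space (kappa controls the regularization strength).
    target = r + gamma * V(w, s_next)
    return (V(w, s) - target) ** 2 + kappa * (V(w, s) - V(w_prev, s)) ** 2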

Beyond variance reduction: Understanding the true impact of baselines on policy optimization
Valentin Thomas*, Wesley Chung*, Marlos C. Machado, Nicolas Le Roux
ICML 2021
blog post/ICML talk

We show empirically and theoretically that, despite common wisdom, baselines in policy gradient optimization have an effect beyond variance reduction and can impact convergence.
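
For context, here is a minimal REINFORCE-style update with a baseline: subtracting a (state-independent) baseline b does not change the expected gradient, only the distribution of the sampled updates, which is precisely the effect studied here. The interface below is purely illustrative.

import numpy as np

def pg_step(theta, grad_log_pis, returns, baseline, lr=0.05):
    # One REINFORCE step: sum_t grad log pi(a_t | s_t) * (G_t - b).
    # A constant baseline b leaves the expectation of this estimator unchanged,
    # but alters its higher moments, and hence the optimization trajectory.
    grad = np.zeros_like(theta)
    for glp, G in zip(grad_log_pis, returns):
        grad += glp * (G - baseline)
    return theta + lr * grad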

On the interplay between noise and curvature and its effect on optimization and generalization
Valentin Thomas*, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux
AISTATS 2020
Oral talk at the 2020 Workshop on Theory of Deep Learning at the Institute for Advanced Study, Princeton
AISTATS talk

We show how the interplay between the local curvature of the loss (the Hessian) and the local gradient noise (the uncentered gradient covariance) can impact optimization and generalization in neural networks.
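
One standard way to see this interplay (a textbook second-order expansion, not the paper's exact statement): for an SGD step with learning rate $\eta$ and stochastic gradient $g = \nabla L(\theta) + \epsilon$, where $\mathbb{E}[\epsilon] = 0$ and $\mathrm{Cov}(\epsilon) = C$,

$$\mathbb{E}\big[L(\theta - \eta g)\big] \approx L(\theta) - \eta \|\nabla L(\theta)\|^2 + \frac{\eta^2}{2}\Big(\nabla L(\theta)^\top H \nabla L(\theta) + \operatorname{tr}(H C)\Big),$$

so the expected one-step progress depends on the Hessian $H$ both through the curvature along the gradient and through the noise term $\operatorname{tr}(HC)$.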

Planning with Latent Simulated Trajectories
Alexandre Piché, Valentin Thomas, Cyril Ibrahim, Yoshua Bengio, Julien Cornebise and Chris Pal
ICLR 2019 Workshop on Structure and Priors in Reinforcement Learning

An extension of our work "Probabilistic Planning with Sequential Monte Carlo methods" that treats the trajectory as a latent variable and uses an EM algorithm.

Probabilistic Planning with Sequential Monte Carlo methods
Valentin Thomas*, Alexandre Piché*, Cyril Ibrahim, Yoshua Bengio and Chris Pal
ICLR 2019
Contributed talk at NeurIPS 2018 workshop Infer to Control

Leveraging control as inference and Sequential Monte Carlo methods, we propose a probabilistic planning algorithm.
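
A highly simplified sketch of the control-as-inference flavour of SMC planning, where particles are candidate rollouts reweighted by an exponentiated-reward "optimality" likelihood; propose_action, step_model, and reward are placeholder callables, and this is not the paper's exact algorithm.

import numpy as np

def smc_plan(s0, propose_action, step_model, reward, horizon, n_particles, rng):
    # Each particle is a candidate rollout; weights follow exp(reward),
    # playing the role of the optimality likelihood in control as inference.
    states = np.repeat(s0[None], n_particles, axis=0)
    first_actions = None
    for t in range(horizon):
        actions = propose_action(states, rng)            # proposal policy
        next_states = step_model(states, actions, rng)   # (learned) dynamics model
        log_w = reward(states, actions, next_states)     # log optimality weights
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
        states = next_states[idx]
        first_actions = actions[idx] if t == 0 else first_actions[idx]
    # Act with, for instance, the average first action of the surviving particles.
    return first_actions.mean(axis=0)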

Disentangling the independently controllable factors of variation by interacting with the world
Valentin Thomas*, Emmanuel Bengio*, William Fedus*, Jules Pondard, Philippe Beaudoin, Hugo Larochelle, Joelle Pineau, Doina Precup and Yoshua Bengio
Oral at NeurIPS 2017 workshop on Learning Disentangled Representations: from Perception to Control

We draw a connection between mutual information and the intrinsic reward function (through the Donsker-Varadhan representation of the Kullback-Leibler divergence) used for jointly learning options/factors and latent representations in Independently Controllable Factors.
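
For reference, the Donsker-Varadhan representation expresses the KL divergence as a supremum over test functions, which is what makes it usable as a learnable, reward-like objective:

$$\mathrm{KL}(P \,\|\, Q) = \sup_{T} \; \mathbb{E}_{P}\big[T(X)\big] - \log \mathbb{E}_{Q}\big[e^{T(X)}\big],$$

where the supremum is over functions $T$ for which both expectations are finite; mutual information is recovered by taking $P$ to be the joint distribution and $Q$ the product of its marginals.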

Independently Controllable Factors
Valentin Thomas*, Jules Pondard*, Emmanuel Bengio*, Marc Sarfati, Philippe Beaudoin, Marie-Jean Meurs, Joelle Pineau, Doina Precup, Yoshua Bengio
Presented at the Montreal AI Symposium 2017

This work is a finalized version of Independently Controllable Features where the policies and factors are now embedded in a continuous space. We demonstrate how the learnt features can be used.

Independently Controllable Features
Emmanuel Bengio*, Valentin Thomas, Joelle Pineau, Doina Precup, Yoshua Bengio
RLDM 2017

We propose a way to jointly learn a set of discrete policies, each affecting one component of the latent state representation, for unsupervised reinforcement learning. We hypothesize that this process discovers controllable factors of variation in the world as well as how to control them.

Decoupling Backpropagation using Constrained Optimization Methods
Valentin Thomas*, Akhilesh Gotmare*, Johanni Brea and Martin Jaggi
ICML 2018 Workshop on Efficient Credit Assignment

We propose BlockProp, which lets one train deep neural networks in a model-parallel fashion, where parts of the model may reside on different devices (GPUs).

Preserving the entanglement of two qubits with feedback control
Valentin Thomas*, Pierre Rouchon
Report for a research semester in 2014 (in French)

This research project was about designing a feedback control loop using an electromagnetic field to preserve the entanglement of two qubits. This is necessary because, due to quantum decoherence, the entanglement tends to vanish, which is a major issue in developing quantum computer hardware. We proposed a simple Lyapunov-based feedback control loop.

Experience

Here you can find the internships I did during my MSc at Mines Paris and at École Normale Supérieure Paris-Saclay, and then during my PhD at the Université de Montréal.


Website template from Jon Barron (source code).