Claire is a Group Leader at the University of Tuebingen, in the Cluster of Excellence Machine Learning for Science. She was awarded an Emmy Noether award under the AI Initiative call in 2022.
Her research is on sequential decision making. It mostly spans bandit problems and theoretical Reinforcement Learning, but her research interests extend to Learning Theory and principled learning algorithms. While keeping concrete problems in mind, she focuses on theoretical approaches, aiming for provably optimal algorithms.
Previously, she was a Research Scientist at DeepMind in London, UK, from November 2018, in the Foundations team led by Prof. Csaba Szepesvari. She did a post-doc in 2018 with Prof. Alexandra Carpentier at the University of Magdeburg in Germany while working part-time as an Applied Scientist at Amazon in Berlin. She received her PhD from Telecom ParisTech in October 2017, under the guidance of Prof. Olivier Cappé.
I am co-leading the Women in Learning Theory initiative. Please reach out if you have questions or if you'd like to help.
contact: first.last @ uni-tuebingen . de
I left DeepMind in December 2022 and joined the University of Tuebingen, Germany as a Group Leader in January 2022. My group is supported by an Emmy Noether award for the project "Foundations for Lifelong Reinforcement Learning" within the AI Initiative.
2 papers accepted at EWRL:
Lifelong Best-Arm Identification with Misspecified Priors, by Nicolas Nguyen and C.V.
Beyond Average Rewards in Markov Decision Processes, by Alexandre Marthe, Aurelien Garivier, C.V.
For the full list of my publications, see my Google Scholar profile.
Research projects and selected publications
Lifelong Reinforcement Learning
In standard RL models, the dynamics of the environment and the reward function to be optimized are assumed to be fixed (and unknown). What if they can vary, perhaps with some constraints, or within some boundaries? What if the agent must actually solve, one after the other, various tasks that share some similarities? What is the cost of meta-learning? How should exploration be dealt with when the agent knows from the get-go that there will be many different tasks? These are Meta-Reinforcement Learning questions, and I use the term Lifelong to insist on the sequential nature of the process (the agent does not see all the tasks at once and cannot jump freely from one to another). Initial ideas for these questions were sparked by reflections on Non-stationary bandits, where the environment can slowly vary, and Sparse bandits, where only a subset of the actions have positive value.
Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. MohammadJavad Azizi et al. arXiv 2022.
Weighted Linear Bandits for Non-Stationary Environments. Yoan Russac, Claire Vernade, and Olivier Cappé. NeurIPS 2019.
Sparse stochastic bandits. Joon Kwon, Vianney Perchet, and Claire Vernade. COLT 2017.
Twisting bandit models for more realistic applications
During my PhD, and later at Amazon and DeepMind, I studied various aspects of the beautiful sequential learning problem that is Bandits. I started by exploring combinatorial action spaces (e.g. ranking problems), and the impact of delays and their censoring effect on binary observations.
Delays and censoring:
Contextual bandits under delayed feedback. Claire Vernade, Alexandra Carpentier, Giovanni Zappella, Beyza Ermis, and Michael Brueckner. ICML 2020.
Non-Stationary Delayed bandits with Intermediate Observation. Claire Vernade*, Andras Gyorgy*, and Timothy Mann. ICML 2020.
Stochastic Bandits with Arm-Dependent Delays. Anne-Gaelle Manegueu, Claire Vernade, Alexandra Carpentier, and Michal Valko. ICML 2020.
Stochastic bandit models for delayed conversions. Claire Vernade, Olivier Cappé, and Vianney Perchet. UAI 2017.
Combinatorial action spaces:
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling. Cindy Trinh, Emilie Kaufmann, Claire Vernade, and Richard Combes. ALT 2020.
Stochastic rank-1 bandits. Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, and Zheng Wen. AISTATS 2017.
Multiple-play bandits in the position-based model. Paul Lagrée, Claire Vernade, and Olivier Cappé. NIPS 2016.
Foundations of Bandit Algorithms and Reinforcement Learning
Sometimes (often), even the fundamental models still have open questions, and since 2018 I've spent some time thinking about a few of them. First, we explored off-policy evaluation estimators for Contextual Bandits, such as the clever Self-Normalized Importance Weighted one, and the corresponding confidence bounds. We also closed an important gap in Linear Bandits by showing that Information Directed Sampling can have optimal regret bounds in both the minimax and the problem-dependent sense. Recently, we studied Distributional Reinforcement Learning, its power and its limitations.
Beyond Average Rewards in Markov Decision Processes. Alexandre Marthe, Aurelien Garivier, and Claire Vernade. EWRL 2023.
Game-theoretic algorithm design for PCA and beyond
I got to collaborate with Brian McWilliams, Ian Gemp and others at DeepMind on an elegant new method for computing the PCA of very large matrices. The idea is simple, but the optimisation ideas are subtle and the results are just mind-blowing.
PhD Thesis: Bandit models for interactive applications (please reach out if you really want to read it)
July 2023: Lifelong Statistical Testing at the Machine Learning for Science conference in Tuebingen.
September 2021: Non-Stationary Delayed Bandits (recording), and RL tutorial at the First ELLIS Symposium in Tuebingen.
March 2020: Linear Bandits with Censoring Delays. Workshop on Optimisation and Machine Learning at CIRM (France)