Claire Vernade
Bio
Claire is a Group Leader at the University of Tübingen, in the Cluster of Excellence Machine Learning for Science (*). She was awarded an Emmy Noether grant under the AI Initiative call in 2022 for the project FoLiReL, and an ERC Starting Grant in 2024 for the project ConSequentIAL.
Her research is on sequential decision making. It mostly spans bandit problems and theoretical Reinforcement Learning, but her interests extend to Learning Theory and principled learning algorithms. Her work "EigenGame: PCA as a Nash Equilibrium" (with I. Gemp, B. McWilliams and T. Graepel) was recognized by an Outstanding Paper Award at ICLR 2021.
Her goal is to contribute to the understanding and development of interactive and adaptive learning systems.
Between November 2018 and December 2022, she was a Research Scientist at DeepMind in London, UK, in the Foundations team led by Prof. Csaba Szepesvári. She did a post-doc in 2018 with Prof. Alexandra Carpentier at the University of Magdeburg in Germany while working part-time as an Applied Scientist at Amazon in Berlin. She received her PhD from Telecom ParisTech in October 2017, under the guidance of Prof. Olivier Cappé.
--
(*) Check out the work of my great colleagues on the Cluster's blog.
I am engaged in promoting diversity and inclusivity in the ML community (see the Women in ML page). For instance, I am co-leading the Women in Learning Theory initiative as well as the new Tübingen Women in Machine Learning group; please reach out if you have questions or if you'd like to help. More generally, I support organizing efforts and activism in ML and in the tech industry, and I was a union rep with Unite the Union in London, UK, while working as a Research Scientist at DeepMind.
News
I will be giving a Keynote at EWRL in Toulouse on October 28-30, 2024, and at ALT 2025 in Milan on February 24-28, 2025.
I am co-chairing the Tutorials for ICML 2025 in Vancouver with Claire Monteleoni (TBA).
Our workshop on the Foundations of Reinforcement Learning and Control (FoRLaC) was accepted at ICML 2024! I am really excited to have such a great line-up of speakers. The workshop took place in Vienna on July 27th; all talks were recorded and are available on the ICML page (requires logging in with an icml.cc account).
One paper accepted at NeurIPS 2023! Beyond Average Return in Markov Decision Processes, with Alexandre Marthe and Aurélien Garivier (ENS Lyon). We explore what statistical functionals can be learnt using dynamic programming on MDPs. Our main result, perhaps surprisingly, reduces the set of such functionals to only linear and exponential utilities.
I was a co-Program chair of Algorithmic Learning Theory (ALT) 2024 with Daniel Hsu.
I will be looking for Postdocs starting next Spring (2025). If you have (or are about to complete) a PhD on a topic related to bandit algorithms, RL theory, or optimal / adaptive control, please reach out to applications.vernadelab@gmail.com and add [ERC25POST] at the beginning of the subject line (*). Please send a short (~1 page) motivation letter in the email, as well as a CV. More info on the group can be found on the Projects page and in the FAQ.
I will be interviewing candidates for 1 PhD position starting anytime after June 2025. At all times, I mainly consider applications through the IMPRS-IS and ELLIS doctoral programs. Due to the large number of emails I receive, I will not be able to reply to individual applications. If you are considering applying to my group, please read the FAQ.
Master students: I will release a short list of topics on this website shortly and will mainly consider candidates from courses I have taught (e.g. MVA Paris 2024, or the Uni. Tübingen RL class 2024). Please stay tuned for upcoming announcements in early November.
(*) Without this code, your email will be immediately archived and I will never see it.
For the full list of my publications, see my Google Scholar profile.
Research projects and selected publications
See the Projects page for more details on the ERC and Emmy Noether projects.
Lifelong Reinforcement Learning
In standard RL models, the dynamics of the environment and the reward function to be optimized are assumed to be fixed (and unknown). What if they can vary, perhaps with some constraints, or within some boundaries? What if the agent must actually solve, sequentially, various tasks that share some similarities? What is the cost of meta-learning, and how should exploration be handled when the agent knows from the get-go that there will be many different tasks? These are Meta-Reinforcement Learning questions, and I use the term Lifelong to insist on the sequential nature of the process (the agent does not see all the tasks at once and cannot jump freely from one to another). Initial ideas for these questions were sparked by reflections on Non-stationary bandits, where the environment can slowly vary (a small code sketch of this forgetting idea follows the list below), and Sparse bandits, where only a subset of the actions have positive value.
POMRL: No-Regret Learning-to-Plan with Increasing Horizons. Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy. TMLR 2023. bib
Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. MohammadJavad Azizi et al. RLC 2024.
Weighted Linear Bandits for Non-Stationary Environments. Yoan Russac, Claire Vernade, and Olivier Cappé. NeurIPS 2019.
Sparse stochastic bandits. Joon Kwon, Vianney Perchet, Claire Vernade. COLT 2017.
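For a flavour of the forgetting idea behind this line of work, here is a minimal sketch, simplified to tracking a single drifting arm mean (my own toy version with a hypothetical forgetting factor gamma, not the algorithm from the papers above):

```python
import numpy as np

def discounted_mean(rewards, gamma=0.99):
    """Discounted average of a reward stream: recent observations weigh more.

    gamma is a hypothetical forgetting factor; gamma -> 1 recovers the plain
    empirical mean, while smaller gamma adapts faster to non-stationarity.
    """
    weights = gamma ** np.arange(len(rewards) - 1, -1, -1)  # oldest -> smallest weight
    return np.dot(weights, rewards) / weights.sum()

# Toy check: the arm's mean jumps from 0.2 to 0.8 halfway through the stream.
rng = np.random.default_rng(0)
rewards = np.concatenate([rng.binomial(1, 0.2, 500), rng.binomial(1, 0.8, 500)])
print(discounted_mean(rewards, gamma=0.95))  # tracks the recent regime, close to 0.8
print(rewards.mean())                        # the plain average is misled, close to 0.5
```

The weighted linear bandits paper applies this same forgetting principle inside a regularized least-squares estimator, together with matching confidence sets.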
Bandit algorithms with complex or delayed feedback structure
During my PhD, and later at Amazon and DeepMind, I studied various aspects of the beautiful sequential learning problems that are bandits. I started by exploring combinatorial action spaces (e.g. ranking problems), and the impact of delays and their censoring effect on binary observations.
Delays and censoring:
Contextual bandits under delayed feedback. Claire Vernade, Alexandra Carpentier, Giovanni Zappella, Beyza Ermis, and Michael Brueckner. ICML 2020.
Non-Stationary Delayed Bandits with Intermediate Observations. Claire Vernade*, Andras Gyorgy*, Timothy Mann. ICML 2020.
Stochastic Bandits with Arm-Dependent Delays. Anne-Gaelle Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko. ICML 2020.
Stochastic bandit models for delayed conversions. Claire Vernade, Olivier Cappé, Vianney Perchet. UAI 2017.
Combinatorial action spaces:
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling. Cindy Trinh, Emilie Kaufmann, Claire Vernade, and Richard Combes. ALT 2020.
Stochastic rank-1 bandits. Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen. AISTATS 2017.
Multiple-play bandits in the position-based model. Paul Lagrée, Claire Vernade, Olivier Cappé. NIPS 2016.
Foundations of Bandit Algorithms and Reinforcement Learning
Sometimes (often, in fact), even the fundamental models have open questions remaining, and since 2018 I've spent some time thinking about a few of them. First, we explored off-policy evaluation estimators for Contextual Bandits, such as the clever Self-Normalized Importance Weighted one (sketched after the list below), and corresponding confidence bounds. We also closed an important gap in Linear Bandits by showing that Information Directed Sampling can achieve optimal regret bounds in both the minimax and the problem-dependent sense. Recently, we studied Distributional Reinforcement Learning, its power and its limitations.
Asymptotically Optimal Information-Directed Sampling. Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári. COLT 2021.
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting. Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári. AISTATS 2021.
Beyond Average Rewards in Markov Decision Processes. Alexandre Marthe, Aurélien Garivier, Claire Vernade. EWRL 2023.
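As a small companion to the off-policy evaluation work above, here is a minimal sketch of the self-normalized importance-weighted estimator on logged bandit data (a toy version of the estimator only; the paper's contribution is the high-probability confidence bounds around such estimates, not this snippet):

```python
import numpy as np

def sniw_value(rewards, logging_probs, target_probs):
    """Self-normalized importance-weighted estimate of a target policy's value.

    Inputs are logged bandit data: observed rewards, the logging policy's
    probability of each logged action, and the target policy's probability of
    the same action. Normalizing by the sum of the weights (rather than by n)
    trades a small bias for a much better-behaved variance.
    """
    w = target_probs / logging_probs      # importance weight of each logged action
    return np.dot(w, rewards) / w.sum()   # self-normalized weighted average

# Toy example with two actions: uniform logging policy, target prefers action 1.
rng = np.random.default_rng(0)
n = 10_000
actions = rng.integers(0, 2, n)                              # logged uniformly at random
rewards = rng.binomial(1, np.where(actions == 1, 0.7, 0.3))  # Bernoulli rewards
logging_probs = np.full(n, 0.5)
target_probs = np.where(actions == 1, 0.9, 0.1)
print(sniw_value(rewards, logging_probs, target_probs))  # approx 0.9*0.7 + 0.1*0.3 = 0.66
```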
Game-theoretic algorithm design for PCA and beyond
I got to collaborate with Brian McWilliams, Ian Gemp and others at DeepMind on an elegant new method for computing the PCA of very large matrices. The idea is simple but the optimisation details are subtle, and the results are just mind-blowing. A toy sketch of the game-theoretic update follows the two papers below.
EigenGame: PCA as a Nash Equilibrium. I. Gemp, B. McWilliams, C. Vernade, T. Graepel. Outstanding Paper Award (and Oral Presentation) at ICLR 2021.
EigenGame Unloaded: When playing games is better than optimizing. I. Gemp, B. McWilliams, C. Vernade, T. Graepel. ICLR 2022.
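Here is that toy sketch of the EigenGame idea: each of k players controls a unit vector, ascends the variance it captures, and is penalized for aligning with the players before it; the top-k eigenvectors form the Nash equilibrium. This is my own simplified sequential reimplementation with a hypothetical step size and schedule; see the papers for the actual (parallel, minibatch) algorithms and their guarantees.

```python
import numpy as np

def eigengame_top_k(M, k, lr=0.1, steps=2000, seed=0):
    """Sequential toy version of EigenGame for a symmetric PSD matrix M."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((M.shape[0], k))
    V /= np.linalg.norm(V, axis=0)                 # players live on the unit sphere
    for _ in range(steps):
        for i in range(k):
            v = V[:, i]
            grad = 2 * M @ v                       # ascend captured variance v' M v
            for j in range(i):                     # penalty for aligning with parent j
                u = V[:, j]
                grad -= 2 * (v @ M @ u) / (u @ M @ u) * (M @ u)
            grad -= (grad @ v) * v                 # project onto the sphere's tangent space
            v = v + lr * grad                      # gradient ascent step
            V[:, i] = v / np.linalg.norm(v)        # retract back to the sphere
    return V

# Toy check against numpy's eigendecomposition of a random covariance matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 6))
M = X.T @ X / 500
V = eigengame_top_k(M, k=3)
_, eigvecs = np.linalg.eigh(M)                     # eigenvalues in ascending order
top3 = eigvecs[:, ::-1][:, :3]
print(np.abs(np.diag(V.T @ top3)))                 # each entry should be close to 1
```

In the papers, the updates run in parallel across players on minibatch estimates of the matrix, which is what lets the method scale to very large problems.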
PhD Thesis: Bandit models for interactive applications (please reach out if you really want to read it).
Selected Talks
Upcoming: Keynote at EWRL 2024 and at ALT 2025.
May 2024: Reinforcement Learning Theory towards Robust Discovery in Science. Colloquium of the Cluster of Excellence ML for Science.
July 2023: Lifelong Statistical Testing at the Machine Learning for Science conference in Tübingen.
September 2021: Non-Stationary Delayed Bandits (recording), and RL tutorial at the First ELLIS Symposium in Tübingen.
March 2020: Linear Bandits with Censoring Delays. Workshop on Optimisation and Machine Learning at CIRM (France)