Claire Vernade

contact: first.last @ uni-tuebingen . de

X / Twitter: @vernadec

LinkedIn: Claire Vernade

Bio

Claire is a Group Leader at the University of Tübingen, in the Cluster of Excellence Machine Learning for Science (*). She was awarded an Emmy Noether award under the AI Initiative call in 2022 for the project FoLiReL, and an ERC Starting Grant in 2024 for the project ConSequentIAL.

Her research is on sequential decision making. It mostly spans bandit problems and theoretical Reinforcement Learning, but her research interests extend to Learning Theory and principled learning algorithms. Her work "EigenGame: PCA as a Nash Equilibrium" was recognized with an Outstanding Paper Award at ICLR 2021 (with I. Gemp, B. McWilliams and T. Graepel).

Her goal is to contribute to the understanding and development of interactive and adaptive learning systems. 

Between November 2018 and December 2022, she was a Research Scientist at DeepMind in London, UK, in the Foundations team led by Prof. Csaba Szepesvari. She did a post-doc in 2018 with Prof. Alexandra Carpentier at the University of Magdeburg in Germany while working part-time as an Applied Scientist at Amazon in Berlin. She received her PhD from Telecom ParisTech in October 2017, under the guidance of Prof. Olivier Cappé.

--

(*) Check out the work of my great colleagues on the Cluster's blog.

I am engaged in promoting diversity and inclusivity in the ML community (see the Women in ML page). For instance, I am co-leading the Women in Learning Theory initiative as well as the new Tübingen Women in Machine Learning group; please reach out if you have questions or would like to help. More generally, I support organizing efforts and activism in ML and in the Tech industry, and I was a union rep with Unite the Union in London, UK, while working as a Research Scientist at DeepMind London.

News

Applications to my group

I will be looking for Postdocs starting next Spring (2025). If you have (or are about to complete) a PhD on a topic related to bandit algorithms, RL theory, or optimal / adaptive control, please reach out to applications.vernadelab@gmail.com and add [ERC25POST] at the beginning of the subject line (*). Please include a short (~1 page) motivation letter in the email, as well as a CV. More info on the group can be found on the Projects page and in the FAQ.

I will be interviewing candidates for 1 PhD position starting anytime after June 2025. At all times, I mainly consider applications through the IMPRS-IS and ELLIS doctoral programs. Due to the large number of emails I receive, I will not be able to reply to individual applications. If you are considering applying to my group, please read the FAQ.

Master's students: I will release a short list of topics on this website shortly and will mainly consider candidates from courses I taught (e.g. MVA Paris 2024, or the Uni. Tübingen RL class 2024). Please stay tuned for announcements in early November.

(*) Without this code, your email will be immediately archived and I will never see it. 

For the full list of my publications, see my Google Scholar profile.

Research projects and selected publications

See the Projects page for more details on the ERC and Emmy Noether projects.

Lifelong Reinforcement Learning

In standard RL models, the dynamics of the environment and the reward function to be optimized are assumed to be fixed (and unknown). What if they can vary, perhaps with some constraints, or within some boundaries? What if the agent must actually solve, one after another, various tasks that share some similarities? What is the cost of meta-learning, and how should exploration be handled when the agent knows from the start that there will be many different tasks? These are Meta-Reinforcement Learning questions, and I use the term Lifelong to emphasize the sequential nature of the process (the agent does not see all the tasks at once and cannot jump freely from one to another). Initial ideas for these questions were sparked by reflections on Non-stationary bandits, where the environment can slowly vary, and Sparse bandits, where only a subset of the actions have positive value.
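As a toy illustration of the non-stationary bandit setting mentioned above, here is a minimal sliding-window UCB sketch. It is illustrative only: the window length, confidence level, and simulated drift below are my own assumptions, not taken from any specific paper.

```python
import numpy as np

def sliding_window_ucb(reward_fn, n_arms, horizon, tau=200, delta=0.05):
    """Play a K-armed bandit whose means may slowly drift, using only the last `tau` pulls.

    reward_fn(arm, t) -> reward in [0, 1]; tau and delta are illustrative choices.
    """
    history = []  # list of (arm, reward), truncated to the last tau entries
    total = 0.0
    for t in range(horizon):
        ucbs = []
        for a in range(n_arms):
            obs = [r for (arm, r) in history if arm == a]
            if not obs:
                ucbs.append(np.inf)          # force initial exploration of unseen arms
            else:
                bonus = np.sqrt(np.log(1.0 / delta) / (2 * len(obs)))
                ucbs.append(np.mean(obs) + bonus)
        arm = int(np.argmax(ucbs))
        r = reward_fn(arm, t)
        total += r
        history.append((arm, r))
        history = history[-tau:]             # forget old data to track slow drift
    return total

# Example: arm means slowly rotate over time (purely synthetic drift).
means = lambda t: np.array([0.5 + 0.3 * np.sin(t / 500), 0.5, 0.4])
reward = lambda a, t: float(np.random.rand() < means(t)[a])
print(sliding_window_ucb(reward, n_arms=3, horizon=3000))
```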


Bandit algorithms with complex or delayed feedback structure

During my PhD, and later at Amazon and DeepMind, I studied various aspects of the beautiful sequential learning problems that are bandits. I started by exploring combinatorial action spaces (e.g. ranking problems), and the impact of delays and their censoring effect on binary observations.
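As a toy illustration of that censoring effect (the conversion probability, exponential delays, and window length below are purely illustrative assumptions, not the exact model from the papers), conversions whose delay exceeds the observation window are simply never seen, which biases naive estimates downwards:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_censored_conversions(n=10_000, p=0.1, mean_delay=50.0, m=30):
    """Each action converts with probability p, observed only after a random delay.

    Conversions whose delay exceeds the censoring window m are never observed,
    so the naive empirical rate underestimates the true conversion rate p.
    """
    converted = rng.random(n) < p
    delays = rng.exponential(mean_delay, size=n)
    observed = converted & (delays <= m)
    return converted.mean(), observed.mean()

true_rate, observed_rate = simulate_censored_conversions()
print(f"true conversion rate ~ {true_rate:.3f}, observed (censored) rate ~ {observed_rate:.3f}")
```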

Delays and censoring: 

Combinatorial action spaces:

Foundations of Bandit Algorithms and Reinforcement Learning

Sometimes (often), even the fundamental models have open questions remaining, and since 2018 I have spent some time thinking about a few of them. First, we explored off-policy evaluation estimators for Contextual Bandits, such as the clever Self-Normalized Importance Weighted estimator, and the corresponding confidence bounds. We also closed an important gap in Linear Bandits by showing that Information Directed Sampling can achieve optimal regret bounds in both the minimax and the problem-dependent sense. Recently, we studied Distributional Reinforcement Learning and its power and limitations.
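For readers unfamiliar with it, the sketch below shows a self-normalized importance-weighted value estimate in its simplest form; the synthetic logging data and the two policies are assumptions made for the example, and the confidence bounds studied in the papers are not included.

```python
import numpy as np

def snips_estimate(rewards, target_probs, logging_probs):
    """Self-normalized importance-weighted estimate of the target policy's value.

    rewards[i]       : reward observed for the logged action on round i
    target_probs[i]  : probability the target policy would have taken that action
    logging_probs[i] : probability the logging policy actually took it
    """
    w = np.asarray(target_probs) / np.asarray(logging_probs)    # importance weights
    return np.sum(w * np.asarray(rewards)) / np.sum(w)          # self-normalization

# Tiny synthetic example (illustrative numbers only).
rng = np.random.default_rng(1)
logging_probs = np.full(1000, 0.5)                  # uniform logging over 2 actions
actions = rng.integers(0, 2, size=1000)
rewards = rng.binomial(1, np.where(actions == 1, 0.7, 0.3))
target_probs = np.where(actions == 1, 0.9, 0.1)     # target policy prefers action 1
print(snips_estimate(rewards, target_probs, logging_probs))     # ≈ 0.9*0.7 + 0.1*0.3
```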


Game-theoretic algorithm design for PCA and beyond

I got to collaborate with Brian McWilliams, Ian Gemp and others at DeepMind on an elegant new method for computing the PCA of very large matrices. The idea is simple, but the optimisation details are subtle and the results are just mind-blowing.
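For a flavour of the idea, here is a simplified, sequential re-implementation of the core update (not the parallel algorithm or DeepMind's code; the step size and iteration counts are arbitrary choices): each "player" owns one direction and ascends its Rayleigh quotient, penalized for aligning with the players ranked above it, and the resulting Nash equilibrium is the set of principal components.

```python
import numpy as np

def eigengame(M, k, steps=2000, lr=0.1, seed=0):
    """Recover the top-k eigenvectors of a symmetric PSD matrix M.

    Player i maximizes v_i^T M v_i minus a penalty for aligning with players j < i;
    players are updated one after another here for simplicity.
    """
    rng = np.random.default_rng(seed)
    d = M.shape[0]
    V = rng.standard_normal((d, k))
    V /= np.linalg.norm(V, axis=0)
    for i in range(k):
        for _ in range(steps):
            v = V[:, i]
            grad = 2 * M @ v
            for j in range(i):                        # penalty w.r.t. higher-ranked players
                u = V[:, j]
                Mu = M @ u
                grad -= 2 * (v @ Mu) / (u @ Mu) * Mu
            grad -= (grad @ v) * v                    # project onto the tangent of the sphere
            v = v + lr * grad
            V[:, i] = v / np.linalg.norm(v)
    return V

# Sanity check against numpy's eigendecomposition (synthetic data, illustrative only).
A = np.random.default_rng(1).standard_normal((100, 5))
M = A.T @ A / 100.0
V = eigengame(M, k=3)
_, eigvecs = np.linalg.eigh(M)                        # eigh returns eigenvalues ascending
print(np.abs(np.sum(V * eigvecs[:, ::-1][:, :3], axis=0)))    # should be close to [1, 1, 1]
```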


PhD Thesis: Bandit models for interactive applications (please reach out if you really want to read it)

Selected Talks

Upcoming: Keynotes at EWRL 2024 and ALT 2025.