Education and Research

Since 2018, I have been a Research Scientist at DeepMind in London, UK, in Csaba Szepesvari's Foundations team.

I work on online learning with partial feedback, also known as bandit algorithms. Perhaps the best reference on the topic is the already classic book Bandit Algorithms by my amazing colleagues Tor Lattimore and Csaba Szepesvari.

During my PhD, I focused on modeling more realistic user behavior on recommendation platforms: loss of attention as the user scrolls down a long list (Position-Based Model), the interaction of several independent factors (rank-one bandits), and delayed responses when updating the learning model (censoring effect of delays in online learning).

In my (short) postdoc at Amazon Berlin, I continued studying the impact of delays in recommendation systems, especially in contextual settings.

Since joining DeepMind, I have collaborated with Timothy Mann and Andras Gyorgy on a new bandit model that includes side-observations and disentangles delays from potential non-stationarity in the reward function. This model is closer to general reinforcement learning than to bandits, as it is based on a multi-state MDP, but with only a one-step horizon (see ICML 2020 paper). I have also studied off-policy evaluation in contextual bandits, where the learner must decide on the next policy based on logged bandit data. With Ilja Kuzborskij, Andras Gyorgy and Csaba Szepesvari, we proposed a new finite-time confidence bound for various off-policy evaluation techniques, and we showed that using a high-probability lower bound to make decisions leads to significantly improved performance on a test set.

I also enjoy discovering new problems and learning new techniques and ideas. I have been collaborating with Ian Gemp, Brian McWilliams and Thore Graepel on a new game-theoretic approach to PCA called EigenGame (oral presentation at ICLR). I am also learning some elements of deep learning theory with Julia Hoerrmann, and studying applications to the problem of detecting adversarial examples with Yoann Lemesle, a bachelor's student from ENS Rennes, France. Finally, I recently started learning about meta-learning and meta-reinforcement learning, and I am interested in potential applications of these methods to continual bandit scenarios.