Education and Research 

My most recent CV can be found here.

My group, Foundations of Reinforcement Learning, designs algorithms and studies the theoretical foundations of sequential decision making, whose most advanced form today is reinforcement learning (RL).

During my PhD, I focused on modelling more realistic user behavior on recommendation platforms: accounting for the loss of attention as the user scrolls down a long list (the Position-Based Model), modelling the interaction of several independent factors (rank-one bandits), and allowing delayed responses to update the learning model (the censoring effect of delays in online learning).
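To illustrate the first of these, the Position-Based Model factorizes the click probability at position p on item i as the product of an examination probability κ_p (depending only on the position) and an attractiveness θ_i (depending only on the item). Below is a minimal simulation sketch; the κ and θ values are hypothetical, chosen only to show the factorization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model parameters (not from any real dataset):
kappa = np.array([1.0, 0.7, 0.4, 0.2])  # examination probability per position
theta = np.array([0.9, 0.5, 0.3, 0.1])  # attractiveness per item

def simulate_clicks(ranking, n_users=100_000):
    """In the PBM, a click at position p on item ranking[p] occurs
    independently with probability kappa[p] * theta[ranking[p]]."""
    p_click = kappa * theta[ranking]
    clicks = rng.binomial(1, p_click, size=(n_users, len(ranking)))
    return clicks.mean(axis=0)  # empirical click-through rate per position

# Show items in their natural order and observe CTRs decay with position
ctr = simulate_clicks(np.array([0, 1, 2, 3]))
print(ctr)
```

The point of the factorization is identifiability: with enough rankings, κ and θ can be disentangled from click logs even though only their product is ever observed.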

In my (short) postdoc at Amazon Berlin, I continued studying the impact of delays in recommendation systems, especially in contextual settings. 

Since joining DeepMind, I have collaborated with Timothy Mann and Andras Gyorgy on a new bandit model that includes side-observations and disentangles delays from potential non-stationarity in the reward function. This model sits closer to general reinforcement learning than to bandits, as it is based on a multi-state MDP, but with a one-step horizon (see our ICML 2020 paper). I have also studied off-policy evaluation in contextual bandits, where the learner must decide on the next policy based on logged bandit data. With Ilja Kuzborskij, Andras Gyorgy and Csaba Szepesvari, we proposed new finite-time confidence bounds for various off-policy evaluation techniques, and we showed that using a high-probability lower bound to make decisions leads to significantly improved performance on a test set.
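To illustrate the decision rule (not the specific bound from the paper), one can estimate each candidate policy's value from logged data with inverse propensity scoring, subtract a simple Hoeffding-style deviation term, and pick the policy whose lower bound is highest. The reward means, the uniform logging policy, and the candidate policies below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_actions = 5_000, 3
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical per-action reward rates

# Logged data collected by a uniform logging policy mu
actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, true_means[actions]).astype(float)
mu = 1.0 / n_actions  # propensity of each action under the uniform logger

def ips_lower_bound(target, actions, rewards, mu, delta=0.05):
    """IPS value estimate minus a Hoeffding-style deviation term.
    This is an illustrative bound, not the one derived in the paper."""
    w = target[actions] / mu              # importance weights
    est = np.mean(w * rewards)            # unbiased IPS estimate
    b = target.max() / mu                 # range of each weighted term
    width = b * np.sqrt(np.log(1 / delta) / (2 * len(rewards)))
    return est - width

# Candidate deterministic policies: "always play action a"
lbs = [ips_lower_bound(np.eye(n_actions)[a], actions, rewards, mu)
       for a in range(n_actions)]
best = int(np.argmax(lbs))
```

Selecting by lower bound rather than by point estimate penalizes policies whose value estimates are noisy (e.g. those poorly covered by the logging policy), which is the pessimism principle behind the improved test-set performance.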

I also enjoy discovering new problems and learning new techniques and ideas. I have been collaborating with Ian Gemp, Brian McWilliams and Thore Graepel on a new game-theoretic approach to PCA called EigenGame (oral presentation at ICLR).
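EigenGame recasts the top-k eigenvectors of a symmetric matrix as the Nash equilibrium of a game: player i maximizes its Rayleigh quotient while being penalized for aligning (in the matrix inner product) with players j < i. Here is a minimal numpy sketch of that update, illustrative rather than the paper's exact algorithm; the test matrix and step size are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric matrix with a well-separated spectrum
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
M = Q @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1]) @ Q.T

k, lr, steps = 2, 0.05, 1000
V = rng.normal(size=(5, k))
V /= np.linalg.norm(V, axis=0)  # players live on the unit sphere

for _ in range(steps):
    for i in range(k):
        vi = V[:, i]
        # Gradient of player i's utility: <vi, M vi> minus penalties
        # for aligning with each parent player j < i.
        grad = 2 * M @ vi
        for j in range(i):
            vj = V[:, j]
            grad -= 2 * (vi @ M @ vj) / (vj @ M @ vj) * (M @ vj)
        grad -= (grad @ vi) * vi           # project onto the sphere's tangent space
        vi = vi + lr * grad                # Riemannian ascent step
        V[:, i] = vi / np.linalg.norm(vi)  # retract back to the unit sphere

# Sanity check against a direct eigendecomposition
evals, U = np.linalg.eigh(M)
top = U[:, ::-1][:, :k]  # eigenvectors for the two largest eigenvalues
```

The appeal of the game formulation is that each player's update is local and embarrassingly parallel, which is what makes the approach scale to very large matrices.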