Optimistic tools for RL: MVA 2024
This year I am giving two lectures in the RL class of the MVA master, led by Emmanuel Rachelson. These lectures are heavily inspired by the material designed by Emilie Kaufmann for previous years of this same class.
November 4th: Bandit algorithms and the Optimism Principle: slides and practice notebook (MAB)
November 18th: Optimism in RL (Theory): 'blank' slides, annotated slides
Link to the overleaf with the assignment (see notebook therein): https://www.overleaf.com/read/gkymnpqfjptp#8f0e2d
Introduction to Reinforcement Learning
I gave a tutorial called "Introduction to RL" at the Mediterranean ML Summer School in September 2024.
This was meant to be a (relaxed) 1h15 introduction to the essentials of MDPs (value iteration, policy iteration), temporal-difference learning (SARSA, Q-learning), and (briefly) policy gradient. The slides are below. For a more in-depth treatment, I recommend David Silver's lecture series at UCL; for an introduction to policy gradient methods specifically, I always recommend the great talk by Niao He at RLSS in Barcelona in 2023 (many more great recorded talks there). There are a few more references in the slides. Please reach out if you find mistakes or have questions.
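To make the "essentials of MDPs" part concrete, here is a minimal sketch of value iteration on a toy two-state, two-action MDP. The transition probabilities and rewards are hypothetical numbers chosen purely for illustration, not from the slides.

```python
import numpy as np

# Toy MDP: P[s, a, s'] are transition probabilities, R[s, a] are rewards.
# All numbers here are made up for illustration.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.5, 0.5], [0.0, 1.0]],   # transitions from state 1
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(10_000):
    # Bellman optimality update:
    # Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
```

Policy iteration replaces the inner `max` update with an exact policy-evaluation step followed by a greedy improvement step; both converge to the same optimal values on finite MDPs.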
IA318 -- Reinforcement Learning
Telecom ParisTech (2021 - 2022)
Same as last year!
Notes and slides available here
IA318 -- Reinforcement Learning
Telecom ParisTech (2020 - 2021)
I am teaching three classes in the Reinforcement Learning module at Telecom ParisTech (also part of the Data AI master with Polytechnique), directed by Prof. Thomas Bonald.
Introduction to Multi-Armed Bandits
Contextual Linear Bandits
Monte Carlo Tree Search and introduction to planning
This year I taught online, like everyone these days... and I experimented with handwritten notes using GoodNotes on an iPad. The "clean" versions of the notes are below.
Summer School HI! Paris 2021
I will give a tutorial on sequential decision making and present motivating problems in reinforcement learning and marketing. I will introduce the multi-armed bandit model as a way to pose the statistical problem of exploration versus exploitation, and show how Thompson Sampling provides an elegant and simple solution. The full notes will be posted here shortly after the class, but you can already find a preview below.
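As a taste of why Thompson Sampling is "elegant and simple", here is a minimal sketch for a Bernoulli bandit with Beta priors. The three arm means are hypothetical values chosen for illustration; the algorithm itself is standard (sample from each posterior, play the best sample, update conjugately).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Bernoulli bandit: true success probabilities, unknown to the learner.
true_means = np.array([0.3, 0.5, 0.7])
K = len(true_means)

# Beta(1, 1) uniform prior on each arm's mean.
alpha = np.ones(K)
beta = np.ones(K)

T = 5000
for _ in range(T):
    # Draw one sample from each arm's posterior and play the arm
    # whose sample is largest (probability matching).
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = float(rng.random() < true_means[arm])  # Bernoulli draw
    # Conjugate posterior update.
    alpha[arm] += reward
    beta[arm] += 1.0 - reward

posterior_means = alpha / (alpha + beta)
best_arm = int(np.argmax(posterior_means))
```

Because sampling from the posterior naturally plays uncertain arms sometimes and good arms often, exploration needs no explicit bonus term; the posterior of the best arm concentrates around its true mean while suboptimal arms are pulled only rarely.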
Practical Session: in this lab session, we will learn how to implement Thompson Sampling for linear bandits. The Colab is here :)
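If you want a preview of the lab's core loop, here is a minimal sketch of Thompson Sampling for a linear bandit with a Gaussian posterior. The dimension, action set, noise level, and true parameter below are all hypothetical, chosen just to make the sketch runnable; the Colab follows its own setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear bandit: reward of arm x is <theta_star, x> + Gaussian noise.
d = 3
theta_star = np.array([0.5, -0.2, 0.8])            # unknown parameter
arms = rng.normal(size=(10, d))                     # fixed set of 10 feature vectors
arms /= np.linalg.norm(arms, axis=1, keepdims=True) # normalize to unit norm

lam, noise_sd = 1.0, 0.1
B = lam * np.eye(d)   # regularized design matrix (posterior precision, up to scaling)
b = np.zeros(d)       # sum of r_t * x_t

for t in range(2000):
    # Gaussian posterior on theta: mean B^{-1} b, covariance noise_sd^2 * B^{-1}.
    theta_hat = np.linalg.solve(B, b)
    cov = noise_sd**2 * np.linalg.inv(B)
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    # Play the arm that maximizes the reward under the sampled parameter.
    x = arms[int(np.argmax(arms @ theta_tilde))]
    r = x @ theta_star + noise_sd * rng.normal()
    # Rank-one update of the design matrix and response vector.
    B += np.outer(x, x)
    b += r * x
```

The only change from the Bernoulli case is the posterior family: a Gaussian over the shared parameter vector replaces the per-arm Beta distributions, so information gathered on one arm transfers to all arms with correlated features.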
Resources:
A Tutorial on Thompson Sampling. D. Russo et al.: https://arxiv.org/abs/1707.02038
Bandit Algorithms. T. Lattimore and C. Szepesvári. https://tor-lattimore.com/downloads/book/book.pdf (large PDF!). See also www.banditalgs.com.