Optimistic tools for RL: MVA 2024
This year I am giving two lectures in the RL class of the MVA master, led by Emmanuel Rachelson. These lectures are heavily inspired by the material designed by Emilie Kaufmann for previous years of this same class.
November 4th: Bandit algorithms and the Optimism Principle: slides and practice notebook (MAB)
November 18th: Optimism in RL (Theory): 'blank' slides, annotated slides
Link to the overleaf with the assignment (see notebook therein): https://www.overleaf.com/read/gkymnpqfjptp#8f0e2d
Introduction to Reinforcement Learning
I gave a tutorial called "Introduction to RL" at the Mediterranean ML Summer School in September 2024.
This was meant to be a (relaxed) 1h15 introduction to the essentials of MDPs (value iteration, policy iteration), temporal-difference learning (SARSA, Q-learning), and (briefly) policy gradient. The slides are below. For a more in-depth treatment, I recommend David Silver's lecture series at UCL; for an introduction to policy gradient methods specifically, I always recommend the great talk by Niao He at RLSS in Barcelona in 2023 (many more great recorded talks there). There are a few more references in the slides. Please reach out if you find mistakes or have questions.
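To make the "essentials of MDPs" part concrete, here is a minimal sketch of value iteration on a toy two-state, two-action MDP. The transition probabilities and rewards are hypothetical numbers chosen purely for illustration, not from the slides.

```python
import numpy as np

# Toy MDP: P[s, a, s'] are transition probabilities, R[s, a] are rewards.
# All numbers here are made up for illustration.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.5, 0.5], [0.0, 1.0]],   # transitions from state 1
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(10_000):
    # Bellman optimality update:
    # Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
```

Policy iteration replaces the inner `max` update with an exact policy-evaluation step followed by a greedy improvement step; both converge to the same optimal values on finite MDPs.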
IA318 -- Reinforcement Learning
Telecom ParisTech (2021 - 2022)
Same as last year!
Notes and slides available here
IA318 -- Reinforcement Learning
Telecom ParisTech (2020 - 2021)
I am teaching three classes in the Reinforcement Learning module at Telecom ParisTech (also part of the Data AI master with Polytechnique), directed by Prof. Thomas Bonald.
Introduction to Multi-Armed Bandits
Contextual Linear Bandits
Monte Carlo Tree Search and introduction to planning
This year I taught online, like everyone these days... and I experimented with handwritten notes using GoodNotes on an iPad. The "clean" versions of the notes are below.
Summer School HI! Paris 2021
I will give a tutorial on sequential decision making and present motivating problems in reinforcement learning and marketing. I will introduce the multi-armed bandit model as a way to pose the statistical problem of exploration versus exploitation, and show how Thompson Sampling provides an elegant and simple solution. The full notes will be posted here shortly after the class, but you can already find a preview below.
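As a taste of why Thompson Sampling is "elegant and simple", here is a minimal sketch for a Bernoulli bandit with Beta priors. The three arm means are hypothetical values chosen for illustration; the algorithm itself is standard (sample from each posterior, play the best sample, update conjugately).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Bernoulli bandit: true success probabilities, unknown to the learner.
true_means = np.array([0.3, 0.5, 0.7])
K = len(true_means)

# Beta(1, 1) uniform prior on each arm's mean.
alpha = np.ones(K)
beta = np.ones(K)

T = 5000
for _ in range(T):
    # Draw one sample from each arm's posterior and play the arm
    # whose sample is largest (probability matching).
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = float(rng.random() < true_means[arm])  # Bernoulli draw
    # Conjugate posterior update.
    alpha[arm] += reward
    beta[arm] += 1.0 - reward

posterior_means = alpha / (alpha + beta)
best_arm = int(np.argmax(posterior_means))
```

Because sampling from the posterior naturally plays uncertain arms sometimes and good arms often, exploration needs no explicit bonus term; the posterior of the best arm concentrates around its true mean while suboptimal arms are pulled only rarely.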
Practical Session: in this lab session, we will learn how to implement Thompson Sampling for linear bandits. The Colab is here :)
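If you want a preview of the lab's core loop, here is a minimal sketch of Thompson Sampling for a linear bandit with a Gaussian posterior. The dimension, action set, noise level, and true parameter below are all hypothetical, chosen just to make the sketch runnable; the Colab follows its own setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear bandit: reward of arm x is <theta_star, x> + Gaussian noise.
d = 3
theta_star = np.array([0.5, -0.2, 0.8])            # unknown parameter
arms = rng.normal(size=(10, d))                     # fixed set of 10 feature vectors
arms /= np.linalg.norm(arms, axis=1, keepdims=True) # normalize to unit norm

lam, noise_sd = 1.0, 0.1
B = lam * np.eye(d)   # regularized design matrix (posterior precision, up to scaling)
b = np.zeros(d)       # sum of r_t * x_t

for t in range(2000):
    # Gaussian posterior on theta: mean B^{-1} b, covariance noise_sd^2 * B^{-1}.
    theta_hat = np.linalg.solve(B, b)
    cov = noise_sd**2 * np.linalg.inv(B)
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    # Play the arm that maximizes the reward under the sampled parameter.
    x = arms[int(np.argmax(arms @ theta_tilde))]
    r = x @ theta_star + noise_sd * rng.normal()
    # Rank-one update of the design matrix and response vector.
    B += np.outer(x, x)
    b += r * x
```

The only change from the Bernoulli case is the posterior family: a Gaussian over the shared parameter vector replaces the per-arm Beta distributions, so information gathered on one arm transfers to all arms with correlated features.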
Resources:
A Tutorial on Thompson Sampling. D. Russo et al.: https://arxiv.org/abs/1707.02038
Bandit Algorithms. T. Lattimore and C. Szepesvári. https://tor-lattimore.com/downloads/book/book.pdf (large PDF!). See also www.banditalgs.com.