Workshop on Bandits and Statistical Tests

November 23-24th 2023 in Neues Palais, Potsdam, Germany

We are pleased to invite you to our upcoming Workshop on Statistical Tests in Machine Learning, with a specific emphasis on Online Learning and Bandit Algorithms. The workshop aims to provide participants with a comprehensive understanding of statistical testing methodologies in the context of machine learning, particularly in the dynamic and ever-changing landscape of online and bandit settings.

This workshop is intended for researchers, practitioners, and students interested in the intersection of machine learning, statistics, and algorithm design

This workshop is sponsored by the DFG and the Cluster of Excellence "Machine Learning: New perspectives for science". 

Registrations are closed now. 


Keynote speakers: 

[The full schedule will be finalized soon.]

November 23rd

- 8:30: WELCOME  

- 9 :00-10:15 : Tutorial 1: Nicolas Verzelen on Minimax Tests

- 10:15-10:45 : BREAK 

- 10:45-12:00: Tutorial  2: Aurelien Garivier on Bandit algorithms and testing

12:00-14:00: LUNCHBREAK (provided)

- 13:00-14:45: Peter Grünwald

- 14:45-15:30: Cristina Butucea

 - 15:30-16:00 : BREAK 

- 16:00-16:45: David Preinerstorfer

- 16:45-17:30:  Emilie Kaufmann

- 17:30-18:30: POSTER SESSION (light aperitivo)

Dinner on your own

November 24th

- 9:00-9 :15 :  welcome coffee

- 9 :15-10:00: Thomas Berrett

- 10:00 - 10:45 : Remy Degenne

- 10:45-11:15 : BREAK

- 11:15-12:00:  Ariane Marandon

- 12:00-12:45:  Rui Castro

12:45--: LUNCH (provided)

== THURSDAY 23rd ==

Tutorial 1: Minimax tests 

Tutorial 2: Notions of complexity for some sequential tests

Abstract: This tutorial will gently present some  information theoretic ideas that are useful for identifying the sample complexity of sequential procedures, from testing to bandit models.


Peter Grünwald: Beyond Neyman-Pearson: testing without setting alpha in advance


A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show a major advantage of mentioning an e-value instead. With p-values, we simply cannot use an extreme observation (e.g. p << alpha) for getting better frequentist decisions. With e-values we can, since they provide  Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post-hoc, after observation of the data --- thereby providing a handle on the age-old "roving alpha" problem in statistics: we obtain robust "Type-I risk bounds" which hold independently of any preset alpha or loss function. We argue that in practice, one is really more often interested in this setting rather than in the 'classical' Neyman-Pearson one. The reasoning can be extended  to confidence intervals.  When Type-II risks are taken into consideration, the only admissible decision rules in the post-hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail whereas e-confidence sets and e-posteriors still provide valid risk guarantees. We end by giving a brief overview of the statistical testing problems for which competitive e-values have been designed.


This work is based on:

Peter Grünwald. Beyond Neyman-Pearson. Accepted for publication in PNAS (Proceedings National Academy of Sciences of the USA), 2023.

Peter Grünwald. The E-Posterior. Proc. Phil. Trans. Soc. London Series A, 2023. 


Cristina Butucea: Prediction and testing of mixtures of translated features

We consider a model where a signal (discrete or continuous) is observed with an additive Gaussian noise process. The signal is issued from a linear combination of a finite but increasing number of translated features - it is a non-linear extension of the classical high-dimensional regression model. The features are continuously parameterized by their location and depend on some scale parameter. 

First, we give prediction bounds for off-the-grid estimators by taking into account that the scale parameter may vary. The prediction bounds are analogous to those for the linear regression model, under sufficient conditions. Next, we propose a goodness-of-fit test for the model and give non-asymptotic upper bounds of the testing risk and of the minimax separation rate between two distinguishable signals. 

This is joint work with J.-F. Delmas, A. Dutfoy, C. Hardy.


David Preinerstorfer: Bandits targeting poverty, inequality, and welfare.


I will talk about some of our recent contributions to bandit problems with "distributional targets", i.e., where, instead of targeting the (conditional) expected outcome, the decision maker targets a general functional of the outcome cdfs. A particular focus will be put on targets geared towards econometric problems, allowing the decision maker to incorporate poverty, inequality, or welfare considerations. The talk is based on joint work with Anders B. Kock and Bezirgen Veliyev.


Emilie Kaufmann: A Tale of Top Two Algorithms

Thompson Sampling, or posterior sampling, is a popular Bayesian randomized strategy for adaptively selecting arms in a multi-armed bandit model with the goal of maximizing the sum of observed samples, viewed as rewards. While Thompson Sampling can be asymptotically optimal for this reward maximization objective, even in some non parametric families of bandit models, we shall see that it is sub-optimal for the dual adaptive testing objective of identifying the best arm. The first “Top Two” algorithm was proposed in 2016 by Dan Russo as a “fix” of Thompson Sampling for identifying the best arm. Since then, several others Top Two algorithms have been proposed, some of which move away from their original Bayesian flavor. Those algorithms are straightforward to implement and achieve very good theoretical and practical performance for identifying the best arm at a fixed confidence level, and beyond.

== FRIDAY 24th ==

Tom Berrett: Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility

 Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR).  Our first contribution is to characterise precisely the set of alternatives that can be distinguished from the MCAR null hypothesis.  This reveals interesting and novel links to the theory of Fréchet classes (in particular, compatible distributions) and linear programming, that allow us to propose MCAR tests that are consistent against all detectable alternatives.  We define an incompatibility index as a natural measure of ease of detectability, establish its key properties, and show how it can be computed exactly in some cases and bounded in others. Moreover, we prove that our tests can attain the minimax separation rate according to this measure, up to logarithmic factors.  Our methodology does not require any complete cases to be effective, and is available in the R package MCARtest.


Rémy Degenne : Lower Bounds for Sub-Gaussian Mean Estimation

I will present information-theoretic techniques used in bandits and tests to obtain lower bounds (on the stopping time of a bandit algorithm, on the error of en estimator...). Using these techniques, we prove a lower bound on the error of mean estimators for distributions with at least two finite moments. A sub-Gaussian mean estimator is parametrized by a number of samples $n$ and a confidence parameter $\delta$ and it returns an estimate of the mean of a distribution which is within $L\sqrt{2\sigma^2 \log(1/\delta) / n}$ of the true mean, with probability $1 - \delta$, where $\sigma^2$ is the unknown variance of the distribution and $L$ is a constant which should be as close to 1 as possible. One goal of the recent literature on this subject is to obtain a sub-Gaussian estimator for all distributions with two finite moments. Besides the lower bound on the performance of such estimators, we also discuss a new estimator inspired from the lower bound construction.


Ariane Marendon: Conformal p-values for novelty detection with FDR guarantee. Extension to link prediction. 


This talk combines two works on conformal p-values. The first studies the semi-supervised novelty detection problem where a set of “typical” measurements is available to the researcher. Motivated by recent advances in multiple testing and conformal inference, we propose AdaDetect, a flexible method that is able to wrap around any probabilistic classification algorithm and control the false discovery rate (FDR) on detected novelties in finite samples without any distributional assumption other than exchangeability. In contrast to classical FDR-controlling procedures that are often committed to a pre-specified p-value function, AdaDetect learns the transformation in a data-adaptive manner to focus the power on the directions that distinguish between inliers and outliers. The second work consists of extending this procedure to the problem of link prediction. 


Rui Castro: Anomaly detection when monitoring a large number of units: a permutation/rank-based higher criticism approach

Anomaly detection when monitoring a large number of units is essential in a variety of applications, ranging from epidemiological studies to monitoring of complex systems. In this work we take a distribution-free stance and introduce a variant of the higher criticism test that does not require knowledge of the null distributions for proper calibration. This results in an exact test in finite samples. Our rank-based approach is also suitable when each unit is associated with a set of observations of potentially very different nature. We provide an asymptotic test power characterization and show detection is characterized by a phase-transition reminiscent of that of detection of sparse heteroskedastic Gaussian mixtures. Our analysis requires a refined understanding of the distribution of the ranks under the presence of anomalies, and in particular of the rank-induced dependencies. Within the exponential family and a family of convolutional models, we analytically quantify the asymptotic performance of our methodology and the performance of a suitable oracle, and show the difference is small for many common models. Simulations confirm these results. As the proposed test itself does not rely on asymptotic approximations it typically perform better than popular variants of higher criticism relying on such approximations. Finally, we show the applicability of the methodology through an analysis of quality control data of a pharmaceutical manufacturing process. (based on past and ongoing work with Ivo Stoepker and Ery Arias-Castro)


An Adaptive Experiment to Boost Online Skills Signaling. Morgane HOFFMANN, Bertille PICARD, Charly MARIE and Guillaume BIED

Online Learning with Feedback Graphs: The True Shape of Regret. Tomáš Kocák and Alexandra Carpentier

An ε-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond. Marc Jourdan, Rémy Degenne, Emilie Kaufmann

On Uniformly Optimal Algorithms for Best Arm identification in Two-Armed Bandits with Fixed budget. Po-An Wang, Kaito Ariu, Alexandre Proutiere

Hierarchical Bayesian Fixed-Budget Best-Arm Identification. Nicolas Nguyen, Imad Aouali, Andras Gyorgy, Claire Vernade

MAB with a guaranteed revenue per arm. Dorian Baudry, Hugo Richard, Mathieu Molina, Nadav Merlis, Vianney Perchet

A novel Non-Parametric Thompson Sampling for Heavy-Tail distributions. Dorian Baudry, Kazuya Suzuki, Junya Honda (focus on Section 5 of the paper)

Adaptive Algorithms for Relaxed Pareto Set Identification. Cyrille Kone, Emilie Kaufmann, Laura Richert

On Complexity of Differentially Private Best-Arm Identification with Fixed Confidence. Achraf Azize, Marc Jourdan, Aymen Al Marjani, Debabrota Basu

Monotonicity Test of Regressions in Sublinear Time. Zhi Liu, Housen Li, Axel Munk

Locally Optimal Best Arm Identification with a Fixed Budget. Masahiro Kato, Takuya Ishihara, Toru Kitagawa

Test of MCAR based on observed correlation matrices. Alberto Bordino, Thomas Berrett


The venue is in the campus Neues Palais of the Universität Potsdam in Germany (Am Neuen Palais 10, 14469 Potsdam). 

The room for the talks is in building 8, and is room 058. The room for the poster session  is also in building 8, and is room 056.

To come to the workshop:

Please check the DB website for precise time schedules and itineraries.

The coffee breaks will be served in front of the room for the talks (building 8, room 058), and the lunches will be taken at the Mensa, which is the restaurant of the university (in building 12).

Important practical information regarding hotels in Potsdam: many hotels in Potsdam have a check-in that closes relatively early - some even before 20:00 - so that if you arrive late in Potsdam, you might not be able to check-in. Please check carefully with your accommodation for check-in times.

Choice of hotel: Although we do not have specific recommendations for hotels, some participants have already booked rooms at Holiday Inn Express & Suites in the city center, and it is reasonable to choose a hotel in the city center, close to public transport to be able to reach the venue easily. 

Call for posters: We accept submissions of abstracts to be presented as posters. We accept submissions of preliminary or published work (no proceedings) on topics including but not limited to

Posters will be presented on the first day and will be an occasion for students and attendees to exchange about their research. There will not be proceedings of this workshop. 

Instructions:  Please send by email a one-page pdf document describing your work. The format is free, it can be either a short abstract (paper-style, free template) or directly a miniature version of your poster if you already have prepared it. The submissions should be sent to  claire . vernade [at] uni-tuebingen . de before September 15th EOD (Anywhere on Earth). Decisions will be sent by October 15th. Please do not forget to register using the form linked above even if you have already sent your submission by email (this submission is not a pre-registration).