Multi-Armed Bandit Tutorial
Interactive demonstration of Multi-Armed Bandit algorithms and human decision-making.
Multi-Armed Bandit Problem
The Multi-Armed Bandit problem involves sequential decision-making under limited information: the learner must balance exploring arms to estimate their rewards against exploiting the arm that currently looks best. Each "arm" (slot machine) has an unknown reward distribution:
\[r_i \sim \mathcal{N}(\mu_i, \sigma_i^2)\]
where \(\mu_i\) is the mean reward and \(\sigma_i^2\) is the variance for arm \(i\).
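As a concrete illustration, the sketch below simulates such an environment in Python. The `GaussianBandit` class name, the arm means, and the shared standard deviation are illustrative assumptions, not part of the demo.

```python
import numpy as np

class GaussianBandit:
    """A k-armed bandit whose arm i returns rewards drawn from N(mu_i, sigma_i^2)."""

    def __init__(self, means, sigma=0.5, seed=None):
        self.means = np.asarray(means, dtype=float)  # hidden true mean reward per arm
        self.sigma = sigma                           # shared reward standard deviation
        self.rng = np.random.default_rng(seed)

    @property
    def n_arms(self):
        return len(self.means)

    def pull(self, arm):
        """Sample one reward r ~ N(mu_arm, sigma^2) for the chosen arm."""
        return self.rng.normal(self.means[arm], self.sigma)

# Example: 5 arms with hidden means; the learner only sees the sampled rewards.
bandit = GaussianBandit(means=[0.2, 0.5, 0.8, 0.3, 0.6], sigma=0.5, seed=42)
reward = bandit.pull(2)
```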
Common Strategies
1. ε-Greedy:
\[a_t = \begin{cases} \arg\max_i \hat{\mu}_i & \text{with probability } 1-\varepsilon \\ \text{random arm} & \text{with probability } \varepsilon \end{cases}\]
2. Upper Confidence Bound (UCB1):
\[a_t = \arg\max_i \left(\hat{\mu}_i + \sqrt{\frac{2\ln t}{n_i}}\right)\]
where \(t\) is the total number of pulls and \(n_i\) is the number of times arm \(i\) has been pulled.
3. Thompson Sampling:
\[\theta_i \sim \text{Beta}(\alpha_i, \beta_i)\]
\[a_t = \arg\max_i \theta_i\]
The Beta posterior above applies to Bernoulli (0/1) rewards; for the Gaussian rewards defined earlier, Thompson Sampling instead samples each arm's mean from a Normal posterior. Code sketches of all three strategies follow below.
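Here is a minimal Python sketch of the three strategies, assuming the `GaussianBandit` environment above. The class names and the Normal-posterior Thompson Sampling variant (known reward variance with a simplified posterior) are illustrative choices, not taken from the demo.

```python
import numpy as np

class EpsilonGreedy:
    """With probability eps pull a random arm, otherwise the empirically best arm."""

    def __init__(self, n_arms, eps=0.1, seed=None):
        self.eps = eps
        self.counts = np.zeros(n_arms)   # n_i: number of pulls of arm i
        self.means = np.zeros(n_arms)    # mu_hat_i: empirical mean reward of arm i
        self.rng = np.random.default_rng(seed)

    def select(self, t):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.means)))
        return int(np.argmax(self.means))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class UCB1:
    """Pull each arm once, then pick argmax_i of mu_hat_i + sqrt(2 ln t / n_i)."""

    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)
        self.means = np.zeros(n_arms)

    def select(self, t):
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:
            return int(untried[0])
        bonus = np.sqrt(2.0 * np.log(t) / self.counts)
        return int(np.argmax(self.means + bonus))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class GaussianThompson:
    """Thompson Sampling for Gaussian rewards: sample each arm's mean from a
    (simplified) Normal posterior and pull the arm with the largest sample."""

    def __init__(self, n_arms, sigma=0.5, seed=None):
        self.sigma = sigma
        self.counts = np.zeros(n_arms)
        self.means = np.zeros(n_arms)
        self.rng = np.random.default_rng(seed)

    def select(self, t):
        # Simplified posterior on each arm's mean: N(mu_hat_i, sigma^2 / (n_i + 1)).
        std = self.sigma / np.sqrt(self.counts + 1.0)
        samples = self.rng.normal(self.means, std)
        return int(np.argmax(samples))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```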
[Interactive demo: pull the arms yourself or let the Random, ε-Greedy, UCB1, and Thompson Sampling agents play; experiment parameters are adjustable, and live counters report your total reward, average reward, and best arm found alongside each agent's total reward.]
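To reproduce the comparison outside the demo, a loop like the following can run each agent on the same bandit and report total and average reward. It assumes the sketches above; the arm means, ε = 0.1, σ = 0.5, and 20 pulls are illustrative values rather than the demo's exact settings.

```python
class RandomAgent:
    """Baseline: pull a uniformly random arm every step."""

    def __init__(self, n_arms, seed=None):
        self.n_arms = n_arms
        self.rng = np.random.default_rng(seed)

    def select(self, t):
        return int(self.rng.integers(self.n_arms))

    def update(self, arm, reward):
        pass


def run(agent, bandit, n_pulls=20):
    """Run one agent for n_pulls steps and return its total reward."""
    total = 0.0
    for t in range(1, n_pulls + 1):
        arm = agent.select(t)
        reward = bandit.pull(arm)
        agent.update(arm, reward)
        total += reward
    return total


bandit = GaussianBandit(means=[0.2, 0.5, 0.8, 0.3, 0.6], sigma=0.5, seed=0)
agents = {
    "Random": RandomAgent(bandit.n_arms, seed=0),
    "eps-Greedy": EpsilonGreedy(bandit.n_arms, eps=0.1, seed=0),
    "UCB1": UCB1(bandit.n_arms),
    "Thompson": GaussianThompson(bandit.n_arms, sigma=0.5, seed=0),
}
for name, agent in agents.items():
    total = run(agent, bandit, n_pulls=20)
    print(f"{name}: total reward = {total:.2f}, average reward = {total / 20:.2f}")
```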