# Reinforcement learning

## What is Reinforcement Learning

"Reinforcement learning (RL) is learning from interaction with an environment, from the consequences of action, rather than from explicit teaching." -- Rich Sutton

## Evaluative Feedback (Chapter 2)

### Softmax Action Selection

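Softmax action selection is one way to balance exploration and exploitation. At time step $t$, the softmax policy chooses action $a$ with probability

$$\frac{\exp(Q_t(a)/\tau)}{\sum_{b=1}^{n} \exp(Q_t(b)/\tau)}$$

where $Q_t(a)$ is the estimated value of action $a$ at time $t$, $n$ is the number of actions, and $\tau > 0$ is the temperature. High temperatures make all actions nearly equiprobable; as $\tau \to 0$, the policy approaches greedy action selection.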
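The softmax rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `softmax_probabilities` and `select_action` are hypothetical, the action-value estimates are assumed to be a plain list of floats, and subtracting the maximum value before exponentiating is an added numerical-stability step that leaves the probabilities unchanged.

```python
import math
import random


def softmax_probabilities(q_values, tau):
    """Return the softmax probability of each action.

    q_values: estimated action values Q_t(a), one per action (assumed list of floats).
    tau: temperature, must be positive.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Shifting by the maximum avoids overflow in exp() and cancels out
    # in the ratio, so the resulting probabilities are the same.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, tau, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Calling `select_action([1.0, 2.0, 3.0], tau=0.5)` returns an index in `0..2`, with higher-valued actions drawn more often. Lowering `tau` concentrates probability on the best-valued action, while raising it spreads probability more evenly.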

