Reinforcement Learning
concept · part of Machine Learning & Data
A machine learning paradigm where an agent learns a policy through trial and error to maximize cumulative reward from an environment.
Key algorithm families include temporal-difference learning (of which Q-learning is an example) and policy gradients, as detailed in Sutton and Barto's standard textbook, Reinforcement Learning: An Introduction.
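To make the trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning, named above. The five-state chain environment, the function names (`epsilon_greedy`, `step`, `q_learning`), and the hyperparameter values are all assumptions chosen for illustration; they do not come from the notes or from any particular library.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so all-zero rows do not always favor action 0.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def step(state, action, n_states):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.

    Reaching the rightmost state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Learn a table of action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = step(state, action, n_states)
            # Temporal-difference target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(v, 3) for v in row])
```

In this toy setting the learned values for "move right" should end up higher than those for "move left" in every non-terminal state, so the greedy policy walks straight to the goal. Real tasks usually need a larger state representation and more careful tuning of the learning rate and exploration rate.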
Inside Reinforcement Learning (9)
- Average reward per episode — Cumulative reward divided by number of episodes, smoothing fluctuations to reveal learning trends.
- Cumulative reward — Total rewards collected in an episode or across episodes, often discounted by factor γ.
- Episode length — Number of steps taken to complete an episode; decreasing length may indicate more efficient goal-reaching.
- Exploration vs. exploitation ratio — Balance between trying new actions and using known high-reward ones, e.g., ε-greedy with ε=0.1.
- Policy stability — How often the agent changes its policy after convergence, indicating confidence and consistency.
- Reinforcement learning in production — Reinforcement learning is used by Netflix to optimize recommendation algorithms over time by maximizing user satisfaction and engagement.
- Sample efficiency — How well the agent learns from few experiences; critical when data collection is expensive.
- Success rate — Ratio of successful episodes to total episodes, a direct measure of task completion.
- Time to convergence — Number of episodes or steps until the policy stabilizes; lower is better when training is costly.
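Several of the metrics listed above can be computed directly from logged episodes. The sketch below assumes a hypothetical `Episode` record holding per-step rewards and a success flag; the names `Episode`, `discounted_return`, and `summarize` are illustrative and not taken from the notes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    rewards: List[float]  # reward received at each step of the episode
    success: bool         # whether the task goal was reached


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Return sum over t of gamma**t * rewards[t], accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def summarize(episodes: List[Episode], gamma: float = 1.0) -> Dict[str, float]:
    """Compute evaluation metrics over a non-empty list of episodes."""
    n = len(episodes)
    returns = [discounted_return(e.rewards, gamma) for e in episodes]
    total = sum(returns)
    return {
        "cumulative_reward": total,
        "average_reward_per_episode": total / n,
        "mean_episode_length": sum(len(e.rewards) for e in episodes) / n,
        "success_rate": sum(1 for e in episodes if e.success) / n,
    }


if __name__ == "__main__":
    log = [Episode([0.0, 0.0, 1.0], True), Episode([0.0, 0.0, 0.0, 0.0], False)]
    print(summarize(log))
```

With `gamma=1.0` the return is the plain sum of rewards; setting `gamma` below 1 weights later rewards less, which matches the discounting by γ mentioned in the cumulative reward entry.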
Connections
- Related to Supervised Learning
- Related to Unsupervised Learning