Reinforcement learning is a machine learning model similar to supervised learning, but the algorithm isn’t trained using sample data. This model learns as it goes by using trial and error. A sequence of successful outcomes is reinforced to develop the best recommendation for a given problem. The foundation of reinforcement learning is rewarding the “right” behavior and punishing the “wrong” behavior.
You might be wondering, what does it mean to “reward” a machine? Good question! Rewarding a machine means that you give your agent positive reinforcement for performing the “right” thing and negative reinforcement for performing the “wrong” things.
As a machine learns through trial and error, it tries a prediction, then compares it with data in its corpus.
- Each time the comparison is positive, the machine receives positive numerical feedback, or a reward.
- Each time the comparison is negative, the machine receives negative numerical feedback, or a penalty.
Over time, a machine’s predictions will grow to be more accurate. It accomplishes this automatically based on feedback, rather than through human intervention.