85. Introduction and Examples of Reinforcement Learning#

85.1. Introduction#

Reinforcement learning (RL) is an active research frontier and may be one of the routes to strong (human-like) artificial intelligence. Learning the basic reinforcement learning methods will deepen your understanding of artificial intelligence as a whole.

85.2. Key Points#

  • Introduction to Reinforcement Learning

  • Classification of Reinforcement Learning Algorithms

  • Applications of Reinforcement Learning

  • Recommended Extra-curricular Content

85.3. Introduction to Reinforcement Learning#

Machine learning is generally divided into four major branches: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. In this course, we introduce the concept of reinforcement learning and practice applying its algorithms. Because the material is relatively difficult, you should already have basic machine learning knowledge.

Reinforcement learning studies how an agent should act in an environment to maximize its expected cumulative reward. It is a rapidly developing field, and many researchers believe it may be one of the routes to strong artificial intelligence (artificial general intelligence).

The process of reinforcement learning generally consists of five elements, namely: Agent, Environment, Action, State, and Reward. The relationships among them are as follows:

https://cdn.aibydoing.com/aibydoing/images/document-uid214893labid6102timestamp1531891104587.png

Next, we will use a maze example to introduce the specific meanings of the five elements of reinforcement learning.

image

As shown in the figure above, assume that the little lion wants to get out of a maze through reinforcement learning. First of all, the little lion is the Agent, and the maze is the Environment it is in.

When the little lion tries to get out of the maze, it can choose among four actions (Actions) in each grid cell: moving up, down, left, or right. Each time an action acts on the environment, the maze gives the little lion a reward. The reward can be positive (positive reward) or negative (negative reward). For example, we usually place a large positive reward at the exit so that the little lion is encouraged to find it.

So, what is the State? Whenever the little lion takes an action, it enters the next State. The State acts as a summary of the history so far and is used to choose the next action. In the maze, for example, the State can simply be the grid cell the little lion is currently standing on. How the State summarizes the history involves the Markov decision process, which we will elaborate on in the experiment.
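
As a brief preview, the assumption behind a Markov decision process (the Markov property) is usually written as below. The symbols $S_t$ (state at step $t$) and $A_t$ (action at step $t$) are notation introduced here for illustration and do not come from the course text. The equation says that the next state depends only on the current state and action, not on the full history, which is what allows the State to stand in for everything that happened before.

$$
P(S_{t+1} \mid S_t, A_t) = P(S_{t+1} \mid S_1, A_1, \dots, S_t, A_t)
$$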

Next, the little lion explores the maze by trial and error and eventually finds the exit. Reaching the exit is also the point at which its cumulative reward is largest.

Finally, let’s summarize the process of reinforcement learning again through the little lion walking through the maze: the little lion (Agent) repeatedly executes an action (Action) in the environment (Environment), transitions to the next state (State), and receives a reward (Reward) from the environment. The reward is then used to update and train the Agent.
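
The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the 3x3 grid, the exit reward of 10, the step penalty of -1, and the names `step` and `run_episode` are all hypothetical choices made for this sketch. The agent here picks actions at random, so it shows the interaction loop only and does not learn anything yet.

```python
import random

# Hypothetical 3x3 grid maze: a state is a (row, col) cell, the exit is bottom-right.
SIZE = 3
START, EXIT = (0, 0), (2, 2)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), SIZE - 1)  # outer walls keep the agent inside
    col = min(max(state[1] + d_col, 0), SIZE - 1)
    next_state = (row, col)
    if next_state == EXIT:
        return next_state, 10.0, True   # large positive reward at the exit
    return next_state, -1.0, False      # assumed small negative reward otherwise


def run_episode(rng, max_steps=1000):
    """Agent: choose random actions until the exit is reached or steps run out."""
    state, total_reward = START, 0.0
    for n_steps in range(1, max_steps + 1):
        action = rng.choice(list(ACTIONS))         # Action
        state, reward, done = step(state, action)  # next State and Reward
        total_reward += reward
        if done:
            break
    return n_steps, total_reward, state


steps, total, final_state = run_episode(random.Random(0))
print(steps, total, final_state)
```

A learning algorithm would replace the random choice with a rule that uses the collected rewards to prefer better actions over time.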

One more point: reward feedback in reinforcement learning is often delayed, because the agent may take many steps before receiving any positive or negative reward. This is one of the important characteristics of reinforcement learning.
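
One common way to handle delayed rewards, not covered in the text above, is to credit each step with the discounted sum of the rewards that follow it. The sketch below assumes a hypothetical four-step episode in which only the last step is rewarded, and a discount factor `gamma` of 0.9; both values are illustrative.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Only the final step (reaching the exit) gives a reward.
rewards = [0.0, 0.0, 0.0, 10.0]
returns = discounted_returns(rewards)
print(returns)
```

Although the first three steps receive no immediate reward, their returns are nonzero (roughly 7.29, 8.1, and 9.0), so the early actions that led to the exit still receive credit.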

85.4. Differences between Reinforcement Learning and Supervised Learning#

After reading the above introduction to the typical process of reinforcement learning, you may wonder how it differs from supervised learning.

Simply put, supervised learning finds patterns in labeled, known data and uses them to make predictions on unknown data. Reinforcement learning, on the other hand, is an interactive learning process in which the agent gradually improves based on the reward feedback it receives from the environment.

In supervised learning, the dataset can also be regarded as the environment, except that we have a comprehensive view of the environment and its correct labels from the beginning. Reinforcement learning is completely different. We know almost nothing about the environment at the beginning and can only learn about it gradually by receiving rewards, which play a role loosely analogous to labels.

Supervised learning and reinforcement learning therefore differ in both learning environment and method, and their applicable scenarios are not quite the same. Reinforcement learning is closer to the way people learn when exploring the unknown. Take another example: when a child learns about right and wrong, there are two ways.

  • In the supervised learning way, an adult tells the child directly that doing this is right and doing that is wrong. The child then knows what is right and what is wrong. There is no reward mechanism in this process, and no rewards are obtained from the environment.

  • In the reinforcement learning way, the adult tells the child nothing in advance. When the child swears, they are punished (negative reward), and when the child does something right, they are given a candy (positive reward). In the end, the child also develops a sense of right and wrong: to get more candies, the child tries to avoid doing wrong things.
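
The contrast between the two ways can be sketched as follows. This is an illustrative toy, not an algorithm from the course: the reward values, the `epsilon` exploration rate, and the names `LABELS`, `REWARDS`, and `learn_from_rewards` are assumptions made for the example. In the supervised case the answers are given up front; in the reinforcement case the child only sees rewards and estimates the value of each action by averaging them.

```python
import random

# Supervised way: the correct answers are provided directly as labels.
LABELS = {"swear": "wrong", "behave": "right"}

# Reinforcement way: only rewards are observed (hypothetical values).
REWARDS = {"swear": -1.0, "behave": 1.0}


def learn_from_rewards(episodes=200, epsilon=0.1, seed=0):
    """Estimate each action's value from rewards alone (epsilon-greedy)."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in REWARDS}
    count = {action: 0 for action in REWARDS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(REWARDS))   # explore: try something at random
        else:
            action = max(value, key=value.get)   # exploit: best action so far
        reward = REWARDS[action]
        count[action] += 1
        # Running average of the rewards received for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


value = learn_from_rewards()
print(LABELS["swear"], max(value, key=value.get))
```

After enough trials the estimated value of behaving well exceeds that of swearing, so the learner ends up preferring it without ever having been told which action is right.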

85.5. Classification of Reinforcement Learning Algorithms#

Similar to supervised and unsupervised learning methods, reinforcement learning also includes a variety of different algorithms. These algorithms can be subdivided along multiple dimensions, as shown in the following table:

https://cdn.aibydoing.com/aibydoing/images/document-uid214893labid6102timestamp1531891110741.png

85.6. Reinforcement Learning Applications#

Here, we introduce several practical applications of reinforcement learning. The best known is probably AlphaGo, a Go program that Google DeepMind began developing in 2014. It has defeated top professional players, including Lee Sedol in 2016 and Ke Jie in 2017.

https://cdn.aibydoing.com/aibydoing/images/document-uid214893labid6102timestamp1531891112305.png

According to the paper published by DeepMind, AlphaGo combines Monte Carlo tree search with two deep neural networks: a Policy network that proposes moves and a Value network that evaluates board positions. The networks are trained with a combination of supervised learning on human games and reinforcement learning through self-play.

In addition, startups like COVARIANT.AI have started using reinforcement learning to build more powerful industrial robots, JPMorgan Chase has used reinforcement learning to establish a new securities trading system, and Facebook has used reinforcement learning to train chatbots that can participate in negotiations.

We therefore invite you to follow this course to learn the introductory algorithms and basic concepts of reinforcement learning.

