First steps in Reinforcement Learning (with practice on NES games).

When we talk about automation and AI, many possible uses in our daily lives come to mind. Yet how to get started, and with what tools all these automated mechanisms can be built, remains a mystery even for many developers who understand the subject. That is where we come in: this series of articles offers a first-hand tour of the world of deep reinforcement learning. In this first part we introduce the main concepts you need to understand in order to build your first game-playing agent using learning algorithms, so that you can clear levels in NES games ("the limit is up to you").


The first and most important thing is to explain the term we have been using but have not yet defined ("if this is your first time reading about the topic"): reinforcement learning. Learning, in general, means the acquisition of knowledge. To understand the "reinforcement" part, it helps to borrow a concept from behavioral psychology, developed through experimental analysis; put simply, it studies what behavior an individual adopts when an action is exerted on them. These terms give us the theoretical basis of the field, not just the practical things that can be built with it. With this overview in mind, the same behaviors can be oriented toward machines ("agents"), which generate actions and learn from them, converging on the best possible path to a solution.


Reinforcement learning has one very important characteristic worth pointing out: it has no predefined endpoint ("no exit or goal state given in advance"). Our algorithm must therefore learn on its own, without explicit instructions, in order to achieve its objective, while also taking into account all the factors that could stop it along the way.


Now let us briefly review the components of reinforcement learning. One of them is the agent, our test subject, which must be trained to make decisions and learn from them. Another is the environment, the world in which the agent's interactions take place, responsible for imposing the limitations we must overcome. These components are connected by a few key elements: an action is one of the possible moves the agent can make; a state describes the elements of the environment and the constraints attached to them; and a reward indicates whether the actions taken by the agent are on the right path toward the solution, while a punishment, in the opposite case, flags wrong paths or actions.
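To make these pieces concrete, here is a minimal sketch of the agent–environment interaction loop. The environment, its actions, and its reward scheme are invented for illustration (a tiny one-dimensional world where the agent must walk from position 0 to position 4); real NES environments expose the same step-and-reward pattern through far richer states.

```python
import random

class ToyEnvironment:
    """A hypothetical 1-D world: the agent starts at position 0 and
    must reach position 4 (the goal) by moving left or right."""

    def __init__(self):
        self.state = 0  # the state: the agent's current position

    def step(self, action):
        # Action 1 moves right, action 0 moves left (floor at 0).
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == 4       # the episode ends at the goal
        reward = 1 if done else 0    # reward only when the goal is reached
        return self.state, reward, done

# The interaction loop: the agent picks actions, the environment
# answers with a new state and a reward.
env = ToyEnvironment()
done = False
total_reward = 0
while not done:
    action = random.choice([0, 1])   # a purely random policy, for now
    state, reward, done = env.step(action)
    total_reward += reward
```

Here the agent acts at random; a learning algorithm would instead use the stream of states and rewards to improve its choice of actions over time.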

So, when the agent performs an action with no bad consequences and moves forward along a good path, it is rewarded, and this feedback encourages it to take the same path again in the future. Conversely, when an action ends in a punishment, the agent will avoid that path in the future ("we must still account for the exploration of different actions, so the algorithm does not stall by never taking risky ones").
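One common way to balance exploiting rewarded actions against exploring risky ones is an epsilon-greedy rule: most of the time pick the action with the best estimated value, but with a small probability epsilon pick a random one. This is a sketch, not part of the original post; the value estimates shown are made-up numbers.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon, explore a random action;
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])      # exploit

# Hypothetical value estimates for three actions.
q = [0.1, 0.9, 0.4]

best = epsilon_greedy(q, epsilon=0.0)   # epsilon=0 always exploits -> action 1
```

With epsilon at 0 the agent always repeats its best-known action; raising epsilon trades some short-term reward for the chance of discovering better paths.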

Finally, let us be clear: the only way for our agent to learn is to make mistakes and, from those mistakes, collect the rewards needed to advance to the end of the process. With this we have the theoretical basis required to get hands-on with the real development of reinforcement learning. Stay tuned for the next post, where we will talk about Q-Learning and show how to put this theory to work in a real environment. See you next time!

