When you do something well, you’re rewarded. This simple principle has guided humans since the beginning of time, and now, more than ever before, it is the key principle behind a growing number of reinforcement learning (RL) agents within the technologies we use and rely upon every day.
What Is Reinforcement Learning?
Reinforcement learning is a technique for training machine learning models to make a sequence of decisions, usually based on uncertain and complicated data. The RL agent is rewarded for correct decisions and penalized for incorrect ones. Its goal is to maximize its total reward, in effect treating each problem as a game. Reinforcement learning is used in gaming AI, robot navigation and any application that requires the model to acquire new skills or make fast decisions in real time.
The design team sets up the rules for the model to determine what generates rewards or penalties, but once these are set, there is no further intervention. The model runs on its own and, through trial and error, learns to maximize the rewards to develop sophisticated skills in pattern recognition and problem-solving.
Importance of Reinforcement Learning
Reinforcement learning agents have specific goals to achieve, can sense and adapt to changes in their environments and can choose actions to influence those environments. Of all the types of machine learning, reinforcement learning is arguably the closest to how humans learn and, some would argue, it can exhibit creativity.
When reinforcement learning is done well, it can produce results faster and with fewer errors than humans can. Given sufficient computing resources, an RL model can run thousands of gameplays in parallel, learning from each one. The result is often more innovative solutions to problems.
The benefits of RL models are most apparent in environments with complex data or environments where the data changes quickly. Because they are trained with a system of penalties and rewards, rather than being directed down predetermined paths of decisions, RL models are able to create their own solutions to unforeseen circumstances.
Consider, for example, a navigation application that has to respond to changing traffic situations or a complex supply chain, where any number of factors can change without notice. RL models are able to respond quickly and efficiently in environments where humans would often take much longer to respond and would often respond less accurately.
How Does Reinforcement Learning Work?
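Reinforcement learning is based on the reward hypothesis: the idea that any goal can be described as the maximization of expected cumulative reward. A reward (Rt) is a scalar feedback signal that indicates how well the agent is doing at step t. The agent’s goal is to maximize its cumulative reward. Each step of the interaction proceeds as follows: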
- The RL agent receives a state (S) from the environment.
- The agent takes an Action (A) based on S.
- The environment is now in a new state, S2, due to the agent’s interaction with it, and a reward (or penalty) is given to the agent.
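The three steps above can be sketched as a simple loop. The example below is only an illustration: `CorridorEnv`, its reward values and the two policies are hypothetical names invented for this sketch, not part of any library mentioned in this article.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: the agent walks from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode in the leftmost cell and return the initial state S.
        self.position = 0
        return self.position

    def step(self, action):
        # Action A: 0 moves left, 1 moves right (clamped to the corridor).
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        # Assumed reward scheme: +1.0 for reaching the goal, -0.1 penalty per other step.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the cumulative reward the agent collected."""
    state = env.reset()                               # agent receives state S
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                        # agent takes action A based on S
        next_state, reward, done = env.step(action)   # environment moves to S2, gives reward
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward


def random_policy(state):
    # An untrained agent: ignores the state and acts at random.
    return random.choice([0, 1])


def always_right(state):
    # A hand-written policy that happens to be optimal in this corridor.
    return 1


if __name__ == "__main__":
    print("random policy:", run_episode(CorridorEnv(), random_policy))
    print("always right: ", run_episode(CorridorEnv(), always_right))
```

Comparing the two printed totals shows what "maximizing cumulative reward" means in practice: the random policy wastes steps and accumulates penalties, while the policy that always moves right reaches the goal in the fewest steps.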
The rewards or penalties vary with what the agent is expected to achieve. In the case of a video game, the agent may receive points as rewards, just as a human player would. In the case of a car’s autopilot, rewards may be given for keeping the vehicle on its expected trajectory, while penalties would be given in the event of a collision. A robot may be rewarded for walking or penalized for falling over.
Throughout this process, the agent uses a value function to estimate the future rewards it can expect from the states its possible actions lead to. It can then choose the action with the highest expected reward.
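As a rough sketch of how a value function can be learned, the example below uses tabular Q-learning, one common algorithm among many and not necessarily the one any particular agent in this article uses. The five-cell corridor, the reward values and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for illustration.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, the goal is the last cell
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Assumed environment dynamics: +1.0 at the goal, -0.1 for every other step."""
    move = 1 if action == 1 else -1
    next_state = max(0, min(N_STATES - 1, state + move))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn q[state][action], an estimate of the future reward of each action."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            # Trial and error: explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_action(q, state):
    """Pick the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q[state][a])


q_table = train_q_table()
# Learned policy for the non-goal cells (1 means "move right").
print([greedy_action(q_table, s) for s in range(N_STATES - 1)])
```

The table `q_table` plays the role of the value function here: each entry predicts the reward that follows from taking an action in a state, and the agent acts by picking the highest entry. Real applications usually replace the table with a neural network because the number of states is too large to enumerate.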
Reinforcement Learning Algorithms
At the core of each RL agent are the algorithms that process the data from the environment. There are numerous open-source libraries of algorithms available, such as:
- Stable-Baselines3 (SB3): A set of reliable RL algorithm implementations in PyTorch. The companion RL Baselines3 Zoo includes a collection of pre-trained agents and scripts for training and evaluating agents, tuning hyperparameters, plotting results and recording videos.
- Acme: A library of RL building blocks designed for Python. It includes numerous baseline agents that serve both as starting points for new implementations and as strong baselines for comparing the performance of new iterations.
- Dopamine: A library of RL agents that supports TensorFlow and JAX. A large portion of the library is designed for Atari 2600 video games. At the default settings, a single training experiment will last 200 million frames.
- RLlib: Supports TensorFlow and PyTorch, with a strong focus on high scalability and a unified API. It offers different environments for a single agent with a single policy, for multiple agents, and for multiple agents with multiple policies.
Applications of Reinforcement Learning
In 2021, researchers at the University of California, Berkeley, trained a bipedal robot named Cassie to walk. In Japan, robotics manufacturer Fanuc has been training robots to distinguish random objects in boxes, retrieve them and place them in another box. This may seem like child’s play because it is. If you consider how difficult such tasks are for toddlers to master, you can appreciate how revolutionary it is for a machine to teach itself to do these things with RL.
In finance, RL models are used to execute trades by observing the state of stock prices and taking timely action to maximize gains, with the aim of outperforming human fund managers. Meanwhile, in eCommerce, RL agents are replacing older recommendation systems for consumers. Alibaba Group, for example, is now using an RL agent, Robust DQN, to improve recommendations to its customers by taking into account ever-changing data on customer preferences. The same company also uses RL agents to bid on paid online ads to optimize reach without needlessly overbidding.
Reinforcement learning agents can also be found in automated industrial control systems. DeepMind, the Google-owned lab that built AlphaGo, has applied machine learning to optimizing data center cooling systems. Salesforce uses an RL agent for summarizing long textual content, including articles, blogs and other documents.
Reinforcement Learning with Domino Data Lab
Every week, organizations announce breakthroughs in reinforcement learning technologies, pushing thresholds on what is considered plausible, and building one successful agent after another. Regardless of how large your data science team is or which sector you work in, you will need the libraries, tools and resources available only in an enterprise-class MLOps platform to take your collaborative projects from inception to successful deployment.
Domino’s Enterprise MLOps is today’s leader in machine learning platforms and is now used by 20 Fortune 100 companies. To begin exploring the advantages of Domino’s Enterprise MLOps platform, sign up for a free 14-day trial.
David Weedmark is a published author who has worked as a project manager, software developer and as a network security consultant.