Scalable reinforcement learning in production

Tackle reinforcement learning

Challenges

Finding success with reinforcement learning (RL) is not easy. RL tooling hasn’t historically kept pace with the demands and constraints of those wanting to use it. Even with ready-made frameworks, failure is common when crossing over into production due to their rigidity, lack of speed, limited ecosystems, and operational overhead.

Solutions

Anyscale helps you go beyond existing reinforcement learning limitations with Ray and RLlib, an open source, easy-to-use, distributed RL library for Python that:

  • Handles complex, heterogeneous applications
  • Includes over 25 state-of-the-art algorithms that run on both TensorFlow and PyTorch (see the sketch below)
  • Covers subcategories including model-based, model-free, and offline RL
  • Supports multi-agent learning in almost all of its algorithms
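
To make this concrete, here is a minimal sketch (assuming the Ray 2.x AlgorithmConfig API; the environment and iteration count are illustrative) of picking one of RLlib's built-in algorithms, selecting a deep learning framework, and training:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Minimal sketch (Ray 2.x API): pick an algorithm, a framework, and train.
config = (
    PPOConfig()
    .environment("CartPole-v1")  # any registered Gym environment
    .framework("torch")          # or "tf2" -- the same code runs on either framework
)

algo = config.build()
for _ in range(5):
    result = algo.train()        # one training iteration
    print(result["episode_reward_mean"])
```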

RLlib is the best way to do reinforcement learning

An expansive ecosystem

Existing RL solutions force developers to switch frameworks or awkwardly glue RL systems to separate tools for tuning, serving, and monitoring. Avoid that with the Ray ecosystem: find the perfect set of hyperparameters using Ray Tune, or serve your trained model in a massively parallel way with Ray Serve.
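
As one illustration, a sketch of sweeping an RLlib hyperparameter with Ray Tune via the classic `tune.run` entry point (the environment, learning-rate values, and stopping criterion are placeholders):

```python
from ray import tune

# Sketch: let Ray Tune sweep an RLlib hyperparameter.
# "PPO" is RLlib's registered trainable name; values below are illustrative.
tune.run(
    "PPO",
    config={
        "env": "CartPole-v1",
        "lr": tune.grid_search([1e-4, 5e-4, 1e-3]),  # try three learning rates
    },
    stop={"training_iteration": 20},  # stop each trial after 20 iterations
)
```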

Production readiness

Iterate quickly without needing to rewrite your code to move to production or to scale out to a large cluster.

Environments to meet your needs

RLlib works with several types of environments, including OpenAI Gym, user-defined, multi-agent, and batched environments.
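
For example, a user-defined environment is just a Gym-style class passed to the algorithm config. A minimal sketch (the corridor task and its parameters are illustrative, written against the pre-gymnasium `gym` step API):

```python
import gym
import numpy as np
from gym import spaces
from ray.rllib.algorithms.ppo import PPOConfig

class SimpleCorridor(gym.Env):
    """Walk right along a corridor to reach the goal (illustrative task)."""

    def __init__(self, config=None):
        self.end_pos = (config or {}).get("corridor_length", 5)
        self.cur_pos = 0
        self.action_space = spaces.Discrete(2)  # 0 = left, 1 = right
        self.observation_space = spaces.Box(
            0.0, self.end_pos, shape=(1,), dtype=np.float32
        )

    def reset(self):
        self.cur_pos = 0
        return np.array([self.cur_pos], dtype=np.float32)

    def step(self, action):
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        reward = 1.0 if done else -0.1
        return np.array([self.cur_pos], dtype=np.float32), reward, done, {}

# Pass the class (not an instance) to RLlib; env_config is forwarded to __init__.
config = PPOConfig().environment(SimpleCorridor, env_config={"corridor_length": 5})
algo = config.build()
```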

Offline RL and imitation learning / behavior cloning

RLlib comes with several offline RL algorithms (e.g., CQL, MARWIL, and DQfD), allowing you to either purely behavior-clone your existing system or learn how to further improve it.
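
A hedged sketch of what this looks like with MARWIL on Ray 2.x (the data path is a placeholder for JSON experiences you have previously logged, e.g. with RLlib's output writers; `beta=0.0` reduces MARWIL to pure behavior cloning):

```python
from ray.rllib.algorithms.marwil import MARWILConfig

config = (
    MARWILConfig()
    .environment("CartPole-v1")                # env spaces must match the logged data
    .offline_data(input_="/tmp/cartpole-out")  # placeholder: dir of logged JSON episodes
    .training(beta=0.0)  # 0.0 = pure behavior cloning; > 0.0 weights by advantages
)

algo = config.build()
for _ in range(10):
    algo.train()
```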

Speed and efficiency

Experience fast training and policy evaluation with lower overhead than most other RL libraries.

Distributed RL, simplified

RLlib algorithm implementations (such as our “APPO” or “APEX”) allow you to run workloads on hundreds of CPUs, GPUs, or nodes in parallel to speed up learning.
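
A minimal sketch of what that scaling looks like with APPO on Ray 2.x (the worker and GPU counts are illustrative; the same config runs on a laptop or attaches to a running cluster):

```python
import ray
from ray.rllib.algorithms.appo import APPOConfig

ray.init()  # or ray.init(address="auto") to attach to an existing cluster

config = (
    APPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=128)  # illustrative: sampling fans out across the cluster
    .resources(num_gpus=1)              # GPU(s) for the central learner
)

algo = config.build()
result = algo.train()
```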

External simulators

RLlib supports an external environment API and comes with a pluggable, off-the-shelf client/server setup to run hundreds of independent simulators on the “outside,” connecting to a central RLlib policy server that learns and serves actions.
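
On the simulator side, this is roughly what a client looks like (a sketch: `my_simulator` is a hypothetical external simulator, and the address assumes a PolicyServerInput-based RLlib server listening on port 9900):

```python
from ray.rllib.env.policy_client import PolicyClient

# Connect to a central RLlib policy server (address/port are assumptions).
client = PolicyClient("http://localhost:9900", inference_mode="remote")

obs = my_simulator.reset()  # my_simulator: your external, hypothetical simulator
episode_id = client.start_episode(training_enabled=True)

done = False
while not done:
    action = client.get_action(episode_id, obs)        # ask the server for an action
    obs, reward, done, info = my_simulator.step(action)
    client.log_returns(episode_id, reward)             # stream rewards back for learning
client.end_episode(episode_id, obs)
```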

Unmatched algorithm selection

With more than double the number of algorithms of any other library, RLlib lets teams quickly iterate on and test SOTA algorithms, so you can get to the best option faster without having to build and maintain your own.
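
As a sketch of that iteration speed, the same environment can be run through several built-in algorithms just by changing a string (the names below are standard RLlib registry keys; metric access assumes the Ray 2.x result dict):

```python
from ray.tune.registry import get_trainable_cls

# Compare several built-in algorithms on one environment by name.
for name in ["PPO", "IMPALA", "DQN"]:
    config = get_trainable_cls(name).get_default_config().environment("CartPole-v1")
    algo = config.build()
    result = algo.train()
    print(name, result["episode_reward_mean"])
    algo.stop()
```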

Iterate and move to production fast with RLlib and Anyscale

Leading organizations are already using reinforcement learning to build next-gen recommendation systems, create better gaming experiences, optimize industrial environments, and more, thanks to RLlib and Anyscale.

Dow

Supply chains are critical. Kinks or breaks in the chain could spell disaster for manufacturers and buyers alike. This is why Dow has decided to double down on its digitization efforts, which include the increased use of machine learning, advanced modeling techniques, robotics, and more. One such project in the ...

JPMorgan

RL can help learn agent behaviors and policies and then calibrate the agent composition with real-world data, all at the speed and scale of the business.

Already using open source Ray?

Get started by moving your existing workloads to Anyscale with no code changes. Experience the magic of infinite scale at your fingertips.