
Smart supply chain management with reinforcement learning at Dow

By Erik Martinez | May 3, 2022

As the pandemic demonstrated, supply chains are complex and sensitive business processes to manage. Kinks or breaks in the chain can spell disaster for manufacturers and buyers alike. This is why Dow has decided to double down on its digitization efforts, which include greater use of machine learning, advanced modeling techniques, robotics, and more.

Adam Kelloway, who works by turns as a data scientist, ML engineer, and data engineer at Dow’s Digital Fulfillment Center, is helping lead the charge. He builds agents based on reinforcement learning, machine learning, and mixed-integer programming in pursuit of Dow’s multi-agent, automated, and intelligent digital supply chain. The goal is to enable better and faster decision-making that positively impacts customers, financial performance, and shareholder value.

One project in this effort is AlphaDow, which creates reinforcement learning-based agents for production scheduling using RLlib and Ray Tune on Azure compute clusters. Ray’s implementation of population-based bandits (PB2) is a huge help for tuning.
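To give a feel for what population-based tuning does, here is a minimal, standard-library sketch of one exploit/explore loop. It is not Ray Tune's API and not AlphaDow's setup: the objective `toy_score`, the learning-rate bounds, and the perturbation factors are all hypothetical. It also uses random perturbation, as in plain population-based training; population-based bandits replace that random step with a bandit-guided choice of new hyperparameters.

```python
import random


def toy_score(lr):
    # Hypothetical objective that peaks at lr = 0.01; a stand-in for the
    # episode return an RL agent would report during training.
    return -abs(lr - 0.01)


def population_step(population, rng, lower=1e-4, upper=1e-1):
    """Run one exploit/explore round over a list of {'lr', 'score'} dicts."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, len(ranked) // 4)
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for member in bottom:
        donor = rng.choice(top)
        # Exploit: copy a strong member's hyperparameter.
        # Explore: perturb it randomly, then clip to the allowed range.
        new_lr = donor["lr"] * rng.choice([0.8, 1.2])
        member["lr"] = min(upper, max(lower, new_lr))
    for member in population:
        member["score"] = toy_score(member["lr"])
    return population


def run(rounds=20, size=8, seed=0):
    """Return (best initial score, best final score) for a toy population."""
    rng = random.Random(seed)
    # Sample initial learning rates log-uniformly in [1e-4, 1e-1].
    population = [{"lr": 10 ** rng.uniform(-4, -1)} for _ in range(size)]
    for member in population:
        member["score"] = toy_score(member["lr"])
    initial_best = max(m["score"] for m in population)
    for _ in range(rounds):
        population_step(population, rng)
    final_best = max(m["score"] for m in population)
    return initial_best, final_best
```

Because the top quarter of the population is never overwritten, the best score cannot get worse from one round to the next; the weaker members are the ones that get replaced and perturbed.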

At the recent Production RL Summit, Adam dove into what it took to make AlphaDow viable and, eventually, integral to Dow’s supply chain management and scheduling. Taking on something so unique meant that challenges were common.

For instance, building RL in-house means that everything is a potential design variable, from actions and rewards to observations and design. Nothing is independent in Dow’s systems, and the task required a lot of time and resources, especially compute. In their search for the right solution, Adam and his colleagues discovered the Anyscale ecosystem and Ray’s libraries and capabilities. Deployment eventually brought challenges of its own, particularly in moving beyond simulations and concepts that are hard to map to real-world data and sources. They overcame this problem, too, with the help of Ray.
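To make the "everything is a design variable" point concrete, here is a small, purely illustrative scheduling environment. It is not AlphaDow's environment: the class `ToySchedulingEnv`, its cost parameters, and the demand model are all hypothetical. The comments mark where the action, observation, and reward choices are made, and each one could reasonably have been made differently.

```python
import random
from dataclasses import dataclass


@dataclass
class ToySchedulingEnv:
    """Hypothetical single-line plant: each day, choose one product to make."""
    n_products: int = 3
    horizon: int = 30
    batch_size: int = 10
    changeover_cost: float = 5.0  # reward design choice
    stockout_cost: float = 2.0    # reward design choice
    holding_cost: float = 0.1     # reward design choice
    seed: int = 0

    def reset(self):
        self.rng = random.Random(self.seed)
        self.day = 0
        self.inventory = [20] * self.n_products
        self.last_product = None
        return self._observe()

    def _observe(self):
        # Observation design choice: inventory per product, day index,
        # and the last product made (-1 if nothing has been made yet).
        last = -1 if self.last_product is None else self.last_product
        return tuple(self.inventory) + (self.day, last)

    def step(self, action):
        # Action design choice: the index of the product to run today.
        if not 0 <= action < self.n_products:
            raise ValueError("action must be a valid product index")
        reward = 0.0
        if self.last_product is not None and action != self.last_product:
            reward -= self.changeover_cost
        self.inventory[action] += self.batch_size
        self.last_product = action
        for p in range(self.n_products):
            demand = self.rng.randint(0, 6)  # hypothetical daily demand
            shortfall = max(0, demand - self.inventory[p])
            self.inventory[p] = max(0, self.inventory[p] - demand)
            reward -= self.stockout_cost * shortfall
            reward -= self.holding_cost * self.inventory[p]
        self.day += 1
        done = self.day >= self.horizon
        return self._observe(), reward, done


def run_greedy_episode(env):
    """Baseline policy: always make the product with the lowest inventory."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = min(range(env.n_products), key=lambda p: obs[p])
        obs, reward, done = env.step(action)
        total += reward
    return total
```

Changing any one piece, such as raising `changeover_cost` or dropping the day index from the observation, changes what a learned policy will do, which is why these choices cannot be tuned independently of one another.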

Watch Adam’s Production RL Summit talk to learn exactly how his team used Ray to make AlphaDow possible and eventually turn it into the lighthouse project that got the whole company on board with the possibilities of reinforcement learning and machine learning as a whole.
