Ray Use Cases

Distributed training with Ray on Kubernetes

Ray Summit 2022

Is your training infrastructure built on Kubernetes? Do you want to run Ray on it? Our ML platform is built entirely on Kubernetes because it scales well and bootstraps resources quickly. In this talk, we will demonstrate how we use Ray on Kubernetes to build an infrastructure for distributed training. We will showcase our custom SDKs, which let users spawn on-demand Ray clusters and train models directly from notebooks. The SDKs hide the complexity of spawning and tearing down on-demand clusters, so users can focus on the "what" while the platform takes care of the "how."
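
To make the "what" versus "how" split concrete, here is a minimal sketch of the pattern in plain Python. It is not the SDK presented in the talk: `FakeCluster`, `on_demand_cluster`, and `train_shard` are hypothetical names invented for illustration, and the cluster is simulated in-process rather than provisioned on Kubernetes. The notebook user supplies only the training function; the context manager owns setup and teardown.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class FakeCluster:
    """Hypothetical stand-in for an on-demand cluster (no real Kubernetes or Ray calls)."""
    name: str
    workers: int
    events: List[str] = field(default_factory=list)

    def run(self, train_fn: Callable[[int], float]) -> List[float]:
        # Simulate one training call per worker; a real cluster would run these in parallel.
        self.events.append("run")
        return [train_fn(rank) for rank in range(self.workers)]


@contextmanager
def on_demand_cluster(name: str, workers: int) -> Iterator[FakeCluster]:
    """The "how": bring the cluster up, hand it to the user, always tear it down."""
    cluster = FakeCluster(name=name, workers=workers)
    cluster.events.append("provisioned")
    try:
        yield cluster
    finally:
        # Teardown runs even if the training code raises.
        cluster.events.append("torn down")


def train_shard(rank: int) -> float:
    """The "what": user-supplied training logic, here a dummy per-worker loss."""
    return 1.0 / (rank + 1)


with on_demand_cluster("demo", workers=3) as cluster:
    losses = cluster.run(train_shard)
```

Placing teardown in a `finally` clause means the simulated cluster is released even when training fails, which is the kind of lifecycle detail an SDK like this would presumably keep out of the user's notebook.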

About Anindya

Anindya Saha is a machine learning platform engineer at Lyft. He led and implemented the platform's Spark Notebooks on Kubernetes feature, which supports ML prototyping on large datasets and creates on-demand Spark clusters on Kubernetes. He is currently working on enabling scalable distributed training on the ML platform. He also developed the platform's model deployment workflow and model monitoring capabilities.

About Han

Han Wang is the tech lead for the Lyft Machine Learning Platform, focusing on distributed computing and machine learning solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon, and Quantlab. Han is the founder of the Fugue project, which aims to democratize distributed computing and machine learning.

Anindya Saha

Staff Engineer, Lyft Inc.

Han Wang

Lead Engineer, Lyft Inc.
