Ray Summit 2022
Is your training infrastructure built on Kubernetes? Do you want to run Ray on Kubernetes? Our ML platform is built entirely on Kubernetes because of its scalability and fast resource bootstrapping. In this talk we will demonstrate how we use Ray on Kubernetes to build an infrastructure for distributed training. We will showcase our custom SDKs, which let users spawn on-demand Ray clusters to train models from notebooks. The SDKs hide the complexity of spawning and tearing down the on-demand cluster, so that users can focus on the "what" while the platform takes care of the "how."
Anindya Saha is a machine learning platform engineer at Lyft. He led and implemented the platform's Spark Notebooks on Kubernetes feature, used for ML prototyping on large data and for creating on-demand Spark clusters on Kubernetes. He is currently working on enabling scalable distributed training on the ML platform. He also developed the platform's model deployment workflow and model monitoring capabilities.
Han Wang is the tech lead for the Lyft Machine Learning Platform, focusing on distributed computing and machine learning solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon, and Quantlab. Han is the founder of the Fugue project, which aims to democratize distributed computing and machine learning.