Ray Meetup

Productionizing ML at scale with Ray Serve

Thursday, April 14, 1:00 AM UTC

It’s time for April's Ray Meetup, a monthly series where we get together to discuss Ray and Ray’s native libraries for scaling machine learning workloads. This month we will discuss Ray Serve, Ray’s ML framework-agnostic, production-ready, operational, and scalable model serving library.

Along with a demo, the talks will cover three functional areas of model serving with Ray Serve:

An overview of Ray Serve features, functionality, and roadmap
Building multi-model inference pipelines with Ray Serve and scaling them with Ray
Operationalizing Ray Serve

Join us if you are interested in serving and operationalizing ML models at scale using Ray Serve!

Agenda

6:00 PM Welcome remarks, announcements & agenda by Jules Damji, Anyscale
6:05 PM “Ray Serve: Overview and roadmap,” Edward Oakes, Anyscale
6:15 PM Q&A
6:20 PM “Developing and deploying scalable multi-model inference pipelines,” Jiao Dong, Anyscale
6:45 PM Q&A
7:00 PM “Operationalizing Ray Serve,” Shreyas Krishnaswamy, Anyscale
7:25 PM Q&A
7:30 PM Demo
7:45 PM Q&A

Talk 1: Ray Serve: Overview and future roadmap

In this introductory session, we’ll discuss the motivation behind Ray Serve, who’s using Ray Serve and why, and recent features and updates, including a look at the future feature roadmap as we approach Ray 2.0.

Talk 2: Developing and deploying scalable multi-model inference pipelines

In this talk, we show how to use Ray's programmable, general-purpose distributed computing to author, orchestrate, scale, and deploy complex serving pipelines as a DAG under one set of APIs, like a microservice. Learn how you can compose multiple models dynamically on your laptop as if you were writing a local Python script, deploy them to production at scale, and upgrade each model individually.
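To make the idea of a multi-model pipeline expressed as a DAG concrete, here is a minimal sketch in plain Python with asyncio. It is not Ray Serve's API and is not taken from the talk; the function names (preprocess, model_a, model_b, combine) and the toy "models" are hypothetical stand-ins. It only illustrates the shape of such a pipeline: one preprocessing step fans out to two models that run concurrently, and a final step combines their outputs.

```python
import asyncio

# Hypothetical stand-ins for real models; all names are illustrative only.
async def preprocess(raw: str) -> list[float]:
    # Turn each whitespace-separated token into a single numeric feature.
    return [float(len(token)) for token in raw.split()]

async def model_a(features: list[float]) -> float:
    # Toy "model": sum of the features.
    return sum(features)

async def model_b(features: list[float]) -> float:
    # Toy "model": largest feature.
    return max(features)

async def combine(a: float, b: float) -> dict[str, float]:
    # Merge both model outputs into one response.
    return {"sum": a, "max": b, "score": (a + b) / 2}

async def pipeline(raw: str) -> dict[str, float]:
    # DAG: preprocess -> (model_a, model_b in parallel) -> combine
    features = await preprocess(raw)
    a, b = await asyncio.gather(model_a(features), model_b(features))
    return await combine(a, b)

result = asyncio.run(pipeline("ray serve scales models"))
print(result)
```

In a serving system, each node of this graph would typically be scaled and upgraded independently, which is the property the talk abstract highlights.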

Talk 3: Operationalizing Ray Serve

In this session, we will introduce you to a new declarative REST API for Ray Serve, which allows you to configure and update your Ray Serve applications without modifying application files. Incorporate this API into your existing CI/CD process to manage applications on Ray Serve as part of your MLOps lifecycle.
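As a rough illustration of what driving a declarative REST API from a CI/CD step could look like, the sketch below builds a configuration document and an HTTP PUT request using only the Python standard library. The endpoint URL, port, and every config field name here are assumptions made for illustration and are not taken from the talk; consult the Ray Serve documentation for the actual schema and endpoint. The request is constructed but deliberately not sent.

```python
import json
import urllib.request

# Assumed endpoint, for illustration only; not confirmed by the talk.
SERVE_CONFIG_URL = "http://localhost:8265/api/serve/deployments/"

# Hypothetical declarative config: desired state, not imperative steps.
# Field names are placeholders, not a verified Ray Serve schema.
desired_state = {
    "deployments": [
        {"name": "sentiment_model", "num_replicas": 2},
        {"name": "translation_model", "num_replicas": 1},
    ]
}

def build_update_request(url: str, config: dict) -> urllib.request.Request:
    # Serialize the desired state and wrap it in a PUT request.
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_update_request(SERVE_CONFIG_URL, desired_state)
# A CI/CD job would send this with urllib.request.urlopen(req);
# it is left unsent here so the sketch runs without a cluster.
print(req.get_method(), req.full_url)
```

The design point is that the application files stay untouched: changing the replica count means editing the config document and re-submitting it.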

Speakers

Edward Oakes

Software Engineer, Anyscale

Edward Oakes is a software engineer and project lead on the Ray Serve team. He works across the stack at Anyscale, from Ray Core to Ray Serve to the Anyscale platform.

Jiao Dong

Software Engineer, Anyscale

Jiao Dong is a software engineer focusing on Ray Serve and Ray infrastructure at Anyscale.

Shreyas Krishnaswamy

Software Engineer, Anyscale

Shreyas Krishnaswamy is a software engineer focusing on Ray Serve and Ray infrastructure at Anyscale.

Other Events

[Ray Meetup] Scaling Multimodal AI: Lessons from Netflix

06.26.2025, 12:30 AM (PST)

[Ray Meetup] Ray + vLLM in Action: Lessons from Pinterest and Deepseek Deployments

06.11.2025, 01:00 AM (PST)

[Ray Meetup] LLMs on Ray + LanceDB

04.18.2025, 01:00 AM (PST)

Materials & Resources

Q&A

“Ray Serve: Overview and roadmap” slides

“Developing and deploying scalable multi-model inference pipelines” slides

“Operationalizing Ray Serve” slides

Deployment graph documentation with code

Demo code
