Introduction to Ray Serve

Wednesday, March 23, 4:00PM UTC

Ray Serve is Ray’s model serving library. Traditionally, model serving requires configuring a web server or a cloud-hosted solution. These approaches either lack scalability or hinder development through framework-specific tooling, vendor lock-in, and general inflexibility. Ray Serve overcomes these limitations. It offers a developer-friendly and framework-agnostic interface that provides scalable, production-ready model serving.

Ray Serve is:

  • Scalable: It provides fine-grained resource management and scaling using Ray.

  • Framework-agnostic: It works with any Python code, regardless of framework.

  • Production-ready: It comes with a web server out of the box and handles routing, testing, and scaling logic for deployments.

  • Developer-friendly: It offers a decorator-based API that converts existing applications into Ray Serve deployments with minimal refactoring.
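As a rough sketch of the decorator-based API mentioned above (assuming Ray is installed via `pip install "ray[serve]"`; the class name, replica count, and request handling shown here are illustrative, and the exact deployment entrypoint varies by Ray version):

```python
# Minimal illustrative sketch of a Ray Serve deployment.
# Assumes `pip install "ray[serve]"`; names here are hypothetical examples.
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2)  # fine-grained scaling: run two replicas
class Translator:
    def __init__(self):
        # Any ordinary Python setup (e.g. loading an ML model) goes here.
        self.prefix = "translated"

    async def __call__(self, request: Request) -> str:
        # Ray Serve's built-in web server routes HTTP requests to __call__.
        text = (await request.json())["text"]
        return f"{self.prefix}: {text}"


# Deploying the application starts Ray Serve's HTTP server
# (by default on http://localhost:8000/); requires a running Ray cluster.
# serve.run(Translator.bind())
```

The point of the sketch is that the existing class body is plain Python; only the decorator and the deploy call are Ray Serve-specific, which is what "minimal refactoring" refers to.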

This presentation introduces Ray Serve, including its use cases and features. It walks through Ray Serve setup and integration with existing machine learning models.

To learn more about Ray Serve, join the discussion with fellow Ray and Ray Serve users on the Ray forum and the Ray Slack.

Shreyas Krishnaswamy

Software Engineer, Anyscale

Shreyas Krishnaswamy is a software engineer focusing on Ray Serve and Ray infrastructure at Anyscale.

Simon Mo

Software Engineer, Anyscale

Simon Mo is a software engineer working on Ray Serve at Anyscale. Before Anyscale, he was a student at UC Berkeley participating in research at the RISELab. He focuses on studying and building systems for machine learning, in particular, how to make ML model serving systems more efficient, ergonomic, and scalable.