Ray Serve supports inference on CPUs, GPUs (even fractional GPUs!), and other accelerators – using just Python code.
In addition to single-node serving, Serve enables seamless multi-model inference pipelines (also known as model composition); autoscaling, both locally and in the cloud (including on Kubernetes); and the integration of business logic with machine learning model code. You can run Ray Serve applications on a single node or on a cluster with minimal to zero code changes.
Ray Serve is:
Framework-agnostic: Use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to scikit-learn models, to arbitrary Python business logic.
Python-first: Configure your model serving declaratively in pure Python, without needing YAML or JSON configs.
Natively integrated with FastAPI, with support for arbitrary Python web servers.
By the end of the webinar, you will understand how to deploy a machine learning model either locally or as a managed service on Anyscale (via AWS or GCP). No specialized machine learning knowledge is required to attend.
Paige Bailey is the developer experience and product lead for Ray and its open-source ecosystem. Prior to joining Anyscale, Paige was director of machine learning and MLOps at GitHub; lead PM for machine learning frameworks at Google Brain and DeepMind; and a senior software engineer at Microsoft. She has over a decade of experience as a machine learning practitioner, and cares deeply about reducing the friction for bringing large-scale models into production. You can find her on GitHub and Twitter at @dynamicwebpaige.
Simon Mo is a software engineer working on Ray Serve at Anyscale. Before Anyscale, he was a student at UC Berkeley participating in research at the RISELab. He focuses on studying and building systems for machine learning, in particular, how to make ML model serving systems more efficient, ergonomic, and scalable.
Edward Oakes is a software engineer and project lead on the Ray Serve team. He works across the stack at Anyscale, from Ray Core to Ray Serve to the Anyscale platform.