Ray Serve supports inference on CPUs, GPUs (even fractional GPUs!), and other accelerators – using just Python code.
In addition to single-node serving, Serve enables seamless multi-model inference pipelines (also known as model composition); autoscaling, both locally and in the cloud (including on Kubernetes); and integration of business logic with machine learning model code. You can run Ray Serve applications on a single node or on a cluster with minimal to zero code changes.
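To make this concrete, here is a minimal sketch of a two-stage composed application in which the model stage requests half a GPU. The class names, the toy preprocessing and "prediction" logic, and the handle-awaiting style (which assumes a recent Ray 2.x release) are illustrative assumptions, not code from the webinar.

```python
from ray import serve
from ray.serve.handle import DeploymentHandle


# Hypothetical preprocessing stage: plain Python, runs on CPU by default.
@serve.deployment
class Preprocessor:
    def transform(self, text: str) -> str:
        return text.strip().lower()


# Hypothetical model stage: requests half a GPU so replicas can share one device.
@serve.deployment(ray_actor_options={"num_gpus": 0.5})
class Model:
    def __init__(self, preprocessor: DeploymentHandle):
        self.preprocessor = preprocessor

    async def __call__(self, http_request) -> dict:
        payload = await http_request.json()
        # Call the upstream deployment through its handle (model composition).
        cleaned = await self.preprocessor.transform.remote(payload["text"])
        return {"input": cleaned, "label": "positive"}  # placeholder model output


# Build the application graph; serve.run(app), shown further below, deploys it.
app = Model.bind(Preprocessor.bind())
```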
Ray Serve is:
Framework-agnostic: Use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to scikit-learn models, to arbitrary Python business logic.
Python-first: Configure your model serving declaratively in pure Python, without needing YAML or JSON configs.
Natively integrated with FastAPI, with support for arbitrary Python web servers.
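As a hedged sketch of the FastAPI integration, the example below wraps a FastAPI app in a Serve deployment with `serve.ingress`; the class name, route, and toy logic are assumptions for illustration.

```python
from fastapi import FastAPI
from ray import serve

fastapi_app = FastAPI()


# FastAPI handles routing and validation; Serve handles replication and scaling.
@serve.deployment
@serve.ingress(fastapi_app)
class APIIngress:
    @fastapi_app.get("/square")
    def square(self, x: float) -> dict:
        # Stand-in for real model inference.
        return {"x": x, "result": x * x}


ingress_app = APIIngress.bind()
```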
By the end of the webinar, you will understand how to deploy a machine learning model either locally or as a managed service on Anyscale (on AWS or GCP). No specialized machine learning knowledge is required to attend.
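A minimal sketch of the local path, assuming the composed `app` from the earlier example is saved in a hypothetical module `my_service.py`; the same application graph can later be deployed to a cluster or an Anyscale service without changing the model code.

```python
import requests
from ray import serve

from my_service import app  # hypothetical module holding the bound application graph

# Deploy locally: starts Serve on this machine and exposes HTTP on port 8000.
serve.run(app)

# Query it like any other HTTP service.
response = requests.post("http://localhost:8000/", json={"text": "  Hello Serve  "})
print(response.json())

serve.shutdown()  # tear everything down when finished
```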
Join the discussion with fellow Ray and Managed Ray Serve users on the Ray forum and the Ray Slack.