Posts by Simon Mo

05.18.2022

Multi-model composition with Ray Serve deployment graphs

Learn more about the Ray Serve Deployment Graph API, now in alpha, which allows developers to build scalable, flexible inference-serving pipelines as directed acyclic graphs (DAGs) that take advantage of Ray's distributed compute for scaling.

03.02.2022

Deploying XGBoost models with Ray Serve

In this article, we’ll cover how to deploy XGBoost with two frameworks: Flask and Ray Serve. We’ll also highlight the advantages of Ray Serve over other serving solutions when comparing models in production.

02.23.2022

Serving PyTorch models with FastAPI and Ray Serve

In this article, we will highlight the options available for serving a PyTorch model in production and walk through deploying it with several frameworks, such as TorchServe, Flask, and FastAPI.

10.19.2021

Cheaper and 3X Faster Parallel Model Inference with Ray Serve

Wildlife Studios' ML team was deploying sets of ensemble models using Flask, which quickly became too hard and too expensive to scale. By using Ray Serve, Wildlife Studios was able to improve latency and throughput while reducing cost. Ray Serv...

10.01.2021

Serving ML Models in Production: Common Patterns

Over the past couple of years, we've listened to ML practitioners across many different industries in order to learn from and improve the tooling around ML production use cases. Through this, we've seen four common patterns of machine learning in production: pipeline, e...