Posts by Simon Mo

The live traffic pattern is variable and predictable. Serve’s replica autoscaling drives the cost down further during idle time.
10 . 19 . 2021

Cheaper and 3X Faster Parallel Model Inference with Ray Serve

Wildlife Studios' ML team was deploying sets of ensemble models using Flask, which quickly became too hard and too expensive to scale. By using Ray Serve, Wildlife Studios was able to improve latency and throughput while reducing cost. Ray Serv...

Where Ray Serve Fits In
10 . 01 . 2021

Serving ML Models in Production: Common Patterns

Over the past couple of years, we've listened to ML practitioners across many different industries to learn and improve the tooling around ML production use cases. Through this, we've seen four common patterns of machine learning in production: pipeline, e...