Posts by Leonnardo Rabello

The live traffic pattern is variable and predictable. Serve’s replica autoscaling drives the cost down further during idle time.
10.19.2021

Cheaper and 3X Faster Parallel Model Inference with Ray Serve

Wildlife Studios' ML team was deploying sets of ensemble models using Flask, which quickly became too hard and too expensive to scale. By using Ray Serve, Wildlife Studios was able to improve latency and throughput while reducing cost. Ray Serv...