Tuesday, August 23
1:30 PM - 2:00 PM
Machine learning is widely applied across teams at Instacart to improve the experience for customers, shoppers, brand partners, and retailers. Instacart fulfillment ML helps more than 200,000 shoppers find the fastest, most efficient routes for delivering products from 96,000 stores to hundreds of millions of homes across the US and Canada. To predict regional-level fulfillment metrics, such as delivery ETA or supply/demand ratio, a common approach is to train and deploy one model per regional zone.
At Instacart, we have around 2,000 unique regional zones, so we want an efficient job-queue system that highly parallelizes model training workflows and completes each retraining iteration as quickly as possible. In this talk, we'll cover how we previously solved this with a queueing system built on Celery and deployed on AWS ECS, and how moving to the Ray distributed framework has let us scale up to more parallel workers at lower cost than the previous ECS service, cutting training time from 2-3 days to just a few hours.
Han Li is a machine learning engineer on the Instacart ML Infrastructure team. Her current work focuses on building and standardizing MLOps solutions to better support model development and deployment at Instacart's production scale. Before joining Instacart, she pursued her master's degree in computer engineering at the University of Texas at Austin, and later joined Meta to work on various projects, including urban mesh networks, software-defined routing protocols, and a distributed training platform for large-scale recommendation systems.