ML Infra + Apps

Zero-copy model loading with Ray and PyTorch

Ray Summit 2022

One of the challenges of using deep learning in production is managing the cost of loading models for inference. In this talk, we'll show how you can reduce this cost almost to zero by leveraging features of PyTorch and Ray. We'll introduce the concept of zero-copy model loading: storing the weights of a deep learning model in shared memory so that any process with access to the shared memory segment can load the model nearly instantaneously. We'll show, with code examples, how to implement zero-copy model loading with PyTorch and Ray. Then we'll introduce our open-source library, zerocopy, which lets you apply zero-copy model loading to your PyTorch model by changing two lines of Python code in your Ray application. We'll finish with a simple benchmark study that shows how zero-copy model loading lets you run NLP models with stateless Ray tasks instead of heavyweight actors, giving you a self-tuning model deployment that delivers dramatically better performance than a naive deployment of the same models to Ray Serve.
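To make the shared-memory idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the talk's implementation and does not use Ray, PyTorch, or the zerocopy library: the helper names publish_weights and attach_weights are hypothetical, and a flat list of floats stands in for a model's weights. One process writes the weights into a shared memory segment once; any other process that knows the segment's name can then view them in place, without copying.

```python
import struct
from multiprocessing import shared_memory

BYTES_PER_WEIGHT = 8  # weights are stored as 64-bit floats in this sketch


def publish_weights(weights):
    """Copy the weights into a new shared memory segment (done once).

    Hypothetical helper: returns the SharedMemory handle, whose .name
    other processes use to attach to the same segment.
    """
    shm = shared_memory.SharedMemory(
        create=True, size=len(weights) * BYTES_PER_WEIGHT
    )
    struct.pack_into(f"{len(weights)}d", shm.buf, 0, *weights)
    return shm


def attach_weights(name, count):
    """Attach to an existing segment and view its contents as doubles.

    No weight data is copied: the returned memoryview reads directly
    from the shared segment. The segment may be rounded up to a page
    size, so we slice to the exact byte length before casting.
    """
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[: count * BYTES_PER_WEIGHT].cast("d")
    return shm, view
```

A reader would call attach_weights(shm.name, count), use the view, then call view.release() and close its handle; the publishing side calls unlink() when the segment is no longer needed. In the setting the talk describes, Ray's shared-memory object store would play the role of the segment and PyTorch tensors the role of the flat float view.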

About Fred

Fred Reiss is a principal research staff member at IBM Research - Almaden. He also works with IBM's Center for Open Source Data and AI Technologies (CODAIT.org), an IBM open source lab in San Francisco. Fred is the primary author of the zerocopy library, a component of IBM Research's Project CodeFlare. Fred received his PhD from UC Berkeley in 2006 and immediately joined IBM Research. Fred has published multiple peer-reviewed papers in the areas of natural language processing, database systems, and machine learning.

Fred Reiss

Principal Research Staff Member, IBM Research
