Tuesday, August 23
11:30 AM - 12:00 PM
One of the challenges of using deep learning in production is managing the cost of loading models for inference. In this talk, we'll show how you can reduce this cost almost to zero by leveraging features of PyTorch and Ray. We'll introduce the concept of zero-copy model loading: storing the weights of a deep learning model in shared memory so that any process with access to the shared memory segment can load the model nearly instantaneously. We'll show, with code examples, how to implement zero-copy model loading with PyTorch and Ray. Then we'll introduce our open-source library, zerocopy, which lets you apply zero-copy model loading to your PyTorch model by changing two lines of Python code in your Ray application. We'll finish with a simple benchmark study showing how zero-copy model loading lets you run NLP models with stateless Ray tasks instead of heavyweight actors, yielding a self-tuning model deployment that delivers dramatically better performance than a naive deployment of the same models to Ray Serve.
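The core idea, staging weights in shared memory once so that any worker process can map them without copying, can be sketched with Python's standard-library shared memory standing in for Ray's object store. The toy weight array and variable names below are illustrative assumptions, not the zerocopy library's API:

```python
import numpy as np
from multiprocessing import shared_memory

# Toy "model weights" that a loader process stages exactly once.
# (A real model's state_dict tensors would be flattened into buffers like this.)
weights = np.arange(6, dtype=np.float32)

# Create a shared-memory segment and copy the weights into it one time.
shm = shared_memory.SharedMemory(create=True, size=weights.nbytes)
staged = np.ndarray(weights.shape, dtype=weights.dtype, buffer=shm.buf)
staged[:] = weights

# An inference process that attaches to the same segment gets a
# zero-copy view of the weights: no deserialization, no heap copy.
attached = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(weights.shape, dtype=np.float32, buffer=attached.buf)
total = float(view.sum())  # "inference" reads the shared weights in place

# Clean up the demo segment.
attached.close()
shm.close()
shm.unlink()
```

With PyTorch in the picture, the same pattern applies: a tensor can be rebuilt over such a shared buffer with `torch.from_numpy`, which wraps the array's memory rather than copying it, so model loading reduces to reattaching views.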
Fred Reiss is a principal research staff member at IBM Research - Almaden. He also works with IBM's Center for Open Source Data and AI Technologies (CODAIT.org), an IBM open source lab in San Francisco. Fred is the primary author of the zerocopy library, a component of IBM Research's Project CodeFlare. Fred received his PhD from UC Berkeley in 2006 and immediately joined IBM Research. Fred has published multiple peer-reviewed papers in the areas of natural language processing, database systems, and machine learning.