Track: ML Infra + Apps

Scalable training of language models using Ray, JAX, and TPUv4

Ray Summit 2022

Modern large language models require distributed training strategies because of their sheer size. The challenges of training them efficiently and robustly are being met by rapid developments on both the software and hardware fronts. In this talk, we explore the challenges and design decisions involved in building a scalable training framework, and present a quantitative analysis of the efficiency gains from adopting new software and hardware solutions such as Ray, JAX pjit, and TPUv4.
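
For readers unfamiliar with JAX pjit, the sketch below illustrates the general idiom of partitioning a computation across a device mesh. It is a minimal, hedged example under assumed shapes and mesh-axis names, not code from the talk or from Cohere's framework.

```python
# Illustrative sketch only: the mesh layout, axis name "data", and the
# array shapes here are assumptions chosen for demonstration.
import numpy as np
import jax
import jax.numpy as jnp
from jax.experimental.pjit import pjit
from jax.sharding import Mesh, PartitionSpec

# Arrange all available devices (e.g. TPU chips) into a 1-D mesh whose
# single axis we call "data".
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

def forward(x, w):
    return jnp.dot(x, w)

# Shard the batch dimension of x across the "data" axis; replicate w on
# every device. pjit compiles the function once for the whole mesh.
sharded_forward = pjit(
    forward,
    in_shardings=(PartitionSpec("data", None), PartitionSpec(None, None)),
    out_shardings=PartitionSpec("data", None),
)

with mesh:
    x = jnp.ones((8, 512))    # batch dimension sharded across devices
    w = jnp.ones((512, 256))  # weights replicated on every device
    y = sharded_forward(x, w)
```

The appeal of this style is that the model code (`forward`) stays free of communication logic; the partitioning strategy is specified separately, which makes it easy to experiment with different sharding layouts as models grow.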

About Joanna

Joanna Yoo is a machine learning engineer at Cohere, where she is building a scale-first training framework that powers language models. She uses JAX, TPUv4, and Ray to scale language models to hundreds of billions of parameters.

About Kuba

Kuba Perlin is a machine learning engineer at Cohere, working with JAX, TPUv4, and Ray to scale language models to hundreds of billions of parameters.

About Siddhartha Rao

Siddhartha Rao Kamalakara is a machine learning engineer at Cohere and one of the lead developers of FAX. His interests lie at the intersection of systems and ML. He has previously worked on ML for proteins, sparsity, and efficient matrix approximations. Outside of work, he is into filmmaking and photography.

Joanna Yoo

Machine Learning Engineer, Cohere

Kuba Perlin

Machine Learning Engineer, Cohere

Siddhartha Rao Kamalakara

Machine Learning Engineer, Cohere

Ready to Register?

Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.

Save your spot

Join the Conversation

Ready to get involved in the Ray community before the conference? Ask a question in the forums. Open a pull request. Or share why you’re excited with the hashtag #RaySummit on Twitter.