Anyscale Connect

Ray Train: Production-ready distributed deep learning

Wednesday, February 9, 5:00PM UTC

Today, most frameworks for prototyping deep learning models, training them, and distributing that training across a cluster are either powerful but inflexible, or nimble but toy-like. Data scientists are forced to choose between a great developer experience and a production-ready framework.

To close this gap, the Ray ML team developed Ray Train.

Ray Train is a library built on top of the Ray ecosystem that simplifies distributed deep learning. Currently in stable beta in Ray 1.9, Ray Train offers the following features:

  • Scales to multi-GPU and multi-node training with zero code changes

  • Runs seamlessly on any cloud (AWS, GCP, Azure, Kubernetes, or on-prem)

  • Supports PyTorch, TensorFlow, and Horovod

  • Distributed data shuffling and loading with Ray Datasets

  • Distributed hyperparameter tuning with Ray Tune

  • Built-in loggers for TensorBoard and MLflow

In this webinar, we'll walk through some of the challenges of large-scale ML training for computer vision and demo Ray Train in action.

Speakers

Will Drevo

Product Manager, Anyscale

Will is a Product Manager for ML at Anyscale. Previously, he was the first ML Engineer at Coinbase and ran two ML-related startups, one in the data labeling space and the other in the pharmaceutical space. He holds a BS in CS and Music Composition from MIT, where he also wrote his master's thesis on machine learning systems. In his spare time, he produces electronic music, travels, and tries to find the best Ethiopian food in the Bay Area.

Matthew Deng

Software Engineer, Anyscale

Matthew Deng is a software engineer at Anyscale where he works on distributed machine learning libraries built on top of Ray. Before that, he was a software engineer at LinkedIn. He holds a BS in Electrical Engineering and Computer Science from UC Berkeley.

Amog Kamsetty

Software Engineer, Anyscale

Amog Kamsetty is a software engineer at Anyscale, where he works on building distributed training libraries and integrations on top of Ray. He previously completed his MS at UC Berkeley, where he worked with Ion Stoica on machine learning for database systems.