Today, most frameworks for prototyping, training, and distributing deep learning workloads across a cluster are either powerful but inflexible, or nimble but toy-like. Data scientists are forced to choose between a great developer experience and a production-ready framework.
To close this gap, the Ray ML team has developed Ray Train.
Ray Train is a library built on top of the Ray ecosystem that simplifies distributed deep learning. Currently in stable beta in Ray 1.9, Ray Train offers the following features:
Scales to multi-GPU and multi-node training with zero code changes
Runs seamlessly on any cloud (AWS, GCP, Azure, Kubernetes, or on-prem)
Supports PyTorch, TensorFlow, and Horovod
Shuffles and loads data in a distributed fashion with Ray Datasets
Tunes hyperparameters in a distributed fashion with Ray Tune
Logs to TensorBoard and MLflow with built-in loggers
In this webinar, we'll talk through some of the challenges in large-scale computer vision ML training, and show a demo of Ray Train in action.
Will is a Product Manager for ML at Anyscale. Previously, he was the first ML Engineer at Coinbase and ran a couple of ML-related startups, one in the data labeling space and the other in the pharmaceutical space. He holds a BS in CS and Music Composition from MIT, where he also wrote his master's thesis on machine learning systems. In his spare time, he produces electronic music, travels, and tries to find the best Ethiopian food in the Bay Area.
Matthew Deng is a software engineer at Anyscale where he works on distributed machine learning libraries built on top of Ray. Before that, he was a software engineer at LinkedIn. He holds a BS in Electrical Engineering and Computer Science from UC Berkeley.
Amog Kamsetty is a software engineer at Anyscale where he works on building distributed training libraries and integrations on top of Ray. He previously completed his MS degree at UC Berkeley working with Ion Stoica on machine learning for database systems.