Best Machine Learning Talks from Ray Summit 2021

By Michael Galarnyk   

Ray Summit 2021 had 12 keynotes, 45 sessions, and 2 tutorials. Although the event is over, you can still watch the talks you missed, or rewatch your favorites, on the Ray Summit platform. The technical sessions covered using Ray for scalable Python, machine learning (including deep learning), reinforcement learning, and data processing across a wide variety of use cases. This post highlights some of the most popular machine learning talks to show what is possible with Ray and its ecosystem.

ML Platform Panel

The "ML Platform on Ray" panel, moderated by Zhe Zhang of Anyscale, was a fan favorite. The panelists were industry experts and leaders from Uber, Shopify, and Robinhood. They discussed the state of ML within their organizations, including their technology landscapes and use cases, explained why they adopted Ray as the unified compute substrate for their production ML platforms, and shared their views on the future of ML.

Distributed XGBoost on Ray

The talk showed how Uber combines data processing in Spark with distributed model training on XGBoost-Ray, and much more.

"Distributed XGBoost on Ray" was a two-part talk: Kai Fricke from Anyscale introduced XGBoost-Ray, and Michael Mui from Uber explained how Uber uses XGBoost-Ray within its internal machine learning infrastructure. XGBoost-Ray's features include full GPU support, advanced fault tolerance, and seamless integration with the hyperparameter optimization library Ray Tune. The talk also covered challenges of distributed ML and DL at scale, such as distributed training, distributed hyperparameter optimization, and handling heterogeneous compute across workflow stages, where a lot of effort goes into stitching systems together. For more details on Uber's motivation for using XGBoost-Ray, as well as benchmarks showing its utility, check out Uber's blog. To learn how Ray is unifying the distributed machine learning ecosystem, check out the talk "A Growing Ecosystem of Scalable ML Libraries on Ray".

Scaling Ecosystem Restoration: How Ray and Anyscale Make it Easy to do Massive-scale ML on Aerial Imagery

Traditional methods of ecological restoration do not scale, for a variety of reasons. Richard Decal opened Dendra Systems' talk by discussing this problem and the company's mission to create the tools needed to power scalable ecosystem restoration. These tools include drones for ultra-high-resolution mapping at unprecedented scale, machine learning to analyze that imagery and derive insights, and drones for highly efficient seeding.

Most of the talk focused on how Dendra's machine learning platform, built on Ray and Anyscale, makes these tools possible. This inspiring talk is well worth watching.

End-to-End AutoML with Ludwig on Ray

Ludwig is an open source AutoML framework that lets you train and deploy state-of-the-art deep learning models without writing any code. The original implementation of Ludwig focused mostly on a single machine, which imposed a few limitations: data had to fit in memory, and there was no distributed training and no parallel evaluation. In the talk "End-to-End AutoML with Ludwig on Ray", Travis Addair, one of Ludwig's co-maintainers, showed that with a single command-line parameter, the same Ludwig configuration used to train models on your local machine can be scaled to train on massive datasets across hundreds of machines in parallel using Ludwig on Ray.

Ludwig on Ray uses Dask for data processing, Horovod on Ray for distributed training, and Dask on Ray for distributed evaluation.

This talk is an excellent example of the power of combining multiple libraries in the Ray ecosystem.

Anyscale Demo: Machine Learning Application from Dev to Prod

Anyscale empowers teams of all sizes to put ML in production.

The typical route for scaling Ray applications has been Ray's Cluster Launcher. Anyscale recently launched in private beta as a fully managed alternative. Edward Oakes, a software engineer at Anyscale, presented "Anyscale Demo: Machine Learning Application from Dev to Prod". The demo walked through the full lifecycle of a machine learning application on Anyscale and showed how Anyscale makes it seamless both to move from developing on your laptop to running on a cluster and to transition from development to serving in production. To learn how companies are using Anyscale, check out Anastasia's talk on how the company uses Ray and Anyscale to speed up its ML processes, as well as Dendra's talk on how Ray and Anyscale make it easy to do massive-scale ML on aerial imagery.

Conclusion

This article highlighted only some of the many impressive machine learning talks from Ray Summit 2021; the event also featured many impressive reinforcement learning talks. To watch all the talks and keynotes for free and on demand, visit raysummit.anyscale.com. To keep up to date with all things Ray, follow @raydistributed on Twitter, sign up for the newsletter, and be sure to attend the next Ray Summit.