Ray Deep Dives

Large-scale data shuffle in Ray with Exoshuffle

Ray Summit 2022

Shuffle is a key primitive in large-scale data processing applications. The difficulty of large-scale shuffle has inspired a myriad of specialized implementations. While these have greatly improved shuffle performance and reliability over time, the gains have come at a cost: flexibility. We show that, contrary to popular wisdom, shuffle can be implemented with high performance and reliability on a general-purpose system for distributed computing: Ray. In this talk, we present Exoshuffle, an application-level shuffle system that outperforms Spark and achieves 82% of theoretical performance on a 100TB sort on 100 nodes. In Ray 2.0, we have integrated Exoshuffle with the Datasets library to provide high-performance large-scale shuffle for ML users.
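
For readers unfamiliar with the shuffle pattern the abstract refers to, here is a minimal single-process sketch of a two-stage sort shuffle in plain Python. It is not Exoshuffle's implementation and uses no Ray APIs; the function names (`map_partition`, `reduce_partition`, `shuffle_sort`) and the fixed range boundaries are assumptions made for illustration. In a distributed setting, each map and reduce call would presumably run as a separate task, with the slices moving between nodes.

```python
import bisect
from heapq import merge


def map_partition(block, boundaries):
    """Map stage: sort one input block and split it into one slice per reducer."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for value in sorted(block):
        # Values below boundaries[0] go to reducer 0, and so on upward.
        parts[bisect.bisect_right(boundaries, value)].append(value)
    return parts


def reduce_partition(slices):
    """Reduce stage: merge the already-sorted slices destined for one reducer."""
    return list(merge(*slices))


def shuffle_sort(blocks, boundaries):
    """All-to-all shuffle: every reducer reads one slice from every mapper."""
    map_outputs = [map_partition(block, boundaries) for block in blocks]
    num_reducers = len(boundaries) + 1
    reduced = [
        reduce_partition([out[r] for out in map_outputs])
        for r in range(num_reducers)
    ]
    # Range partitions are disjoint and ordered, so concatenation is sorted.
    return [value for part in reduced for value in part]
```

The all-to-all step in `shuffle_sort` is what makes shuffle hard at scale: with M map outputs and R reducers, M × R slices have to be moved and merged.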

About Stephanie

Stephanie Wang is a PhD student in distributed systems at UC Berkeley, a software engineer at Anyscale, and a lead committer for the Ray project. Currently, she's working on problems such as fault tolerance and distributed memory management. She is generally interested in the problem of making general-purpose distributed programming possible and in designing fast and reliable distributed systems.

About Jiajun

Jiajun Yao is a software engineer at Anyscale and a committer for the Ray project. He is interested in making distributed computing easily accessible to everyone. Before joining Anyscale, Jiajun was a software engineer at LinkedIn building the graph database.

Stephanie Wang

Software Engineer, Anyscale

Jiajun Yao

Software Engineer, Anyscale
