Ray Deep Dives

Deep dive into data ingest with AIR + Datasets

Wednesday, August 24
11:30 AM - 12:00 PM

Does your training dataset fit in memory on one machine? Or is it too big even for a large cluster? This talk is for both camps. We will cover how Ray AIR uses Datasets to load and preprocess training data efficiently. We start by walking through the high-level utilities for setting up data loading in AIR and debugging its performance. Next, we cover support for different data modalities and formats, including images. Finally, we dive into what's happening in Ray under the hood when you're running training at scale with AIR, for a full understanding of the stack.

About Clark

Clark Zinzow is a software engineer at Anyscale. He loves working at the boundaries of service, data, and compute scale. When he's not reading papers or trapped under a stack of books, you can probably find him biking, skiing, or trapped under a different stack of books.

Clark Zinzow

Software Engineer, Anyscale
Ray Summit 2022

Ready to Register?

Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.

Save your spot

Join the Conversation

Ready to get involved in the Ray community before the conference? Ask a question in the forums. Open a pull request. Or share why you’re excited with the hashtag #RaySummit on Twitter.