Multimodal AI has finally unlocked 90% of the world's data: video, audio, PDFs, sensor streams, and more. But using AI to process this data surfaces a hard engineering problem: multimodal pipelines interleave CPU-bound preprocessing with GPU-bound inference. Traditional batch architectures handle this mix inefficiently, and the mismatch leaves GPUs idle more than 50% of the time.
In this session you will learn:
Why multimodal data pipelines break traditional batch processing engines and architectures
How to avoid the I/O between CPU → GPU steps that leads to 50%+ GPU idle time
How to build a streaming CPU+GPU pipeline with Ray Data (sketched below)
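To make that last point concrete, here is a minimal sketch of what a streaming CPU+GPU pipeline looks like in Ray Data. The bucket paths, batch sizes, and placeholder model are illustrative assumptions, not code from the session:

```python
import numpy as np
import ray

# CPU stage: decode/normalize images. Runs in parallel on CPU workers.
def preprocess(batch: dict) -> dict:
    batch["image"] = [np.asarray(img, dtype=np.float32) / 255.0
                      for img in batch["image"]]
    return batch

# GPU stage: a callable class, so the model is loaded once per actor
# rather than once per batch.
class Infer:
    def __init__(self):
        # Hypothetical placeholder; load a real model onto the GPU here.
        self.model = lambda imgs: [float(img.mean()) for img in imgs]

    def __call__(self, batch: dict) -> dict:
        batch["pred"] = np.array(self.model(batch["image"]))
        return batch

ds = (
    ray.data.read_images("s3://my-bucket/images/")  # illustrative input path
    .map_batches(preprocess, batch_size=64)         # CPU-bound step
    .map_batches(Infer, batch_size=64,
                 num_gpus=1, concurrency=2)         # GPU-bound step
)

# Ray Data streams blocks through both stages, so CPU preprocessing of the
# next batch overlaps with GPU inference on the current one, rather than
# leaving the GPU waiting on upstream work.
ds.select_columns(["pred"]).write_parquet("s3://my-bucket/preds/")
```

Expressing the GPU stage as a class lets Ray Data run it in an actor pool and keep the model resident on the GPU between batches.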
You'll leave with a framework for architecting multimodal data pipelines that scale, and a clear view of how Ray and Anyscale work together to keep your GPUs busy.
If you are a computer vision engineer, researcher, or ML engineer working on physical AI or multimodal systems, this session is for you.