Multimodal AI has finally unlocked 90% of the world's data: video, audio, PDFs, sensor streams, and more. But using AI to process this data surfaces a hard engineering problem: multimodal pipelines interleave CPU-bound preprocessing with GPU-bound inference. Traditional batch architectures handle this mix inefficiently, and the mismatch leaves GPUs idle more than 50% of the time.
In this session you will learn:
Why multimodal data pipelines break traditional batch processing engines and architectures
How to avoid the I/O between CPU → GPU steps that leads to 50%+ GPU idle time
How to build a streaming CPU+GPU pipeline with Ray Data (sketched below)
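To make that last point concrete, here is a minimal sketch of what a streaming CPU+GPU pipeline looks like in Ray Data. The bucket paths, batch sizes, and placeholder model are illustrative assumptions, not code from the session:

```python
import numpy as np
import ray

# CPU stage: decode/normalize images. Runs in parallel on CPU workers.
def preprocess(batch: dict) -> dict:
    batch["image"] = [np.asarray(img, dtype=np.float32) / 255.0
                      for img in batch["image"]]
    return batch

# GPU stage: a callable class, so the model is loaded once per actor
# rather than once per batch.
class Infer:
    def __init__(self):
        # Hypothetical placeholder; load a real model onto the GPU here.
        self.model = lambda imgs: [float(img.mean()) for img in imgs]

    def __call__(self, batch: dict) -> dict:
        batch["pred"] = np.array(self.model(batch["image"]))
        return batch

ds = (
    ray.data.read_images("s3://my-bucket/images/")  # illustrative input path
    .map_batches(preprocess, batch_size=64)         # CPU-bound step
    .map_batches(Infer, batch_size=64,
                 num_gpus=1, concurrency=2)         # GPU-bound step
)

# Ray Data streams blocks through both stages, so CPU preprocessing of the
# next batch overlaps with GPU inference on the current one, rather than
# leaving the GPU waiting on upstream work.
ds.select_columns(["pred"]).write_parquet("s3://my-bucket/preds/")
```

Expressing the GPU stage as a class lets Ray Data run it in an actor pool and keep the model resident on the GPU between batches.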
You'll leave with a framework for architecting multimodal data pipelines that scale, and a clear view of how Ray and Anyscale work together to keep your GPUs busy.
If you are a computer vision engineer, researcher, or ML engineer working on physical AI or multimodal systems, this session is for you.