Data processing

Multimodal data curation

Build and run scalable pipelines to curate and prepare multimodal datasets for foundation model training with Ray on Anyscale.

Torc Robotics
Prenuvo
Tripadvisor
Runway

The problem

Multimodal data requires new infrastructure

Multimodal data pipelines demand tight coordination between CPU and GPU computation. Teams typically achieve this by manually stitching together separate systems, for example Spark for CPU-bound processing and containerized Python for GPU-bound model inference. These compute silos add latency and increase operational costs.

Build and deploy multimodal data pipelines at scale

Run end-to-end multimodal pipelines with unified CPU and GPU processing using Ray on Anyscale.

Process any modality

Scale processing from raw unstructured data to tensors for model training.

Fast CPU + GPU pipelines

Eliminate intermediate I/O between steps and keep CPUs and GPUs busy with streaming execution.
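
To make the streaming idea concrete, here is a minimal, standard-library-only sketch. It is not Ray's or Anyscale's API. The stage names (`cpu_decode`, `gpu_infer`), the helper `run_pipeline`, and the `max_buffered` parameter are hypothetical, and the "GPU" stage is a plain Python stand-in. It only illustrates the general pattern: a CPU stage hands items to a downstream stage through a bounded in-memory queue, so nothing is written to storage between steps and both stages can work at the same time.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def cpu_decode(raw_items, out_q):
    """CPU-stage stand-in: 'decode' each record and hand it downstream immediately."""
    for raw in raw_items:
        out_q.put(raw.upper())  # placeholder for real decoding or resizing work
    out_q.put(SENTINEL)


def gpu_infer(in_q, batch_size, results):
    """GPU-stage stand-in: consume decoded items in batches as they arrive."""
    batch = []
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            results.append([len(x) for x in batch])  # placeholder for model inference
            batch = []
    if batch:  # flush the final partial batch
        results.append([len(x) for x in batch])


def run_pipeline(raw_items, batch_size=2, max_buffered=4):
    """Run both stages concurrently, connected by a bounded in-memory queue."""
    q = queue.Queue(maxsize=max_buffered)  # bounded: the producer blocks when full
    results = []
    producer = threading.Thread(target=cpu_decode, args=(raw_items, q))
    consumer = threading.Thread(target=gpu_infer, args=(q, batch_size, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["a", "bb", "ccc"]))
```

The bounded queue is the key design choice in this sketch: it provides backpressure, so a fast upstream stage cannot flood memory while a slower downstream stage catches up.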

Reliability at petabyte scale

Scale from one machine to thousands of nodes with elastic, fault-tolerant managed Ray clusters.

With Anyscale, our researchers can just write code without worrying about the underlying infrastructure.
Adrian Li-Bell
Member of Technical Staff
Ray scheduling heterogeneous workloads is something we couldn’t really do easily before. We see much lower idle time and much better utilization.
Sam Jenkins
Senior MLOps Engineer, Tripadvisor
The fact that we don’t have to dedicate a person to make all of the plumbing and infrastructure work has been really valuable.
Cindy Wang
Staff ML Engineer

16+

Researchers building pipelines feeding VLA model training runs

Explore Anyscale today

Build, run, and scale any AI workload on Ray with a multi-cloud platform built for production AI.