Data processing

Multimodal data curation

Build and run scalable pipelines to curate and prepare multimodal datasets for foundation model training with Ray on Anyscale.

Build and deploy multimodal data pipelines at scale

Run end-to-end multimodal pipelines with unified CPU and GPU processing with Ray on Anyscale.

Process any modality

Scale processing from raw unstructured data to tensors for model training.

Fast CPU + GPU pipelines

Eliminate I/O between steps and keep CPUs and GPUs busy with streaming execution.

Reliability at petabyte scale

Scale from one machine to thousands of nodes with elastic, fault-tolerant managed Ray clusters.

“With Anyscale, our researchers can just write code without worrying about the underlying infrastructure.”

Adrian Li-Bell

Member of Technical Staff

“Ray scheduling heterogeneous workloads is something we couldn’t really do easily before. We see much lower idle time and much better utilization.”

Sam Jenkins

Senior MLOps Engineer, Tripadvisor

“The fact that we don’t have to dedicate a person to make all of the plumbing and infrastructure work has been really valuable.”

Cindy Wang

Staff ML Engineer

16+

Researchers building pipelines feeding VLA model training runs

Streaming execution

Maximize throughput by processing data continuously across pipeline stages, instead of the stage-by-stage batch execution of traditional systems
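To make the contrast with batch execution concrete, here is a minimal plain-Python sketch of the streaming idea. It is not Ray Data's API: the stage names (`read_records`, `preprocess`, `infer`) and the toy arithmetic are hypothetical stand-ins. Each stage is a generator, so a batch moves to the next stage as soon as it is ready instead of waiting for the whole dataset to finish the previous stage.

```python
from typing import Iterable, Iterator, List


def read_records(n: int) -> Iterator[int]:
    """Hypothetical source stage: yields records one at a time."""
    for i in range(n):
        yield i


def batched(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group a stream of records into fixed-size batches (last one may be short)."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def preprocess(batches: Iterable[List[int]]) -> Iterator[List[int]]:
    """Stand-in for a CPU stage (e.g. decoding); here it just doubles each value."""
    for batch in batches:
        yield [x * 2 for x in batch]


def infer(batches: Iterable[List[int]]) -> Iterator[List[int]]:
    """Stand-in for a GPU stage (e.g. model inference); here it adds one."""
    for batch in batches:
        yield [x + 1 for x in batch]


def run_streaming(n: int, batch_size: int) -> List[int]:
    # Stages are chained lazily: only one batch per stage is in flight at a time,
    # so no intermediate dataset is fully materialized between stages.
    pipeline = infer(preprocess(batched(read_records(n), batch_size)))
    return [y for batch in pipeline for y in batch]


results = run_streaming(n=10, batch_size=4)
```

A batch-style system would instead finish `preprocess` for every record, store the result, and only then start `infer`. The generator chain above shows the data flow only; a real distributed engine would additionally run the stages in parallel on separate CPU and GPU workers.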

Native GPU support

Support for different accelerators and topologies, multi-node inference, and integration with vLLM and SGLang

Job-level checkpointing

Resume from previous state without reprocessing already completed data after pause or failure
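The page does not describe how Anyscale implements checkpointing, so the following is only a generic sketch of the resume-without-reprocessing idea, using a hypothetical JSON checkpoint file and hypothetical helper names (`load_done`, `save_done`, `run_job`). The job records each completed item and, on restart, skips anything already recorded.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, List, Set


def load_done(path: str) -> Set[str]:
    """Return the set of item IDs already completed, or an empty set if no checkpoint exists."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_done(path: str, done: Set[str]) -> None:
    """Write the checkpoint atomically: write a temp file, then rename it over the old one."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, path)


def run_job(items: Iterable[str], process: Callable[[str], None], checkpoint_path: str) -> List[str]:
    """Process items in order, skipping those recorded in the checkpoint.

    Returns the items processed during this run.
    """
    done = load_done(checkpoint_path)
    processed_now: List[str] = []
    for item in items:
        if item in done:
            continue  # completed in an earlier run; do not reprocess
        process(item)
        done.add(item)
        save_done(checkpoint_path, done)
        processed_now.append(item)
    return processed_now


def demo() -> List[str]:
    """Simulate a failure partway through, then resume from the checkpoint."""
    items = ["a", "b", "c", "d"]
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "checkpoint.json")

        def flaky(item: str) -> None:
            if item == "c":
                raise RuntimeError("simulated failure")

        try:
            run_job(items, flaky, checkpoint)
        except RuntimeError:
            pass  # the first run stops at "c"; "a" and "b" are already checkpointed

        # The second run uses a healthy processor and picks up where the first stopped.
        return run_job(items, lambda item: None, checkpoint)


resumed_items = demo()
```

Saving after every item keeps the sketch simple. A production system would more likely checkpoint per batch or per block to limit write overhead.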

Advanced observability

Use tree and DAG dashboard views to pinpoint bottlenecks and errors for faster debugging and optimization

Fast autoscaling

Scale resources dynamically based on workload and gracefully handle node failures without job interruption

Spot instances

Run reliably on discounted spot instances with built-in preemption recovery and on-demand fallback

Build. Run. Scale. Repeat.

Scale multimodal pipelines without growing operational complexity with Ray on Anyscale.

Image classification

Run distributed batch inference using a pretrained model

Object detection video processing

Run GPU-accelerated video analytics using a fine-tuned object detection model

Text embedding

Deploy a pipeline that combines a LangChain TextSplitter with an embedding model from Hugging Face

Distributed training, fine-tuning

Scale existing training code from one machine to thousands of GPUs with intuitive scaling configs

Composite AI serving

Serve one or many models and Python applications working together as a single API endpoint

Embedding generation

Process large-scale multimodal datasets for AI and applications with your model of choice

Frequently Asked Questions

What is Ray Data?

What examples of multimodal AI pipelines does Ray Data support?

What’s the difference between Ray and Anyscale?

Beyond data processing, what other workloads can I run on the Anyscale platform?

Where do my workloads run when using the Anyscale platform?

Explore Anyscale today