
Scaling Batch Inference: From Computer Vision to LLMs

Batch inference is a core pattern in AI pipelines, and scaling it effectively is essential for running AI in production.

Running batch inference on unstructured datasets, whether applying computer vision models to images and video or large language models (LLMs) to text, can be operationally complex because of the coordination required between CPUs and GPUs.

CPUs handle the heavy lifting of preprocessing tasks like image decoding, resizing, and text tokenization. But unless they deliver a steady stream of model-ready data, GPUs sit idle and expensive capacity goes underutilized.

Many teams run into this bottleneck with existing orchestration frameworks (e.g., Airflow, Dagster, Databricks Lakeflow), which weren’t built with this type of CPU/GPU coordination in mind.

In this session, you’ll see how Ray, the open-source framework for distributed Python workloads, addresses the challenge of GPU-centric but CPU-bound workloads, with practical examples including:

  • Scaling image classification with PyTorch across large datasets (see the first sketch after this list).

  • Running LLM workloads such as embeddings and evaluation efficiently with vLLM (see the second sketch after this list).

  • Applying the same patterns with Ray Data to other multimodal use cases, processing text, video, and audio at scale.
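To make the first pattern concrete, here is a minimal sketch of batch image classification with Ray Data and PyTorch. The bucket paths, model choice, replica count, and batch size are illustrative assumptions, not settings from the session.

```python
# Sketch: batch image classification with Ray Data + PyTorch (assumed paths/settings).
import torch
from torchvision import models
import ray


class ImageClassifier:
    def __init__(self):
        # Load the model once per replica so GPU workers reuse the weights.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        weights = models.ResNet50_Weights.DEFAULT
        self.model = models.resnet50(weights=weights).eval().to(self.device)
        self.preprocess = weights.transforms()

    def __call__(self, batch: dict) -> dict:
        # batch["image"] holds decoded images as HWC uint8 arrays.
        images = torch.stack([
            self.preprocess(torch.from_numpy(img).permute(2, 0, 1))
            for img in batch["image"]
        ]).to(self.device)
        with torch.inference_mode():
            logits = self.model(images)
        batch["label"] = logits.argmax(dim=1).cpu().numpy()
        return batch


# CPU work: reading, decoding, and resizing images happens on CPU workers.
ds = ray.data.read_images("s3://my-bucket/images/")  # hypothetical path

# GPU work: Ray schedules the classifier replicas on GPU workers.
predictions = ds.map_batches(
    ImageClassifier,
    concurrency=4,   # number of model replicas (assumption)
    num_gpus=1,      # one GPU per replica
    batch_size=64,
)
predictions.drop_columns(["image"]).write_parquet("s3://my-bucket/predictions/")  # hypothetical path
```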
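For the second pattern, a similarly hedged sketch runs batch text generation with vLLM inside a Ray Data pipeline; swapping in an embedding model follows the same shape. The model name, column names, and parallelism settings are assumptions for illustration.

```python
# Sketch: batch LLM inference with Ray Data + vLLM (assumed model, paths, settings).
import ray
from vllm import LLM, SamplingParams


class VLLMPredictor:
    def __init__(self):
        # Each replica loads its own vLLM engine on a dedicated GPU.
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model
        self.sampling = SamplingParams(temperature=0.0, max_tokens=256)

    def __call__(self, batch: dict) -> dict:
        # batch["prompt"] is a column of input strings.
        outputs = self.llm.generate(list(batch["prompt"]), self.sampling)
        batch["response"] = [out.outputs[0].text for out in outputs]
        return batch


ds = ray.data.read_parquet("s3://my-bucket/prompts/")  # hypothetical path

results = ds.map_batches(
    VLLMPredictor,
    concurrency=2,    # number of vLLM replicas (assumption)
    num_gpus=1,       # one GPU per replica
    batch_size=128,
)
results.write_parquet("s3://my-bucket/responses/")  # hypothetical path
```

In both sketches the CPU-side work (reading, decoding, tokenizing) and the GPU-side inference are stages of a single dataset pipeline, which is how Ray Data keeps GPUs fed with model-ready data.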

Who should attend:

AI and data engineers, ML practitioners, and platform teams building or evaluating solutions that scale NLP, computer vision, video processing, and RAG pipelines.

