Apache Spark has long been the go-to engine for large-scale data processing, powering everything from BI to ETL pipelines. But as data teams increasingly work with complex transformations and diverse data types such as text, images, video, and audio, new processing patterns have emerged that rely on Python UDFs and AI models. This is where Ray, the modern engine for distributed Python, can help: it supports all types of data and AI models, from traditional workloads to the cutting edge.
Join this session to learn how to enhance your investments in open data formats and governance frameworks by seamlessly integrating Anyscale and Ray into your data platform. We'll cover:
Ray Data Fundamentals: how to process unstructured and structured data with Ray Data.
Actor-Based Execution: how Ray's actor-based execution model parallelizes image, text, and audio workloads at scale, improving hardware utilization (see the first sketch after this list).
Integration Patterns: reading data from Unity Catalog with Ray Data, performing AI-powered data transformations, and writing the results back to Unity Catalog (see the second sketch after this list).
Ray Ecosystem Benefits: downstream use cases unlocked by making Ray’s compute backend available to developers.
Live Demo: how to run data pipelines, such as embedding generation, with Ray (see the third sketch after this list).
Production Considerations: what it takes to integrate Ray into a production data platform, and how Anyscale provides the shortest path to production.
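To give a flavor of the first two points, here is a minimal sketch of processing image data with Ray Data using an actor pool. It assumes a running Ray cluster and uniformly sized images; the S3 paths are hypothetical placeholders, not part of any real integration.

```python
# A minimal sketch, assuming a running Ray cluster and uniformly sized images.
# The S3 paths are hypothetical placeholders.
import ray

ray.init()

# Ray Data reads unstructured data (images) and structured data (Parquet)
# through the same Dataset API.
images = ray.data.read_images("s3://my-bucket/images/")
metadata = ray.data.read_parquet("s3://my-bucket/metadata/")  # structured side

# A callable class runs as a pool of actors: expensive setup in __init__
# happens once per actor replica, then each replica processes batches in parallel.
class Normalizer:
    def __init__(self):
        # Model loading or other expensive setup would go here.
        self.scale = 255.0

    def __call__(self, batch: dict) -> dict:
        # With uniformly sized images, the batch arrives as one (B, H, W, C) array.
        batch["image"] = batch["image"].astype("float32") / self.scale
        return batch

# concurrency=4 launches four actor replicas, keeping hardware busy across the cluster.
normalized = images.map_batches(Normalizer, concurrency=4, batch_size=64)
normalized.show(1)
```

The same pattern extends to text and audio workloads: swap the reader and the per-batch transform.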
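For the Unity Catalog round trip, the details depend on your catalog setup. The sketch below shows one possible pattern under the assumption that the tables are external tables backed by Parquet files whose storage locations you can resolve through your catalog tooling; the paths and the classify() stub are hypothetical.

```python
# One possible pattern, assuming the Unity Catalog tables are backed by
# Parquet files in cloud storage; paths and the classify() stub are hypothetical.
import numpy as np
import ray

ray.init()

# 1. Read: resolve the table to its underlying storage location
#    (e.g., via the Unity Catalog APIs) and load it with Ray Data.
reviews = ray.data.read_parquet("s3://lakehouse/sales/reviews/")

# 2. Transform: apply an AI-powered transformation as a plain Python UDF.
def classify(batch: dict) -> dict:
    # A real pipeline would call a model here; this stub tags every row.
    batch["sentiment"] = np.full(len(batch["text"]), "unknown")
    return batch

enriched = reviews.map_batches(classify, batch_size=256)

# 3. Write back: persist the output, then register the path as an external
#    table in Unity Catalog through your governance tooling.
enriched.write_parquet("s3://lakehouse/sales/reviews_enriched/")
```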
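And a sketch of the kind of embedding pipeline the demo walks through, assuming the sentence-transformers package is installed; the corpus and output paths are illustrative.

```python
# A minimal embedding pipeline sketch, assuming sentence-transformers is
# installed; the corpus and output paths are illustrative.
import ray

ray.init()

class Embedder:
    def __init__(self):
        from sentence_transformers import SentenceTransformer
        # Loaded once per actor replica and reused across all of its batches.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def __call__(self, batch: dict) -> dict:
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch

docs = ray.data.read_text("s3://my-bucket/docs/")  # one row per line of text
# Two actor replicas embed batches in parallel; pass num_gpus=1 to pin each to a GPU.
embedded = docs.map_batches(Embedder, concurrency=2, batch_size=32)
embedded.write_parquet("s3://my-bucket/embeddings/")
```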
Who should attend: existing Spark users and platform architects across data engineering, data science, research, and ML engineering.