Posts by Eric Liang

02 . 14 . 2022

Ray Datasets for large-scale machine learning ingest and scoring

We're happy to introduce Ray Datasets, a data loading and preprocessing library built on Ray that leverages Ray’s task, actor, and object APIs to enable large-scale machine learning ingest, training, and inference within a single Python application.

11 . 30 . 2021

Deep Dive: Data Ingest in a Third Generation ML Architecture

Distributed libraries improve performance by exploiting the full bandwidth of distributed memory, and they offer greater programmability. But how does that actually work? What does the code look like?

In this post, we’ll be looking at a concrete...

10 . 06 . 2021

Why Third Generation ML Platforms are More Performant

In a previous blog post, we defined a "3rd generation ML platform" as one that offered full programmability for ML workflows. Key to a 3rd generation platform is the concept of a programmable compute layer. In this blog, we report on emerging pattern...

06 . 14 . 2021

Ray Distributed Library Patterns

Ray has many library integrations, from machine learning libraries such as Horovod and Hugging Face to data processing frameworks such as Spark, Modin, and Dask. But what does it mean to be "integrated with Ray"? And what benefits does it provide to...

02 . 16 . 2021

Data Processing Support in Ray

This blog post highlights two features in the latest Ray 1.2 release: native support for spilling to external storage, and support for libraries from the Python data processing ecosystem, including integrations for PySpark and Dask.

11 . 05 . 2020

The Ideal Foundation for a General Purpose Serverless Platform

09 . 30 . 2020

Announcing Ray 1.0