Posts by Stephanie Wang

Dask+Ray
06 . 29 . 2021

Analyzing memory management and performance in Dask-on-Ray

Ray is a general-purpose distributed system. One of Ray's goals is to seamlessly integrate data processing libraries (e.g., Dask, Spark) into distributed applications. As part of this goal, Ray provides a robust distributed memory manager. The goal...

ThumbnailNo MapReduceSystem
03 . 22 . 2021

Executing a distributed shuffle without a MapReduce system

A distributed shuffle is a data-intensive operation that usually calls for a system built specifically for that purpose. In this blog post, we’ll show how a distributed shuffle can be expressed in just a few lines of Python using Ray, a general-purpo...

AspectDataProcessingImage
02 . 16 . 2021

Data Processing Support in Ray

This blog post highlights two features in the latest Ray 1.2 release: native support for spilling to external storage, and support for libraries from the Python data processing ecosystem, including integrations for PySpark and Dask.