Ray is a general-purpose distributed system. One of Ray's goals is to seamlessly integrate data processing libraries (e.g., Dask, Spark) into distributed applications. As part of this goal, Ray provides a robust distributed memory manager. The goal...
A distributed shuffle is a data-intensive operation that usually calls for a system built specifically for that purpose. In this blog post, we’ll show how a distributed shuffle can be expressed in just a few lines of Python using Ray, a general-purpo...
This blog post highlights two features in the latest Ray 1.2 release: native support for spilling to external storage, and support for libraries from the Python data processing ecosystem, including integrations for PySpark and Dask.