Understanding the Ray Ecosystem and Community

By Ben Lorica and Ion Stoica   

Ray is both a general purpose distributed computing platform and a collection of libraries targeted at machine learning and other workloads.

Update: Ben has created a short video covering the highlights of this post.

Ray is usually described as a distributed computing platform that can be used to scale Python applications with minimal effort. While this is true, in recent conversations we’ve been highlighting the fact that Ray also draws users with very specific needs. There are now several popular open source libraries built on top of Ray that are attracting users on their own. In this post we describe some early observations about the Ray community.

LinkA framework for building frameworks

To understand its user base, let’s start by briefly describing some popular libraries and tools built on top of Ray. These examples show that Ray is general enough that companies and users are building flexible libraries and other frameworks on top of it.

pasted image 0

There are two types of libraries in the Ray ecosystem: (1) libraries for new workloads and (2) drop-in replacements for parallelizing existing libraries.

Library Description
Tune A popular Python library “for experiment execution and hyperparameter tuning at any scale”. In machine learning, hyperparameter tuning is the computationally intensive process of choosing an optimal set of hyperparameters for a model. There are many users of Tune spanning different application domains and it is now one of the more popular open source libraries for hyperparameter tuning.
RLlib An “open-source library for reinforcement learning that offers both high scalability and a unified API for a variety of applications”.  RLlib is already being used in several enterprises. For example, JPMorgan uses RLlib to power electronic trading models. As we start seeing more applications and use cases for reinforcement learning, we expect RLlib to attract a large share of developers needing to integrate RL into their applications.
RaySGD A new lightweight Python library that simplifies distributed training for PyTorch and TensorFlow. Users can scale to hundreds of GPUs across multiple nodes with a single parameter. This solves an increasingly common bottleneck – the need to train very large deep learning models. In a previous post, we outlined reasons why distributed training will increasingly become the norm.
Ray Serve A new model serving library that supports complex and scalable machine learning pipelines. It can be used to serve predictions for large-scale, interactive machine learning applications.
Classy Vision This end-to-end computer vision library from Facebook Research, uses Ray for cluster management and distributed training.
Thinc A new lightweight deep learning library (from the creators of spaCy), Thinc recommends Ray for distributed training.
Modin A distributed data processing library from UC Berkeley’s RISELab, aimed at Python developers who already use the pandas library for data analysis and data wrangling.
AsyncIO Ray “enables seamless integration” with Python asyncio. Python developers use asyncio to write concurrent code on a single CPU. Ray now allows them to scale their existing asyncio applications to multi-core and multi-node settings.
Distributed multiprocessing Ray allows developers to scale Python multiprocessing from a single node to a cluster. Python multiprocessing is a library that let’s Python programmers take advantage of multi-CPU machines.
Distributed scikit-learn and joblib This library makes it easy for developers to scale applications that use the popular machine learning library, scikit-learn, from a single node to a cluster.

LinkHaving a variety of libraries appeals to Python developers

Unlike other “end-to-end machine learning platforms”, each of these libraries and tools can be used on their own (a simple pip install and off you go). For example, if you are happy with your model training tools but want to replace your model serving framework, you can use Ray Serve without needing to introduce the other Ray libraries. This flexibility explains why Ray’s libraries have attracted many developers who need to build machine learning applications. Developers and machine learning engineers are accustomed to building applications that use many different libraries written by different parties (having a number of “import” statements in a Python program is quite common). 

LinkLibraries are helping grow Ray’s user base

Another important point to emphasize is that each of these libraries helps grow awareness and usage of Ray. In recent conversations with users we found that many of them began using Ray through one of these libraries and tools. Communities are even starting to sprout around each library (there’s an active Slack channel for each). Some of the libraries (Tune, RLlib) now boast a number of production use cases, while the newer libraries (RaySGD, Serve) are attracting many early users.

These libraries also serve as entry points through which many users learn about the broader Ray ecosystem. While Ray users can pick and choose which libraries and tools to use, we have long believed that most of them will end up using more than one of these libraries. Recent conversations with over forty companies have borne this out – as users become comfortable with one of these tools they end up using the other Ray libraries as well.

LinkThere are also many users of the core API

So far we’ve focused on users of libraries and tools built on top of Ray. But we’d like to stress that there are also many users of “Ray core”. They tend to fall into two main categories: those who need to scale a Python program or application and those who need to create scalable and high-performance libraries. For example, we recently spoke with a researcher from Oak Ridge National Laboratory who needed Ray for several large-scale machine learning and natural language applications. Another example comes from one of the largest financial services companies in China: Ant Financial has written libraries on top of Ray to support their massive scale fraud detection and personalization systems.

We plan to conduct a formal survey of Ray users in the very near future. But for now, think of Ray as both a general purpose distributed computing platform and as a collection of libraries targeted at machine learning and other workloads. Unlike monolithic platforms, users of Ray are free to use one or more of the existing libraries or to use Ray to build their own.

Next steps

Anyscale's Platform in your Cloud

Get started today with Anyscale's self-service AI/ML platform:

  • Powerful, unified platform for all your AI jobs from training to inference and fine-tuning
  • Powered by Ray. Built by the Ray creators. Ray is the high-performance technology behind many of the most sophisticated AI projects in the world (OpenAI, Uber, Netflix, Spotify)
  • AI App building and experimentation without the Infra and Ops headaches
  • Multi-cloud and on-prem hybrid support