
Configuring and Scaling ML with Hydra + Ray

By Richard Liaw, Bill Chambers and Jieru Hu | January 26, 2021

Hydra, from Facebook AI, is a framework for elegantly configuring complex applications. Since its initial release, Hydra has become a popular framework adopted by researchers and practitioners. We are happy to announce that users can now scale and launch jobs to the cloud through the new Hydra Ray Launcher!

Ray is a library that fits Hydra’s needs perfectly. Ray is a simple yet powerful Python library for parallel and distributed programming with a great ecosystem of ML libraries (for distributed training, reinforcement learning, and model serving), as well as community libraries and integrations (e.g., Dask on Ray, Horovod on Ray).

Hydra Ray Launcher lets you configure and launch your application on Ray in three ways. You can launch your application by:

Starting or connecting to a Ray cluster on AWS EC2, for short-lived clusters.
Connecting to an existing Ray cluster, if you have a long-running one.
Starting a new Ray cluster locally.

Walkthrough

Below, we walk you through installing the launcher and running the example applications it provides. Please check Hydra Ray Launcher’s documentation for more details.

Installation

pip install hydra-ray-launcher --pre

Launch to an AWS cluster

You can launch a Ray application on AWS by setting hydra/launcher=ray_aws. Your application then runs on a new or existing AWS EC2 Ray cluster. Launching on AWS is built on top of Ray’s cluster launcher CLI (see the Ray cluster launcher documentation to learn more), which expects an autoscaler YAML file describing the cluster configuration.

You can configure your cluster the same way you configure any Hydra application: through YAML files, structured configs, and command-line overrides.

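import logging
import time
import hydra
from omegaconf import DictConfig

log = logging.getLogger(__name__)

@hydra.main(config_name="config")  # cfg is composed from the config named "config"
def my_app(cfg: DictConfig) -> None:
    log.info(f"Executing task {cfg.task}")
    time.sleep(1)  # simulate a unit of work

if __name__ == "__main__":
    my_app()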

From the command line:
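As a minimal sketch, the launch command can be assembled from Python as shown below. It assumes the application above is saved as my_app.py (a hypothetical file name) and that you sweep over three illustrative values of task. Hydra launchers only take effect in multirun mode, hence the --multirun flag. The helper build_launch_command is our own name for illustration and is not part of Hydra or Ray.

```python
import shlex
import sys


def build_launch_command(script, launcher, overrides):
    """Return the argv list for a Hydra multirun launch.

    `script` and `overrides` are illustrative; substitute your own.
    """
    return [
        sys.executable,
        script,
        "--multirun",                   # launchers are only used in multirun mode
        f"hydra/launcher={launcher}",   # select the launcher config group option
        *overrides,
    ]


# Assumed file name and sweep values, chosen only for this example.
cmd = build_launch_command("my_app.py", "ray_aws", ["task=1,2,3"])

# Show the equivalent shell command line.
print(shlex.join(cmd))

# To actually launch, pass the list to subprocess:
#   import subprocess; subprocess.run(cmd, check=True)
```

Printing the command before running it is a cheap way to double-check the overrides, since a real launch provisions EC2 instances.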

Launch on an existing Ray cluster

You can also launch the application on an existing Ray cluster by configuring the launcher to connect to it. For instance, you might run the same application from your local machine against a Ray cluster you started earlier.

Launch by spinning up a new Ray cluster locally (for testing)

For a quick local test, you can spin up a Ray cluster at initialization time. The only difference is that the Hydra launcher is set to ray rather than ray_aws, which the AWS example uses.
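To make that one-token difference concrete, the sketch below builds the argument lists for an AWS launch and a local launch and compares them. The script name my_app.py, the sweep task=1,2,3, and the helper launch_args are assumptions made for illustration; only the hydra/launcher values ray and ray_aws come from the launcher itself.

```python
def launch_args(launcher, overrides):
    """Arguments passed to `python` for a Hydra multirun launch (illustrative)."""
    return ["my_app.py", "--multirun", f"hydra/launcher={launcher}", *overrides]


sweep = ["task=1,2,3"]  # hypothetical sweep values

aws_args = launch_args("ray_aws", sweep)   # new or existing cluster on AWS EC2
local_args = launch_args("ray", sweep)     # Ray cluster started locally

# Collect the positions where the two invocations differ.
changed = [(a, b) for a, b in zip(aws_args, local_args) if a != b]
print(changed)  # [('hydra/launcher=ray_aws', 'hydra/launcher=ray')]
```

Because the application code and the remaining overrides are identical, a job that passes the local test can be moved to AWS by changing only the launcher selection.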

Next steps

Please check out Hydra Ray Launcher’s documentation for more details on the launcher. If you’d like to learn more about Hydra, check out the Hydra Website and join the community. If you’d like to learn more about Ray, check out the Ray Documentation, Ray Tutorials on Anyscale Academy, and join the Ray forums.

We’d love to hear your feedback and experience with the new Hydra Ray launcher!


