New in KubeRay 0.2.0: Autoscaling (alpha), simplified installation, and more

By Jiaxin Shan and KubeRay Team   

Jiaxin Shan is a software engineer at ByteDance.

The KubeRay project was released in October 2021 and was developed in collaboration with Anyscale, Ant Group, ByteDance, and Microsoft. Over the past 6 months, KubeRay has become a popular toolkit for managing Ray clusters on Kubernetes.

What's new?

The KubeRay 0.2 release introduces several important enhancements:

  • Integration with Ray autoscaler (alpha) and simplified autoscaler setup

  • gRPC service and CLI for easy integration

  • Simplified installation using the Kustomize tool

Autoscaling alpha version support

Ray v1.11.0 shipped a minimum viable product for Ray autoscaler integration with KubeRay. The integration is not ready for prime time or general use, but it should be enough for interested parties to get started. The release adds a new NodeProvider implementation, KuberayNodeProvider, which interacts with the RayCluster custom resource defined in KubeRay.

KubeRay further simplifies autoscaler setup by providing a new boolean field, enableInTreeAutoscaling. With this change, users no longer need to manually configure the autoscaler container in the Ray head.
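As an illustration of the new field, the sketch below builds a minimal RayCluster manifest in Python and serializes it to JSON. The cluster name, image tag, group name, container names, and replica bounds are placeholders chosen for this example, not values from the release. The surrounding structure is an assumption about the RayCluster schema, so check the CRD shipped with your KubeRay version before relying on it.

```python
import json


def build_autoscaling_raycluster(name: str, image: str, max_workers: int) -> dict:
    """Return a RayCluster manifest with in-tree autoscaling switched on.

    All concrete values (names, image, replica bounds) are illustrative
    assumptions; verify the field layout against your installed CRD.
    """
    ray_container = {"name": "ray", "image": image}
    return {
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            # The field described above: with it set, the autoscaler
            # container no longer has to be configured by hand.
            "enableInTreeAutoscaling": True,
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {
                    "spec": {"containers": [dict(ray_container, name="ray-head")]}
                },
            },
            "workerGroupSpecs": [
                {
                    "groupName": "workers",  # hypothetical group name
                    "replicas": 1,
                    "minReplicas": 0,
                    "maxReplicas": max_workers,
                    "rayStartParams": {},
                    "template": {
                        "spec": {"containers": [dict(ray_container, name="ray-worker")]}
                    },
                }
            ],
        },
    }


manifest = build_autoscaling_raycluster(
    "demo-cluster", "rayproject/ray:1.11.0", max_workers=5
)
manifest_json = json.dumps(manifest, indent=2)
```

Writing `manifest_json` to a file and passing it to `kubectl apply -f` is one way to submit the cluster, since kubectl accepts JSON as well as YAML.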

gRPC service and CLI for easy integration

User feedback has shown that managing Ray clusters in native Kubernetes has a learning curve: it requires a sophisticated permission system, and users must write YAML files carefully.

To overcome these challenges for Ray users and improve integration, KubeRay now includes a generic abstraction on top of RayCluster CRDs and introduces a backend service exposed through gRPC and a gateway. Users can talk to the service over HTTP or gRPC to operate a cluster. ByteDance is adopting this method to build its Ray testing infrastructure. A simple kuberay CLI is also provided to end users to further reduce the learning curve.
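To give a feel for the HTTP side, here is a minimal sketch that prepares, but does not send, a JSON request using only the Python standard library. The service address, the route, and the payload fields are all hypothetical placeholders, since the announcement does not spell out the API surface. Consult the service's API definition in the KubeRay repository for the real routes and message shapes.

```python
import json
import urllib.request


def build_create_cluster_request(
    base_url: str, path: str, payload: dict
) -> urllib.request.Request:
    """Prepare (without sending) a JSON POST aimed at the backend service."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_cluster_request(
    base_url="http://localhost:8888",  # hypothetical service address
    path="/apis/v1alpha1/clusters",    # hypothetical route, not confirmed by the post
    payload={"name": "demo-cluster", "namespace": "default"},  # hypothetical fields
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the call easy to inspect and test without a running cluster.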

Simplified installation

KubeRay can now be deployed using the Kustomize installation tool. This flexible installation pattern simplifies customization by overlaying changes on top of base manifests. A few companies use this pattern to replace images, inject environment variables, and make similar adjustments by extending the base manifests.
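The overlay pattern can be sketched as follows: a small Python helper that produces the contents of a `kustomization.yaml` which extends a base and swaps the operator image. The base path, image names, and tag are assumptions for illustration only. The output is JSON, which is valid YAML and should therefore be readable by Kustomize without a third-party YAML library.

```python
import json


def build_overlay(base: str, image: str, new_name: str, new_tag: str) -> dict:
    """Return a Kustomization that reuses `base` and replaces one image."""
    return {
        "apiVersion": "kustomize.config.k8s.io/v1beta1",
        "kind": "Kustomization",
        # Pull in the unmodified base manifests.
        "resources": [base],
        # Rewrite every reference to `image` in those manifests.
        "images": [{"name": image, "newName": new_name, "newTag": new_tag}],
    }


overlay = build_overlay(
    base="../base",                                    # hypothetical path to the base
    image="kuberay/operator",                          # assumed image name in the base
    new_name="registry.example.com/kuberay/operator",  # hypothetical private registry
    new_tag="v0.2.0",                                  # assumed tag
)
kustomization_text = json.dumps(overlay, indent=2)
```

Saving `kustomization_text` as `kustomization.yaml` in an overlay directory and running `kubectl apply -k` on that directory would apply the customized manifests. Injecting environment variables would need an additional patch entry, which is not shown here.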

KubeRay also introduces Helm charts as an alternative to simplify the control plane installation experience for Helm users.

In addition to the major enhancements above, the community has also shipped a number of bug fixes and stability and performance enhancements. You can check out the full changelog here.

What's coming?

KubeRay contributors are planning KubeRay 0.3. Job and Serve CRDs will be introduced to further simplify workload management. The community would also like to improve the workspace-centric development experience and cluster observability. Please check out the milestones for the 0.3 release for more details.

Get involved

Achieving this milestone has truly been a community effort. We would like to thank everyone for their efforts on the KubeRay 0.2 release, especially the users, code contributors, and maintainers. As you can see from the extensive contributions to KubeRay 0.2, the KubeRay community is vibrant and diverse and is solving real-world problems for Ray users around the world. 

Newcomers to the KubeRay project are always welcome! The KubeRay contributors hold open meetings and are always looking for more volunteers and users to unlock the potential of Ray experiences on Kubernetes. Use the following resources to ask questions and troubleshoot issues as you get acquainted with the project. 
