
Flexible, cross-language, distributed model inference framework: Ray Serve with Java API

By Tengwei Cai, Yang Liu, Chengxi Luo, Xiaofeng Yang, Simon Mo | December 13, 2022

This is a guest blog from Ray contributors at Ant Group and Anyscale's Ray team, showing how to build cross-language, distributed model inference services with Ray Serve's Java API.

What is Ray Serve?

Ray Serve is a framework-agnostic online model inference framework built on top of Ray. Compared with other inference serving frameworks, Ray Serve focuses on elastic scaling, on optimizing inference graphs composed of multiple models, and on scenarios where many models share a small amount of hardware. It builds on Ray's flexible scheduling and high-performance RPC. Ray Serve can serve models from any machine learning framework, provides built-in request batching to improve throughput, and natively supports FastAPI.

Background on Ray

Ray aims to provide a universal API for distributed computing. A core part of achieving this goal is to provide simple but general programming abstractions and let the system do all the hard work. This philosophy is what makes it possible for developers to use Ray with existing Python and Java libraries and systems.

Ray seeks to enable the development and composition of distributed applications and libraries in general. Concretely, this includes coarse-grained elastic workloads (e.g., serverless computing), machine learning training (e.g., Ray Train), online serving (e.g., Ray Serve), data processing (e.g., Ray Datasets, Modin, Dask-on-Ray), and ad-hoc computation (e.g., parallelizing Python apps, gluing together different distributed frameworks).

Ray's API enables developers to easily compose multiple libraries within a single distributed application. For example, Ray tasks and actors may call into or be called from distributed training (e.g., torch.distributed) or online serving workloads also running in Ray. In this sense, Ray makes for an excellent "distributed glue" system, because its API is general and performant enough to serve as the interface between many different workload types.

Ray Serve with Java

Java is one of the most widely used programming languages in industry, and many users work in it as their main language. As of Ray 2.0, Ray Serve supports Java natively: users can deploy their own Java code as a deployment, then call and manage it through the Java API. The Python API can also operate on Java deployments across languages, and vice versa.

In this post, we introduce the Java side of Ray Serve in detail and walk step by step through getting started with Java and Ray Serve.

Starting Ray Serve

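Before using Ray Serve from Java, start Serve just as you would with the Python API; this brings up Serve's controller and HTTP proxy actors. In the call below, detached=true keeps Serve running after the driver program exits, and dedicatedCpu=false means no CPU core is reserved for the controller. For example: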

Serve.start(/*detached=*/true, /*dedicatedCpu=*/false, /*config=*/null);

Creating a deployment

To create and deploy a deployment, pass the fully qualified name of the deployment class to setDeploymentDef() on the builder returned by Serve.deployment():

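// Imports used by this snippet; Counter and create() are members of an enclosing class.
import io.ray.serve.api.Serve;
import java.util.concurrent.atomic.AtomicInteger;

// The deployment class: Serve constructs it with the init args, and handle.remote()
// and HTTP requests in this post invoke its call() method.
public static class Counter {

  private final AtomicInteger value;

  public Counter(String value) {
    this.value = new AtomicInteger(Integer.parseInt(value));
  }

  public String call(String delta) {
    return String.valueOf(value.addAndGet(Integer.parseInt(delta)));
  }
}

public void create() {
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setInitArgs(new Object[] {"1"})
      .setNumReplicas(1)
      .create()
      .deploy(/*blocking=*/true);
}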
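The Counter class itself has no Ray dependency, so its logic can be checked locally before handing it to Serve. The sketch below is a standalone copy of Counter plus a hypothetical CounterLocalCheck driver; it calls the class directly rather than through Serve, so it exercises only the business logic, not routing or replicas.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Standalone copy of the Counter deployment class (no Ray classes involved).
class Counter {
  private final AtomicInteger value;

  public Counter(String value) {
    this.value = new AtomicInteger(Integer.parseInt(value));
  }

  public String call(String delta) {
    return String.valueOf(value.addAndGet(Integer.parseInt(delta)));
  }
}

// Hypothetical local driver: constructs Counter with the same init arg as
// setInitArgs(new Object[] {"1"}) and calls it the way a handle would.
class CounterLocalCheck {
  public static void main(String[] args) {
    Counter counter = new Counter("1");
    System.out.println(counter.call("10")); // prints 11
    System.out.println(counter.call("10")); // prints 21
  }
}
```

Because each call mutates the same AtomicInteger, repeated calls accumulate, which is the behavior a single-replica deployment exposes through its handle.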

Accessing a deployment

Once the deployment is successfully created, it can be queried by its name:

public Deployment query() {
  Deployment deployment = Serve.getDeployment("counter");
  return deployment;
}

Calling a deployment

A Java Serve deployment can be called through its RayServeHandle, for example:

Deployment deployment = Serve.getDeployment("counter");
System.out.println(deployment.getHandle().remote("10").get());

It can also be called over HTTP:

curl -d '"10"' http://127.0.0.1:8000/counter
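For Java clients that want to use HTTP rather than a handle, the same request can be sent with the JDK's built-in HTTP client. This is a sketch assuming Serve's HTTP proxy is reachable at 127.0.0.1:8000 as in the curl command; the class name CounterHttpClient is hypothetical, and no extra headers are set beyond what the JDK adds.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Plain-JDK equivalent of: curl -d '"10"' http://127.0.0.1:8000/counter
// Assumes the Serve HTTP proxy listens on 127.0.0.1:8000, as in the curl example.
class CounterHttpClient {
  static final String ENDPOINT = "http://127.0.0.1:8000/counter";

  // Builds a POST whose body is the JSON string "10" (quotes included), like curl -d.
  static HttpRequest buildRequest(String delta) {
    return HttpRequest.newBuilder(URI.create(ENDPOINT))
        .POST(HttpRequest.BodyPublishers.ofString("\"" + delta + "\""))
        .build();
  }

  public static void main(String[] args) throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpResponse<String> response =
        client.send(buildRequest("10"), HttpResponse.BodyHandlers.ofString());
    // The body should hold the counter's new value returned by Counter.call().
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

Keeping request construction in its own method makes it easy to check the URL and body without a running cluster.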

Updating a deployment

After a deployment is created, you can change its configuration and redeploy it. For example, the following code redeploys "counter" with two replicas, each initialized with the value 2:

public void update() {
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setInitArgs(new Object[] {"2"})
      .setNumReplicas(2)
      .create()
      .deploy(/*blocking=*/true);
}

Configuring a deployment

Through the Java API, you can also configure a deployment, for example to scale its number of replicas up or down and to specify the CPU or GPU resources of each replica.

Scaling out deployments

The numReplicas parameter controls how many replicas are deployed. You can adjust it dynamically, for example:

public void scaleOut() {
  Deployment deployment = Serve.getDeployment("counter");

  // Scale up to 2 replicas.
  deployment.options().setNumReplicas(2).create().deploy(/*blocking=*/true);

  // Scale down to 1 replica.
  deployment.options().setNumReplicas(1).create().deploy(/*blocking=*/true);
}
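Each replica constructs its own Counter, so a stateful deployment like this one does not share state across replicas once numReplicas is greater than 1. The toy program below illustrates this without Ray by creating independent replicas and dispatching requests to them. The round-robin dispatch and the ReplicaStateDemo name are assumptions made purely for illustration; Serve's actual request routing is not modeled, so in a real cluster the replica that serves a given request may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of several replicas of the Counter deployment. Each replica owns its
// own AtomicInteger, so results diverge once there is more than one replica.
class ReplicaStateDemo {
  static final class Counter {
    private final AtomicInteger value;

    Counter(String value) {
      this.value = new AtomicInteger(Integer.parseInt(value));
    }

    String call(String delta) {
      return String.valueOf(value.addAndGet(Integer.parseInt(delta)));
    }
  }

  // Hypothetical round-robin dispatcher, used only for this illustration.
  static List<String> dispatch(int numReplicas, String initArg, String delta, int requests) {
    List<Counter> replicas = new ArrayList<>();
    for (int i = 0; i < numReplicas; i++) {
      replicas.add(new Counter(initArg)); // every replica gets the same init args
    }
    List<String> results = new ArrayList<>();
    for (int r = 0; r < requests; r++) {
      results.add(replicas.get(r % numReplicas).call(delta));
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(dispatch(1, "1", "10", 4)); // [11, 21, 31, 41]
    System.out.println(dispatch(2, "1", "10", 4)); // [11, 11, 21, 21]
  }
}
```

If a deployment needs a single shared counter, that state has to live outside the replicas; stateless request handlers scale out without this concern.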

Resource management (CPUs, GPUs)

The rayActorOptions parameter of a deployment specifies the resources each replica requires, for example one GPU per replica:

import java.util.HashMap;
import java.util.Map;

public void manageResource() {
  // Request one GPU for each replica of "counter".
  Map<String, Object> rayActorOptions = new HashMap<>();
  rayActorOptions.put("num_gpus", 1);
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setRayActorOptions(rayActorOptions)
      .create()
      .deploy(/*blocking=*/true);
}
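Since rayActorOptions is an ordinary Map, it can be assembled by a small helper. The sketch below uses a hypothetical helper, actorOptions, that fills num_gpus (the key used above) and num_cpus (Ray's standard CPU key, assumed here to be accepted by the Java API as it is in Python). Ray supports fractional amounts such as 0.5 GPUs, which let two replicas share one GPU. The helper stores amounts as Double; the example above uses an Integer, and this sketch does not verify which numeric types the Java API accepts or whether the cluster has the resources.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that assembles the Map passed to setRayActorOptions().
class ActorOptionsSketch {
  static Map<String, Object> actorOptions(double numCpus, double numGpus) {
    if (numCpus < 0 || numGpus < 0) {
      throw new IllegalArgumentException("resource amounts must be non-negative");
    }
    Map<String, Object> options = new HashMap<>();
    options.put("num_cpus", numCpus);
    if (numGpus > 0) {
      // e.g. 0.5 lets two replicas be packed onto one GPU
      options.put("num_gpus", numGpus);
    }
    return options;
  }

  public static void main(String[] args) {
    // One CPU and half a GPU per replica.
    System.out.println(actorOptions(1, 0.5));
  }
}
```

Omitting num_gpus when it is zero keeps CPU-only deployments from carrying a GPU key at all.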

Cross-language deployment

Through the Java API, you can also deploy and call Python deployments across languages. Suppose there is a Python file counter.py in the /path/to/code/ directory:

from ray import serve

@serve.deployment
class Counter(object):
    def __init__(self, value):
        self.value = int(value)

    def increase(self, delta):
        self.value += int(delta)
        return str(self.value)

The following Java program deploys and calls this Python deployment. It adds /path/to/code/ to the ray.job.code-search-path property so that Ray can locate counter.py:

import io.ray.api.Ray;
import io.ray.serve.api.Serve;
import io.ray.serve.deployment.Deployment;
import io.ray.serve.generated.DeploymentLanguage;
import java.io.File;

public class ManagePythonDeployment {

  public static void main(String[] args) {

    // Append the directory containing counter.py to the code search path.
    System.setProperty(
        "ray.job.code-search-path",
        System.getProperty("java.class.path") + File.pathSeparator + "/path/to/code/");

    Serve.start(/*detached=*/true, /*dedicatedCpu=*/false, /*config=*/null);

    Deployment deployment =
        Serve.deployment()
            .setDeploymentLanguage(DeploymentLanguage.PYTHON)
            .setName("counter")
            // Python deployment definition given as "<module>.<class>".
            .setDeploymentDef("counter.Counter")
            .setNumReplicas(1)
            .setInitArgs(new Object[] {"1"})
            .create();
    deployment.deploy(/*blocking=*/true);

    // Call the Python method increase("2") and print its result.
    System.out.println(Ray.get(deployment.getHandle().method("increase").remote("2")));
  }
}
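The ray.job.code-search-path value in this example is the current class path followed by the Python code directory, joined with the platform path separator. When code lives in several directories, a small helper keeps that joining consistent. The helper name withCodeSearchPath and the second directory /path/to/more/code/ are hypothetical; only the property name and the joining pattern come from the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Builds a ray.job.code-search-path value the way the example does: the Java class
// path followed by extra code directories, joined with File.pathSeparator
// (":" on Linux/macOS, ";" on Windows).
class CodeSearchPathSketch {
  static String withCodeSearchPath(String classPath, String... codeDirs) {
    List<String> entries = new ArrayList<>();
    if (classPath != null && !classPath.isEmpty()) {
      entries.add(classPath);
    }
    for (String dir : codeDirs) {
      entries.add(dir);
    }
    return String.join(File.pathSeparator, entries);
  }

  public static void main(String[] args) {
    String value =
        withCodeSearchPath(
            System.getProperty("java.class.path"),
            "/path/to/code/",
            "/path/to/more/code/"); // hypothetical second directory
    // Set it before Serve.start(), as the example above does.
    System.setProperty("ray.job.code-search-path", value);
    System.out.println(value);
  }
}
```

Skipping an empty class path avoids a leading separator, which would otherwise add an empty entry to the search path.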

Summary

To sum up, this short, step-by-step tutorial used code samples to show how easy it is to get started with the cross-language functionality of Ray Serve, in particular its Java support. Java support in Ray 2.0 is experimental, so we encourage you to try the Java tutorial and give us feedback by filing issues for any problems you encounter.
