Flexible, cross-language, distributed model inference framework: Ray Serve with Java API

By Tengwei Cai, Yang Liu, Chengxi Luo, Xiaofeng Yang, Simon Mo   

This is a guest blog from Ray contributors at Ant Group and Anyscale's Ray team, showing how to run cross-language, distributed model inference using Ray Serve's Java API.

What is Ray Serve?

Ray Serve is a framework-agnostic online model inference framework built on top of Ray. Compared with other inference serving frameworks, Ray Serve focuses on elastic scaling, on optimizing inference graphs composed of multiple models, and on scenarios where a large number of models share a small amount of hardware. It builds on Ray's flexible scheduling capabilities and high-performance RPC. Ray Serve can serve models from any machine learning framework, has built-in request batching to improve throughput, and natively supports the FastAPI framework.

Background on Ray

Ray aims to provide a universal API for distributed computing. A core part of achieving this goal is to provide simple but general programming abstractions, letting the system do all the hard work. This philosophy is what makes it possible for developers to use Ray with existing Python and Java libraries and systems.

Ray seeks to enable the development and composition of distributed applications and libraries in general. Concretely, this includes coarse-grained elastic workloads (i.e., types of serverless computing), machine learning training (e.g., Ray Train), online serving (e.g., Ray Serve), data processing (e.g., Ray Datasets, Modin, Dask-on-Ray), and ad-hoc computation (e.g., parallelizing Python apps, gluing together different distributed frameworks).

Ray's API enables developers to easily compose multiple libraries within a single distributed application. For example, Ray tasks and actors may call into or be called from distributed training (e.g., torch.distributed) or online serving workloads also running in Ray. In this sense, Ray makes for an excellent "distributed glue" system, because its API is general and performant enough to serve as the interface between many different workload types.

Ray Serve with Java

Java is one of the mainstream programming languages in the software industry, and many developers use it as their main language. As of Ray 2.0, Ray Serve supports Java natively. Users can deploy their own Java code as a Deployment, and call and manage it through the Java API. The Python API can also operate on Java deployments across languages, and vice versa.

In this post, we introduce the Java part of Ray Serve in detail and walk through simple steps for getting started with Java on Ray Serve.

Starting Ray Serve

Serve is started in the same way as with Ray Serve's Python API. Before using Ray Serve in Java, you need to start Serve, which brings up its Controller and Proxy. For example:

Serve.start(/*detached=*/true, /*dedicatedCpu=*/false, /*config=*/null);

Creating a deployment

By specifying the full class name in the Serve.deployment() interface, users can create and deploy a deployment:

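// Requires: import java.util.concurrent.atomic.AtomicInteger;
//           import io.ray.serve.api.Serve;
public static class Counter {

  private final AtomicInteger value;

  // The init arg is passed as a string ("1" below), so parse it.
  public Counter(String value) {
    this.value = new AtomicInteger(Integer.parseInt(value));
  }

  // Adds delta to the counter and returns the new value as a string.
  public String call(String delta) {
    return String.valueOf(value.addAndGet(Integer.parseInt(delta)));
  }
}

public void create() {
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setInitArgs(new Object[] {"1"})
      .setNumReplicas(1)
      .create()
      .deploy(/*blocking=*/true);
}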
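
Because the deployment is an ordinary Java class, its logic can be checked locally before it is handed to Serve. The sketch below is an illustration, not part of the Serve API: it copies the Counter class from above into a hypothetical CounterLocalCheck wrapper and calls it directly, with no Ray cluster involved.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper class, used only for this local check.
public class CounterLocalCheck {

  // Same Counter logic as in the deployment example above.
  public static class Counter {

    private final AtomicInteger value;

    public Counter(String value) {
      this.value = new AtomicInteger(Integer.parseInt(value));
    }

    public String call(String delta) {
      return String.valueOf(value.addAndGet(Integer.parseInt(delta)));
    }
  }

  public static void main(String[] args) {
    // Mirrors setInitArgs(new Object[] {"1"}): the counter starts at 1.
    Counter counter = new Counter("1");
    System.out.println(counter.call("10")); // prints 11
    System.out.println(counter.call("5"));  // prints 16
  }
}
```

Each call adds the delta to the running value, so a deployed replica keeps state between requests in the same way.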

Accessing a deployment

Once the deployment is successfully created, it can be queried by its name:

public Deployment query() {
  Deployment deployment = Serve.getDeployment("counter");
  return deployment;
}

Calling a deployment

A Java Serve deployment can be called through Java's RayServeHandle, for example:

Deployment deployment = Serve.getDeployment("counter");
System.out.println(deployment.getHandle().remote("10").get());

Similarly, it can also be called via HTTP:

curl -d '"10"' http://127.0.0.1:8000/counter
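
The same request can also be issued from Java code. The following is a minimal sketch using the JDK's java.net.http.HttpClient (Java 11 or later). The class name CounterHttpClient and the helper buildRequest are hypothetical names chosen for this example. It assumes the "counter" deployment is running and that the proxy listens on 127.0.0.1:8000, as in the curl command.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client class; not part of the Ray Serve API.
public class CounterHttpClient {

  // Builds a POST request equivalent to: curl -d '<body>' <url>
  public static HttpRequest buildRequest(String url, String body) {
    return HttpRequest.newBuilder()
        .uri(URI.create(url))
        // curl -d sends this content type by default.
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }

  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    // The body is the JSON-style string "10", quotes included, as in the curl call.
    HttpRequest request = buildRequest("http://127.0.0.1:8000/counter", "\"10\"");
    // Needs a running Serve proxy; otherwise this fails with a connection error.
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```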

Updating a deployment

After a deployment is created, you can modify its configuration and redeploy it. For example, the following code changes the initial value of "counter" to 2:

public void update() {
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setInitArgs(new Object[] {"2"})
      .setNumReplicas(1)
      .create()
      .deploy(/*blocking=*/true);
}

Configuring a deployment

Through the Java API, you can also configure a deployment:

  • scale the number of deployment replicas up or down

  • specify the CPU or GPU resources of each replica

Scaling out deployments

The numReplicas parameter controls how many replicas are deployed. You can adjust this parameter dynamically, for example:

public void scaleOut() {
  Deployment deployment = Serve.getDeployment("counter");

  // Scale up to 2 replicas.
  deployment.options().setNumReplicas(2).create().deploy(/*blocking=*/true);

  // Scale down to 1 replica.
  deployment.options().setNumReplicas(1).create().deploy(/*blocking=*/true);
}

Resource management (CPUs, GPUs)

Through the rayActorOptions parameter of a deployment, you can reserve resources for each deployment replica, for example a GPU:

// Requires: import java.util.HashMap; import java.util.Map;
public void manageResource() {
  // Reserve one GPU for each replica of the deployment.
  Map<String, Object> rayActorOptions = new HashMap<>();
  rayActorOptions.put("num_gpus", 1);
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setRayActorOptions(rayActorOptions)
      .create()
      .deploy(/*blocking=*/true);
}

Cross-language deployment

Through the Java API, you can also deploy and call a Python deployment across languages. Suppose there is a Python file counter.py in the /path/to/code/ directory:

from ray import serve

@serve.deployment
class Counter(object):
    def __init__(self, value):
        self.value = int(value)

    def increase(self, delta):
        self.value += int(delta)
        return str(self.value)

An example of deploying and calling this Python deployment is as follows:

import io.ray.api.Ray;
import io.ray.serve.api.Serve;
import io.ray.serve.deployment.Deployment;
import io.ray.serve.generated.DeploymentLanguage;
import java.io.File;

public class ManagePythonDeployment {

  public static void main(String[] args) {

    System.setProperty(
        "ray.job.code-search-path",
        System.getProperty("java.class.path") + File.pathSeparator + "/path/to/code/");

    Serve.start(true, false, null);

    Deployment deployment =
        Serve.deployment()
            .setDeploymentLanguage(DeploymentLanguage.PYTHON)
            .setName("counter")
            .setDeploymentDef("counter.Counter")
            .setNumReplicas(1)
            .setInitArgs(new Object[] {"1"})
            .create();
    deployment.deploy(/*blocking=*/true);

    System.out.println(Ray.get(deployment.getHandle().method("increase").remote("2")));
  }
}
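
The ray.job.code-search-path value in the example is the Java classpath with the Python source directory appended, joined by the platform's path separator. A small sketch of that composition follows. The class CodeSearchPath and the helper buildCodeSearchPath are hypothetical names for illustration, and /path/to/code/ is the placeholder directory from above.

```java
import java.io.File;

// Hypothetical helper class; it only illustrates how the property value is composed.
public class CodeSearchPath {

  // Appends the directory holding the Python code to the given classpath.
  public static String buildCodeSearchPath(String classPath, String pythonCodeDir) {
    return classPath + File.pathSeparator + pythonCodeDir;
  }

  public static void main(String[] args) {
    String path =
        buildCodeSearchPath(System.getProperty("java.class.path"), "/path/to/code/");
    // Set before Serve.start(...), as in the example above.
    System.setProperty("ray.job.code-search-path", path);
    System.out.println(path);
  }
}
```

File.pathSeparator is ":" on Linux and macOS and ";" on Windows, so building the value this way keeps the setting portable.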

Summary

To sum up, this short, step-by-step tutorial used code samples to show how easily you can get started with the cross-language functionality of Ray Serve, and in particular with its Java support. Although Java support is experimental in Ray 2.0, we encourage you to try our Java tutorial and give us feedback by filing any issues you encounter.
