Ray + Arize: Productionize ML for Scale and Usability

By Dat Ngo   

Don't miss Arize’s presentation at Ray Summit 2022!

If you’ve ever had the opportunity to bring a machine learning project to life from rapid prototyping all the way into production, you know that it is nothing short of yeoman’s work. Some would call it fun, some type-II fun, and others absolute hell – it all depends on your sense of humor and affinity for productionizing things.

But if you are on this hero’s journey, there are two arrows in your ML quiver that can make the process much more enjoyable and likely to hit the mark: Ray and Arize AI. This piece covers why you should consider using Ray’s distributed compute framework and ecosystem alongside Arize’s ML observability platform, and how you can get started.

Background

Imagine you just finished your ML prototype after weeks of finding the necessary data, cleaning and preprocessing it, searching for the right model architecture, and training, testing, and iterating over and over until you finally have a working model.

You present your model to the product team and tell an amazing story of how your model will improve the day-to-day operations, increase business KPIs, and a lot more. They love the presentation! They want to move forward and ask you the following: “How do we get this model serving the entire business line? And when the model goes into production, how will you know how it’s performing against our goals?”

What they are REALLY asking (in ML speak) is this: how do we get this off your laptop and into a very real production system at scale? And if issues around data quality, data drift, or performance degradation arise, how will we catch and fix them quickly so that business outcomes aren’t negatively affected?

This blog is written to give you a better understanding of how you can answer these questions and start tackling these productionization tasks.

The Technologies

In this section, we will briefly review both Ray and Arize and the problems each solves.

What is Ray?

Ray is an open-source project developed at UC Berkeley’s RISELab. As a general-purpose, universal distributed compute framework, it lets you flexibly run any compute-intensive Python workload, from distributed training and hyperparameter tuning to deep reinforcement learning and production model serving.

As ML practitioners, we set out to bring value to the business through the models we build, but we often get sidetracked learning and managing the infrastructure needed to run those models at a larger scale.

This is where Ray comes in. Ray enables you to run Python code in parallel and across multiple machines without confining you to a specific framework; think of Apache Spark, but with the full flexibility of the Python ecosystem at your disposal.

This makes it more of a general-purpose clustering and parallelization framework that can be used to build and run any type of distributed application. Because of how Ray Core is architected, it is often thought of as a framework for building frameworks.

You can break Ray down into a couple of different components. The first is Ray Core, a distributed computing framework. The second is the Ray Ecosystem, broadly speaking a set of task-specific libraries (such as Ray Train, Ray Tune, and Ray Serve) that come packaged with Ray.
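
To make this concrete, here is a minimal sketch of Ray Core’s task API. The workload and the preprocess function name are illustrative placeholders; the Ray-specific pieces are ray.init(), the @ray.remote decorator, .remote() calls, and ray.get().

import ray

ray.init()  # starts Ray locally; on a cluster, ray.init(address="auto") connects to it instead

# Any ordinary Python function becomes a distributed task with a decorator.
@ray.remote
def preprocess(batch):
    # placeholder for a CPU-heavy transformation
    return [x * 2 for x in batch]

batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# .remote() schedules each call in parallel across the available workers
futures = [preprocess.remote(b) for b in batches]

# ray.get() blocks until the results are ready
print(ray.get(futures))

The same code runs unchanged from a laptop to a multi-node cluster; only the ray.init() target changes.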

TL;DR on Ray:

  • Very intuitive to scale in a language that you’re comfortable with (going from laptop to distributed workloads in Python)

  • Vast ML ecosystem; not constrained by certain technologies or frameworks

  • Allows users to focus on building their ML use case, not distributed technologies

Want to go Deeper? Here are some resources:

Ray Tutorial

Ray Core

What is Arize?

For many ML teams, the rubber meets the road once the model interacts with the real world. This is where Arize comes in. Arize is an ML observability platform that allows ML practitioners to easily tackle the myriad issues they are likely to encounter in production, such as:

  • Model Performance Issues: almost all models will experience some sort of performance degradation 

  • Model and Data Drift: the real world (or the model itself) changing in ways that put performance at risk

  • Data Quality Issues: missing, malformed, or stale data (we all know this one)

  • Model Explainability: knowing WHY a model is making the predictions it makes

  • Model Fairness: treating groups or protected classes equitably

When it comes to model monitoring, it’s not just that we want to be alerted when there is an issue. Once a monitor fires, we want to know where and why the issue happened and how we can fix it quickly. Arize makes finding these issues intuitive and automated. Just like in software development, if you don’t know where the bug is or have no visibility into the problem, triaging the situation can be long and arduous.

Arize is built to do three things well. The first is to let you know when something has gone wrong. The second is to help you understand where the issue is and give you workflows to fix it quickly. Both contribute to the third: continually improving ML models once they’re in production.

As you think about scaling the infrastructure around ML models, you also want to think about scaling team capabilities. If your team is spending copious amounts of time maintaining basic model analytics and systems not purpose-built for ML monitoring and observability, there is less time spent building newer, better models for the business.

TL;DR on Arize:

  • Automated monitoring for issues your model will encounter in the wild

  • Strong troubleshooting workflows to fix issues quickly

  • Built for scale, intuition, and ease of use

Want to go Deeper? Here are some resources:

Machine Learning Observability 101

Arize Docs

Let’s See it in Action

Below is a code example of Ray working together with Arize.

It’s quite a simple example, but it shows the scaffolding of the two technologies working in tandem. Let’s break the notebook into two major parts.

# install dependencies

!pip install ray
!pip install arize
!pip install xgboost_ray

# import data, assign features and target

import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

feature_names = data.feature_names
target_names = data.target_names
target = data.target
df = pd.DataFrame(data.data, columns=feature_names)

# train a model on the breast cancer dataset using Ray

from xgboost_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer

train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)

evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    train_set,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=2,  # Number of remote actors
        cpus_per_actor=1))

bst.save_model("model.xgb")
print("Final training error: {:.4f}".format(
    evals_result["train"]["error"][-1]))

# distributed model predictions using Ray

from xgboost_ray import RayDMatrix, RayParams, predict
from sklearn.datasets import load_breast_cancer
import xgboost as xgb

batch = RayDMatrix(train_x)
bst = xgb.Booster(model_file="model.xgb")
pred_ray = predict(bst, batch, ray_params=RayParams(num_actors=2))

print(pred_ray.shape)

The first part is likely familiar, and that familiarity is one of the advantages of using Ray. Here you are training your model and using it to predict on the breast cancer dataset; the SHAP values are computed with the same distributed predict call at the start of the second part.

A lot of the code should feel familiar, akin to using sklearn’s fit() and predict(). Here, you are using Ray to distribute the work across two actors.

An actor is essentially a stateful worker (or a service). When a new actor is instantiated, a new worker is created, and methods of the actor are scheduled on that specific worker and can access and mutate its state. This lets you distribute the work needed to train, predict, and compute SHAP values (or perform any other computationally heavy operation).
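
The actor pattern itself is only a decorator away. Below is a minimal, illustrative sketch of a stateful Ray actor; the class and its counter logic are placeholders rather than part of the xgboost_ray example, which uses the same mechanism under the hood to coordinate its training and prediction workers.

import ray

ray.init()

# An actor is a class whose methods run on a dedicated worker process
# and can read and mutate that worker's state.
@ray.remote
class PredictionCounter:
    def __init__(self):
        self.count = 0

    def add(self, n):
        self.count += n
        return self.count

    def total(self):
        return self.count

counter = PredictionCounter.remote()    # launches a worker that holds the actor's state
ray.get([counter.add.remote(10), counter.add.remote(5)])
print(ray.get(counter.total.remote()))  # 15

The second part of the notebook, shown below, computes the SHAP values, preps the production data, and logs everything to Arize.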

# compute SHAP (feature contribution) values and create the SHAP column names Arize expects

inf_shap_values = predict(bst, batch, ray_params=RayParams(num_actors=2), pred_contribs=True)

shap_values_column_names_mapping = {
    f"{feat}": f"{feat}_shap" for feat in data["feature_names"]
}

# instantiate Arize client

from arize.pandas.logger import Client, Schema
from arize.utils.types import ModelTypes, Environments

SPACE_KEY = "SPACE_KEY"
API_KEY = "API_KEY"
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)

model_id = "breast_cancer_prediction_SHAP"
model_type = ModelTypes.SCORE_CATEGORICAL

if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
    raise ValueError("❌ NEED TO CHANGE SPACE AND/OR API_KEY")
else:
    print("Step 2 ✅: Import and Setup Arize Client Done! Now we can start using Arize!")

# create functions to simulate UUIDS and timestamps for Arize 

import uuid
from datetime import datetime, timedelta

# Prediction ID is required for logging any dataset
def generate_prediction_ids(df):
    return pd.Series((str(uuid.uuid4()) for _ in range(len(df))), index=df.index)

# OPTIONAL: We can directly specify when inferences were made
def simulate_production_timestamps(df, days=30):
    t = datetime.now()
    current_t, earlier_t = t.timestamp(), (t - timedelta(days=days)).timestamp()
    return pd.Series(np.linspace(earlier_t, current_t, num=len(df)), index=df.index)

# assign predictions, labels, and shaps for Arize

prediction_label = pd.Series(map(lambda v: target_names[v], (pred_ray > 0.5).astype(int)))
prediction_score = pd.Series(pred_ray)
actual_label = pd.Series(map(lambda v: target_names[v], target))
actual_score = pd.Series(target)
# pred_contribs output has one column per feature plus a trailing bias column;
# drop the bias column, then rename the feature columns to their "<feature>_shap" names
shap_values = pd.DataFrame(inf_shap_values[:, :-1], columns=data["feature_names"])
shap_values = shap_values.rename(columns=shap_values_column_names_mapping)

# create data frame to send to Arize

production_dataset = pd.DataFrame(train_x, columns=data["feature_names"]).join(
    [
        pd.DataFrame(
            {
                "prediction_id": generate_prediction_ids(pd.DataFrame(train_x)),
                "prediction_ts": simulate_production_timestamps(pd.DataFrame(train_x)),
                "prediction_label": prediction_label,
                "actual_label": actual_label,
                "prediction_score": prediction_score,
                "actual_score": actual_score,
            }
        ),
        shap_values,
    ]
)

production_dataset.head(5)

# Define a Schema() object for Arize to pick up data from the correct columns for logging
production_schema = Schema(
    prediction_id_column_name="prediction_id",  # REQUIRED
    timestamp_column_name="prediction_ts",
    prediction_label_column_name="prediction_label",
    prediction_score_column_name="prediction_score",
    actual_label_column_name="actual_label",
    actual_score_column_name="actual_score",
    feature_column_names=feature_names,
    shap_values_column_names=shap_values_column_names_mapping,
)

# arize_client.log returns a Response object from Python's requests module
response = arize_client.log(
    dataframe=production_dataset,
    schema=production_schema,
    model_id="ray_shap_model_example_classification",
    model_type=ModelTypes.SCORE_CATEGORICAL,
    environment=Environments.PRODUCTION,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(
        f"❌ logging failed with response code {response.status_code}, {response.text}"
    )
else:
    print(
        f"✅ You have successfully logged {len(production_dataset)} data points to Arize!"
    )

In the second part, you are prepping your production data (the data your model predicted on) to be sent to Arize. Here, you are instantiating the Arize client, defining your schema, and logging your predictions to your Arize account.

Whether your architecture is real time or batch, you can log inference data to Arize to monitor and observe how the model is doing in production. In doing so, you gain visibility into when the model encounters performance degradation, drift, or data quality issues, and if these issues arise, Arize gives you the workflows to quickly find and fix them.
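
If your inference jobs already run on a Ray cluster, the Arize logging step can ride along with them. The sketch below is one possible arrangement rather than an official integration pattern: it wraps the same arize_client.log() call used above in a Ray task so that each batch is logged from the cluster. It assumes the production_dataset, production_schema, SPACE_KEY, API_KEY, model_id, and model_type defined in the notebook above; log_batch_to_arize is a hypothetical helper name.

import ray
from arize.pandas.logger import Client
from arize.utils.types import ModelTypes, Environments

ray.init()

@ray.remote
def log_batch_to_arize(batch_df, schema, space_key, api_key, model_id):
    # Build the client inside the task so nothing non-serializable is shipped to the worker.
    client = Client(space_key=space_key, api_key=api_key)
    response = client.log(
        dataframe=batch_df,
        schema=schema,
        model_id=model_id,
        model_type=ModelTypes.SCORE_CATEGORICAL,
        environment=Environments.PRODUCTION,
    )
    return response.status_code

# Split the production data into chunks and log them in parallel.
chunks = [production_dataset.iloc[i:i + 200] for i in range(0, len(production_dataset), 200)]
status_codes = ray.get([
    log_batch_to_arize.remote(chunk, production_schema, SPACE_KEY, API_KEY, model_id)
    for chunk in chunks
])
print(status_codes)  # a list of 200s indicates every chunk was logged successfully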

Food For Thought

As you think about your current ML operations, there is one thing you could probably use much more of: time. Ray and Arize can help. Instead of spending a lot of time learning how distributed technologies work or monitoring and troubleshooting models that are in production, it is worth considering offloading these tasks to technology to keep your team focused on what they do best: using deep business domain knowledge to build and deploy high-value ML models.

This blog was written in partnership with Dat Ngo, ML Solutions Architect at Arize AI.
