Running and Monitoring Distributed ML with Ray and whylogs

By Anthony Naddeo and Danny Leybzon   

Running and monitoring distributed ML systems can be challenging. The need to manage multiple servers, and the fact that those servers emit different logs, means that there can be a lot of overhead involved in scaling up a distributed ML system. Fortunately, Ray makes parallelizing Python processes easy, and the open source whylogs enables users to monitor ML models in production, even if those models are running in a distributed environment.

Ray is an exciting project that allows you to parallelize pretty much anything written in Python. One of the advantages of the whylogs architecture is that it operates on mergeable profiles that can be easily generated in distributed systems and collected into a single profile downstream for analysis, enabling monitoring for distributed systems. This post will review some options that Ray users have for integrating whylogs into their architectures as a monitoring solution.
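To make the idea of a mergeable profile concrete, here is a minimal sketch in plain Python. ToyProfile is a hypothetical stand-in and not the whylogs API; it tracks only a count, minimum, maximum, and sum for one numeric column. The point is that profiling two chunks separately and merging the results gives the same summary as profiling all of the data at once.

```python
from dataclasses import dataclass
from functools import reduce
import math


@dataclass(frozen=True)
class ToyProfile:
    """Hypothetical stand-in for a profile of one numeric column."""
    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    total: float = 0.0

    def track(self, value: float) -> "ToyProfile":
        # Fold a single value into the summary.
        return ToyProfile(
            self.count + 1,
            min(self.minimum, value),
            max(self.maximum, value),
            self.total + value,
        )

    def merge(self, other: "ToyProfile") -> "ToyProfile":
        # Combine two summaries without looking at the raw data again.
        return ToyProfile(
            self.count + other.count,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
        )


def profile_values(values):
    return reduce(lambda acc, v: acc.track(v), values, ToyProfile())


values = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
whole = profile_values(values)
# Profile two chunks as if on separate workers, then merge downstream.
merged = profile_values(values[:3]).merge(profile_values(values[3:]))
assert merged == whole
```

Real whylogs profiles hold much richer statistics, but the same property is what lets them be produced on many workers and combined later.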

Working locally with large datasets

At the beginning of a new project or Kaggle competition, you often have to analyze datasets on a laptop, perhaps in the form of Pandas DataFrames. This exploratory phase can often be sped up greatly with parallelization. Because Ray can run almost any Python code in parallel, it's straightforward to get a speed boost when profiling large datasets with whylogs.

Figure 1

Imagine a notebook use case where you want to use whylogs to generate a profile of the entire dataset to glance at some high level statistics and distribution properties. Ray can be used to divide the dataset up and send different chunks to worker processes which would then use whylogs to generate profiles. The profiles can then be reduced into a single profile by merging all of the results.

First, we’ll set up the remote function log_frame for Ray to execute that will essentially convert Pandas DataFrames into whylogs profiles.

from functools import reduce
import pandas as pd
import ray
from whylogs.core.datasetprofile import DatasetProfile

data_files = ["data/data1.csv", "data/data2.csv", "data/data3.csv"]

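@ray.remote
def log_frame(df: pd.DataFrame) -> DatasetProfile:
    # Runs on a Ray worker: profile one batch and return the profile.
    profile = DatasetProfile("")
    profile.track_dataframe(df)
    return profile

def main_pipeline_iter() -> DatasetProfile:
    # Read the CSV files and iterate over them in Pandas batches.
    pipeline = ray.data.read_csv(data_files).window()
    batches = pipeline.iter_batches(batch_size=1000, batch_format="pandas")
    # Profile every batch in parallel, then wait for all of the results.
    results = ray.get([log_frame.remote(batch) for batch in batches])
    # Merge the per-batch profiles into one, starting from an empty profile.
    profile = reduce(
        lambda acc, cur: acc.merge(cur),
        results,
        DatasetProfile(""))
    return profile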

if __name__ == "__main__":
    ray.init()
    main_pipeline_iter()

The next part of the listing, main_pipeline_iter, uses Ray pipelines to divide up the dataset and distribute the work across the available Ray nodes/processes. The iter_batches method is used to pick a batch size to divide the data. You can pick whatever number makes sense for the size of your dataset and the number of cores you’re using; just make sure you request the Pandas batch format if you’re logging with the whylogs track_dataframe method. At the end of this process, we’re left with a list of whylogs profiles that we have to merge back into a single profile, which can be done with a reduce. The merged profile contains various statistical summaries derived from the data.

That will leave you with a single profile that you can explore using any of our examples. The example above uses the Ray pipeline’s iter_batches method, but Ray has a few other abstractions that could have been used instead; pipeline.split(n), for instance, could have worked as well.
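The reason the partitioning choice doesn't matter can be sketched without Ray. The helpers below (fixed_size_batches, n_way_split, summarize, merge_all) are hypothetical names for illustration, and a Counter stands in for a whylogs profile. How Ray actually assigns rows in split(n) is not modeled here; the round-robin split is just an assumption for the demo.

```python
from collections import Counter
from functools import reduce


def fixed_size_batches(rows, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def n_way_split(rows, n):
    """Split rows into n shards, assigning rows round-robin."""
    return [rows[i::n] for i in range(n)]


def summarize(rows):
    """Toy mergeable summary: how often each value occurs."""
    return Counter(rows)


def merge_all(summaries):
    # Same shape as the reduce in the listing: start from an empty summary.
    return reduce(lambda acc, cur: acc + cur, summaries, Counter())


rows = ["a", "b", "a", "c", "b", "a", "d"]
by_batch = merge_all(summarize(b) for b in fixed_size_batches(rows, batch_size=3))
by_shard = merge_all(summarize(s) for s in n_way_split(rows, n=2))
# Both partitioning strategies end in the same merged summary.
assert by_batch == by_shard == Counter(rows)
```

As long as every row lands in exactly one chunk and the summaries merge cleanly, the final result is the same, so the choice can be made on performance grounds alone.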

Profiling data during inference with Ray Serve

Ray has an easy-to-use, scalable model serving library called Ray Serve that lets you define a service in Python and then scale it out to a Ray cluster. Companies like Wildlife, cidaas, hutom, and Dendra Systems have already shown how it excels as a model serving framework, but it can be used for other use cases as well. This section goes over a Ray Serve example with a dedicated inference endpoint that sends data to a secondary, dedicated logging endpoint.

Figure 2

This is a scrappy setup for experiments or prototyping. It uses a Ray actor as a dedicated state accumulator. As data flows in, workers will be invoked with DataFrames. They’ll convert the DataFrames into whylogs profiles and send those over to the stateful actor, which merges each one into the existing profile. Concurrency issues are resolved by serializing the profile merges, for example with an asyncio queue; in the listing below, every merge goes through the single actor, which by default handles one method call at a time. The current profile state is queryable through the Logger endpoint at any point in time.

The inference endpoint takes data in CSV form. We can send data to it with curl by running curl 'http://127.0.0.1:8000/MyModel' --data-binary @data.csv.
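If you'd rather send the CSV from Python than from curl, a request can be built with the standard library. This is a sketch only: build_inference_request is a hypothetical helper, the text/csv content type is an assumption (the endpoint in this post reads the raw body and does not inspect headers), and the request is built but not sent.

```python
import urllib.request


def build_inference_request(csv_text: str,
                            url: str = "http://127.0.0.1:8000/MyModel"):
    """Build (but do not send) a POST request with CSV text as the raw body."""
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv"},  # assumption, not required by the post
        method="POST",
    )


req = build_inference_request("a,b\n1,2\n3,4\n")

# To actually send it while the Serve application is running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```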

import io
import time

import pandas as pd
import ray
from ray import serve
from starlette.requests import Request
from whylogs.core.datasetprofile import DatasetProfile

ray.init()
serve.start()

@ray.remote
class SingletonProfile:
    def __init__(self) -> None:
        self.profile = DatasetProfile("")

    def add_profile(self, profile: DatasetProfile):
        self.profile = self.profile.merge(profile)

    def get_summary(self):
        return str(self.profile.to_summary())

singleton = SingletonProfile.remote()

@serve.deployment()
class Logger:
    def log(self, df: pd.DataFrame):
        profile = DatasetProfile("")
        profile.track_dataframe(df)
        ray.get(singleton.add_profile.remote(profile))

    async def __call__(self, request: Request):
        return ray.get(singleton.get_summary.remote())

@serve.deployment
class MyModel:
    def __init__(self) -> None:
        self.logger = Logger.get_handle(sync=True)

    def predict(self, df: pd.DataFrame):
        # implement with a real model
        return []

    async def __call__(self, request: Request):
        # Read the raw CSV payload from the request body
        body = await request.body()
        csv_text = body.decode(encoding='UTF-8')
        df = pd.read_csv(io.StringIO(csv_text))
        # log the data with whylogs asynchronously
        self.logger.log.remote(df)
        return self.predict(df)

Logger.deploy()
MyModel.deploy()

while True:
    time.sleep(5)

The logging endpoint logs DataFrames asynchronously so inference isn’t delayed. This example keeps the accumulated profile state in memory on the cluster indefinitely.

Calling the Logger endpoint with curl 'http://127.0.0.1:8000/Logger' returns the summary that whylogs profiles produce. The inference endpoint uses the log method directly. This will look a lot like the local Ray example we covered previously. Since log can be called concurrently, and every call eventually merges data into the lone dataset profile, updates to the profile state are serialized so we don’t step on our own toes when multiple requests are being logged.
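One way to serialize merges with an asyncio queue can be sketched without Ray or whylogs. ProfileAccumulator is a hypothetical class and a Counter stands in for a profile; this is not the code from the example above, just an illustration of the pattern where a single consumer task is the only writer of the shared state.

```python
import asyncio
from collections import Counter


class ProfileAccumulator:
    """Hypothetical accumulator: one consumer task applies merges in arrival order."""

    def __init__(self) -> None:
        self.state = Counter()
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, partial: Counter) -> None:
        # Producers never touch self.state; they only enqueue work.
        await self.queue.put(partial)

    async def run(self) -> None:
        while True:
            partial = await self.queue.get()
            self.state = self.state + partial  # only this task writes the state
            self.queue.task_done()


async def demo() -> Counter:
    acc = ProfileAccumulator()
    consumer = asyncio.create_task(acc.run())
    # Ten concurrent "requests", each submitting a partial summary.
    await asyncio.gather(
        *(acc.submit(Counter({"rows": 1, f"col{i % 2}": 1})) for i in range(10))
    )
    await acc.queue.join()  # wait until every queued merge has been applied
    consumer.cancel()
    await asyncio.gather(consumer, return_exceptions=True)
    return acc.state


result = asyncio.run(demo())
```

Because merges are applied one at a time by the consumer, no two updates can interleave, regardless of how many requests submit profiles concurrently.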

To analyze the profiles, you need to get them out of Ray and into an application that allows you to analyze whylogs profiles. One such application is the open source profile viewer included in the whylogs library.

Using the whylogs container during inference

This is the best way to integrate whylogs into a Ray cluster in a production environment at the moment. You won’t have to worry about maintaining state on the cluster and you won’t risk losing any state after crashes.

Figure 3

This integration option makes use of our whylogs container, which runs outside the Ray cluster. Instead of running whylogs in a dedicated endpoint within Ray, you have an endpoint dedicated to sending data in Pandas format to an external container that you host with your preferred container hosting solution. From there, the container can produce profiles for each hour/day (depending on your configuration) and upload them to S3.
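The idea of one profile per hour or day can be illustrated with a small sketch. The container's actual implementation is not shown in this post, so everything here is an assumption for illustration: bucket_start and rotate are hypothetical helpers, a Counter stands in for a profile, and the timestamps are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def bucket_start(ts: datetime, period: str) -> datetime:
    """Truncate a timestamp to the start of its hour or day."""
    if period == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if period == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported period: {period}")


def rotate(records, period: str):
    """Group (timestamp, value) records into one toy summary per period."""
    buckets = defaultdict(Counter)
    for ts, value in records:
        buckets[bucket_start(ts, period)][value] += 1
    return dict(buckets)


# Made-up records for illustration only.
records = [
    (datetime(2021, 11, 1, 9, 5, tzinfo=timezone.utc), "cat"),
    (datetime(2021, 11, 1, 9, 50, tzinfo=timezone.utc), "dog"),
    (datetime(2021, 11, 1, 10, 15, tzinfo=timezone.utc), "cat"),
]
hourly = rotate(records, "hour")  # two buckets: 09:00 and 10:00
daily = rotate(records, "day")    # one bucket for the whole day
```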

import json
import requests
import io
import time

import pandas as pd
import ray
from ray import serve
from starlette.requests import Request

ray.init()
serve.start()

@serve.deployment()
class Logger:
    def log(self, df: pd.DataFrame):
        # Post request with data as the payload to your whylogs container
        request = {
            'datasetId': '123',
            'tags': {},
            'multiple': df.to_dict(orient='split')
        }
        requests.post(
            'http://localhost:8080/logs',
            data=json.dumps(request),
            headers={'X-API-Key': 'password'})

    async def __call__(self, request: Request):
        return "NoOp"

@serve.deployment
class MyModel:
    def __init__(self) -> None:
        self.logger = Logger.get_handle(sync=True)

    def predict(self, df: pd.DataFrame):
        # implement with a real model
        return []

    async def __call__(self, request: Request):
        # Read the raw CSV payload from the request body
        body = await request.body()
        csv_text = body.decode(encoding='UTF-8')
        df = pd.read_csv(io.StringIO(csv_text))
        # send the data to the Logger deployment asynchronously
        self.logger.log.remote(df)
        return self.predict(df)

Logger.deploy()
MyModel.deploy()

while True:
    time.sleep(5)

Now, instead of profiling the data inside Ray, we send the DataFrame in JSON format (using the split orientation) to the container. You can then find all of the data you logged in your S3 bucket, split into separate profiles for each configured time period. The container rotates logs according to its configuration, so you’ll end up with a profile for every hour or day.
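For reference, the split orientation lays a table out as three keys: index, columns, and data. The sketch below builds the same layout in plain Python so the payload shape is easy to see. to_split is a hypothetical helper that mirrors what Pandas produces for a default integer index, and the rows are made-up sample data.

```python
import json


def to_split(rows):
    """Convert a list of row dicts into the {'index', 'columns', 'data'} layout."""
    columns = list(rows[0].keys()) if rows else []
    return {
        "index": list(range(len(rows))),
        "columns": columns,
        "data": [[row[c] for c in columns] for row in rows],
    }


# Made-up sample rows for illustration only.
rows = [{"age": 31, "country": "US"}, {"age": 47, "country": "DE"}]
payload = {
    "datasetId": "123",  # same placeholder ID as in the listing
    "tags": {},
    "multiple": to_split(rows),
}
body = json.dumps(payload)
```

The column names are sent once rather than repeated for every row, which keeps the request body compact for wide DataFrames.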

This example depends on a whylogs container running on localhost. Reach out on our Slack for help configuring and running one.

Conclusion

Ray provides an easy-to-use interface for parallelizing your Python workloads, including ML models. whylogs allows you to generate logs of the data being sent to your ML model and the predictions that it makes, enabling observability and monitoring for the model in production. This means that you can easily and robustly run ML models in a distributed production environment using only open-source software to help you accomplish your goals.

For full examples from this post, as well as other integration examples, check out the whylogs GitHub. If you have any questions about whylogs, how to integrate it with Ray, or how to use it to monitor your data and ML applications, join our Community Slack. If you’re interested in creating awesome tools for AI practitioners, check out our job postings on the WhyLabs About Page.
