Anyscale Connect

MLOps: Ray in the Real World

Wednesday, August 12, 2:00PM UTC

How do you bridge the gap between data science and data engineering so that you can reliably and repeatedly create production models and deploy them? You’ll hear from experts about various aspects of MLOps. They will discuss the unique challenges of deploying ML to production, along with the tools and techniques necessary for success.


10:00am: Introducing Ray Serve: Scalable and Programmable ML Serving Framework, Simon Mo (Anyscale)

10:15am: Building Scalable Natural Language Processing Pipelines with Ray, Qingqing Mao (Dascena)

10:30am: Project Zouwu: Scalable AutoML for Telco Time Series Analysis using Ray and Analytics Zoo, Ding Ding (Intel)

10:45am: Panel discussion and audience Q&A moderated by Dean Wampler (Anyscale)

Introducing Ray Serve: Scalable and Programmable ML Serving Framework, Simon Mo (Anyscale)

After data scientists train a machine learning (ML) model, the model needs to be served for interactive scoring or batch predictions. The go-to solution is often to wrap the model inside a Flask microservice. But when is that not enough? In this talk, Simon will discuss the shortcomings of the Flask-only solution and then discuss the more common alternative, the “tensor prediction service” approach used by TFServing, SageMaker, and others. Simon will then introduce Ray Serve, an easy-to-use, scalable ML serving system that overcomes the deficiencies of both approaches, and highlight its architectural innovations.
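To make the “wrap the model inside a Flask microservice” pattern concrete, here is a minimal sketch of the kind of handler such a service exposes. This is an illustration of the Flask-style pattern Simon contrasts with, not Ray Serve’s API; the `model` function is a hypothetical stand-in for a trained ML model, and it uses only the standard library so the request/response logic is visible on its own.

```python
import json

def model(features):
    # Hypothetical stand-in for a trained ML model (assumption):
    # here it simply averages the input features.
    return sum(features) / len(features)

def handle_request(body: str) -> str:
    """Flask-style prediction handler: parse JSON, score, return JSON.

    In a real Flask app this logic would sit behind a view function
    registered with @app.route("/predict", methods=["POST"]).
    """
    payload = json.loads(body)
    prediction = model(payload["features"])
    return json.dumps({"prediction": prediction})

print(handle_request('{"features": [1.0, 2.0, 3.0]}'))
```

The limitation the talk addresses is visible even in this toy version: the handler couples one model to one process, so scaling out, batching requests, or composing multiple models requires machinery beyond the web framework itself.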

Building Scalable Natural Language Processing Pipelines with Ray, Qingqing Mao (Primer AI)

At Primer AI, we build machines that can read and write, automating the analysis of very large document datasets. Our clients include some of the world’s largest government agencies, financial institutions, and Fortune 50 companies. It is challenging to build NLP analytical pipelines that are both comprehensive and scalable because different NLP tasks may have different computation requirements and the tasks may have interdependencies. This becomes more challenging when many clients require on-premise deployment with restricted computation capacity. We use Ray to build some of our NLP pipelines. Ray helps us narrow the gap between data science and engineering, and it enables our data scientists to write high-performance data analytics pipelines that can scale.

Project Zouwu: Scalable AutoML for Telco Time Series Analysis using Ray and Analytics Zoo, Ding Ding (Intel)

Time series analysis plays a crucial role in telecom applications, such as network quality analysis, network capacity forecasting, smart power management, etc. There’s a recent trend to apply machine learning methods (especially neural networks) to such problems, and they are reported to perform better in many cases than traditional methods such as autoregression and exponential smoothing.

However, building machine learning applications for time series forecasting can be a laborious and knowledge-intensive process. In this talk, we present Project Zouwu, which brings Automated Machine Learning (AutoML) to time series analysis for Telco applications. It is built on top of Ray and Analytics Zoo, automating the process of feature generation and selection, model selection, and hyperparameter tuning in a distributed fashion. We will also share some real-world experience and “war stories” from early users.


Ding Ding

Ding Ding is a machine learning engineer in Intel’s ML solution platform team, where she works on developing distributed machine learning and deep learning algorithms. She is an active contributor to the BigDL and Analytics Zoo projects.

Simon Mo

Simon Mo is a software engineer at Anyscale. Before Anyscale, he was a student at UC Berkeley participating in research at the RISELab. He focuses on studying and building systems for machine learning, in particular, how to make ML model serving systems more efficient, ergonomic, and scalable. He works on Ray Serve at Anyscale.

Qingqing Mao

Qingqing Mao is the Head of Engineering and Data Science at Dascena, where he leads the development of compliant and scalable clinical data pipelines and the research on applying machine learning techniques in healthcare and medicine. Previously, he worked as a Senior Staff Data Scientist at Primer AI, where he focused on building complex and high-performance analytical pipelines that bring state-of-the-art natural language processing models into production and are capable of processing hundreds of millions of documents. He has a Ph.D. in Astrophysics from Vanderbilt University, where he researched large-scale galaxy distribution using computer simulations and datasets of millions of galaxies from the Sloan Digital Sky Survey.