Ray Meetup

Bay Area Community Ray Meetup

Thursday, November 2, 12:30AM UTC

Please join us for an evening of community technical talks from users of the Data and Ray communities. We want to thank PingCAP for being our gracious host and facilitating the meetup!

(The times are not strict; they may vary slightly.)

  • 5:30-6:00 pm: Networking, Snacks & Drinks

  • 6:00 pm: Talk 1 (30-35 mins): How to build a serverless database cloud service

  • Q & A (10 mins)

  • 6:45 pm: Talk 2 (30-35 mins): Multi-Region/Cloud Ray Pipeline with Distributed Caching

  • Q & A (10 mins)

  • 7:20 pm: Talk 3 (30-35 mins): Introduction to Ray for Distributed and ML/AI Applications in Python

👉 RSVP HERE: https://lu.ma/eb8tpu2f 👈

Talk 1: How to build a serverless database cloud service 

Abstract: Relational databases have long been the core component of application systems, and their reliability and performance are critical to the stability and availability of those applications. Distributed SQL, the next step in the evolution of databases, offers built-in features such as horizontal scaling and high availability.

In this talk, Li Shen will introduce the architecture and key technologies of TiDB, the open-source distributed SQL database, and how the team utilizes the capabilities of the public cloud to build a cloud-native serverless database service.


  • Introduction to TiDB

  • The design goals of TiDB Serverless

  • Key considerations and architecture

  • Demo of TiDB Serverless


Bio: Li Shen is SVP and a founding engineer of PingCAP, the company behind TiDB. He is a maintainer of several popular open-source projects, including TiDB and TiKV, a distributed transactional key-value store and CNCF graduated project. Li has extensive experience in data infrastructure, software architecture design, and cloud computing.

Talk 2: Multi-Region/Cloud Ray Pipeline with Distributed Caching

Abstract: In some cases, the stages of a machine learning pipeline are distributed across regions or clouds. Data preprocessing, model training, and inference run in different regions or clouds to leverage special resource types or services that exist in a particular cloud, or to reduce latency by placing inference near user-facing applications. Additionally, as GPUs remain scarce resources, it is increasingly common to set up training clusters in a different region or cloud from where the data resides. This multi-region/cloud scenario loses data locality, resulting in higher latency and expensive data egress costs.

In this talk, Beinan Wang, Senior Staff Software Engineer at Alluxio, will discuss how Alluxio’s open-source distributed caching system integrates with Ray in the multi-region/cloud scenario:

  • The data locality challenges in the multi-region/cloud ML pipeline

  • The Ray+PyTorch+Alluxio stack to overcome these challenges, optimize model training performance, save on costs, and improve reliability

  • The architecture and integration of Ray+PyTorch+Alluxio using POSIX or RESTful APIs

  • ResNet and BERT benchmark results showing performance gains, plus a cost savings analysis

  • Real-world examples of how Zhihu, a top Q&A platform, combined Alluxio’s distributed caching and data management with Ray’s scalable distributed computing to optimize their multi-cloud model training performance

Bio: Beinan Wang

Dr. Beinan Wang is a Senior Staff Software Engineer at Alluxio and a Technical Steering Committee (TSC) member of PrestoDB. Prior to Alluxio, he was the tech lead of the Presto team at Twitter, where he built large-scale distributed SQL systems for Twitter’s data platform. He has twelve years of experience working on performance optimization, distributed caching, and large-volume data processing. He received his Ph.D. in computer engineering from Syracuse University for his work on symbolic model checking and runtime verification of distributed systems.

Talk 3: Introduction to Ray for ML/AI Applications in Python 

Abstract: An introduction to Ray (https://www.ray.io/), the system for scaling your Python and machine learning applications from a laptop to a cluster. We'll start with a hands-on exploration of the core Ray API for distributed workloads, covering the basic Ray Core API patterns for scaling ML workloads:

  • Remote Python functions as tasks

  • Remote objects as futures

  • Remote Python classes as stateful actors

  • Multi-model training with Ray Core API patterns

If you are a data scientist, ML engineer, or Python developer, the key takeaways are:

  • Understand what Ray 2.0 is and why to use it

  • Learn the Ray Core Python APIs and how to convert Python functions and classes into distributed stateless tasks and stateful actors

  • Use the Ray Dashboard for inspection and observability metrics

  • Learn how to use Ray Core for multi-model training

Bio: Jules S. Damji is a lead developer advocate at Anyscale Inc., an MLflow contributor, and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. He holds a B.Sc. and an M.Sc. in computer science (from Oregon State University and Cal State Chico, respectively) and an M.A. in political advocacy and communication (from Johns Hopkins University).

Note: By registering at this meetup, you are agreeing to share your name and email with our hosts PingCAP.