5:00 p.m. Welcome remarks & upcoming announcements - Jules Damji, Anyscale
5:10 p.m. Talk 1: Observing and debugging Ray: Present and future
6:00 p.m. Q & A
6:10 p.m. Talk 2: Distributed Vector Generation in One Step with Ray
6:50 p.m. Q & A
Talk 1: Observing and debugging Ray: Present and future
Abstract: Users running offline machine learning workloads and online serving leverage Ray in radically different ways, which makes it challenging to provide a common architecture for inspecting and debugging performance problems.
In this talk, we discuss Ray 2.x's current observability architecture, including how to view metrics and logs and inspect the state of tasks, actors, and other resources in Ray. We discuss features like the newly revamped Ray Dashboard and the new Ray metrics, and present a roadmap for where Ray observability is going in the future with a unified observability data model.
This talk is for you if you're interested in the following:
What’s new with Ray metrics and the Dashboard revamp
How to debug Ray programs using the CLI, Dashboard, and logs
How to implement observability in a general-purpose distributed system such as Ray
SangBin Cho is a software engineer at Anyscale and a committer on open-source Ray. He has contributed to various parts of Ray's core distributed systems, including scalable metrics infrastructure, a new actor fault-tolerance mechanism, placement group APIs, and stable memory management subsystems.
Ricky Xu is a software engineer at Anyscale working on the open-source Ray Core project. He graduated from Carnegie Mellon University and previously worked at Meta in the Core Data organization. He is passionate about building the future of distributed computing with Ray.
Talk 2: Distributed Vector Generation in One Step with Ray
Abstract: The combination of big data and deep learning has fundamentally changed how we approach data science; through machine learning models and application-specific code, computers can "understand" unstructured data (think images and audio). To leverage these capabilities, most applications require a continuous pipeline of one or more trained models interlinked with data processing functions. The Towhee open-source project aims to provide users with a tool to create these end-to-end pipelines using a simple API. This is done by providing generic data processing scripts, pre-trained ML models, training tools, and a Pythonic toolkit to stitch all the parts into a pipeline on a local machine. Doing this locally was hard enough; scaling these pipelines is harder still and has remained a key challenge for us from day one.
This talk will discuss how integrating a Ray engine into Towhee has helped our users confidently scale their unstructured data processing pipelines and other ML applications. Using Ray as a backend, our users can easily schedule compute-heavy Towhee operations onto Ray Actors from Ray Core. First, we will discuss some challenges in building scalable machine learning pipelines. Second, we will elaborate on the pipeline development-to-deployment gap and why Ray is the obvious choice for our users and us. Finally, we will provide a live demo of a real-world data processing application scaled using Ray.
Filip Haltmayer is a Software Engineer at Zilliz, currently working on both the open-source Towhee and Milvus projects. Along with developing the core system architecture and distributed compute, he is active in growing the project communities, guiding new users through deployments and creating welcoming, active environments for both projects.