Ray Summit 2022
Users running offline ML workloads and online serving use Ray in radically different ways, which makes it challenging to provide a common architecture for inspecting and debugging performance problems. In this talk, we discuss Ray 2.0's current observability architecture, including how to view metrics and logs and how to inspect the state of tasks, actors, and other resources in Ray. We discuss features like the task timeline and tracing, and present a roadmap for where Ray observability is going, toward a unified observability data model. This talk is for you if you're interested in how to debug Ray programs, or are curious how one can implement observability in a general-purpose distributed system like Ray.
SangBin Cho is a software engineer at Anyscale and a committer on open source Ray. He has contributed to various parts of Ray's core distributed systems, including scalable metrics infrastructure, a new actor fault tolerance mechanism, placement group APIs, and stable memory management subsystems.
Ricky Xu is a software engineer at Anyscale working on Ray Core for the open source Ray project. He graduated from Carnegie Mellon University and worked at Meta in the Core Data organization. He is passionate about building the future of distributed computing with Ray.