Ray Deep Dives

Ray Observability: Present and Future

Ray Summit 2022

Users running offline ML workloads and users running online serving leverage Ray in radically different ways, which makes it challenging to provide a common architecture for inspecting and debugging performance problems. In this talk, we discuss Ray 2.0's current observability architecture, including how to view metrics and logs and how to inspect the state of tasks, actors, and other resources in Ray. We discuss features like the task timeline and tracing, and present a roadmap for where Ray observability is going in the future with a unified observability data model. This talk is for you if you're interested in how to debug Ray programs, or are curious about how one can implement observability in a general-purpose distributed system like Ray.

About SangBin

SangBin Cho is a software engineer at Anyscale and a committer on open source Ray. He has contributed to various parts of Ray's core distributed systems, including scalable metrics infrastructure, a new actor fault tolerance mechanism, placement group APIs, and stable memory management subsystems.

About Ricky

Ricky Xu is a software engineer at Anyscale working on Ray Core for the open source Ray project. He graduated from Carnegie Mellon University and worked at Meta in the Core Data organization. He is passionate about building the future of distributed computing with Ray.

SangBin Cho

Software Engineer, Anyscale

Ricky Xu

Software Engineer, Anyscale

Ready to Register?

Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.


Join the Conversation

Ready to get involved in the Ray community before the conference? Ask a question in the forums. Open a pull request. Or share why you’re excited with the hashtag #RaySummit on Twitter.