ML Infra + Apps

A serverless resilient graph analysis platform built on Ray

Ray Summit 2022

Graph processing aims to process super large-scale graphs and execute graph data analysis tasks, such as PageRank, TriangleCount, community detection, etc. Most graph processing systems are under in-memory architecture, which makes it hard to process real-world large graphs with a gigantic number of edges (e.g., 100 trillion edges) at a cheap cost. In addition, few graph processing systems really implement serverless fault tolerance capability, hindering wide adoption in production. Hence, at ByteDance, we proposed an enterprise-level Graph Analytics Platform (GAP) for graph computing and graph mining, named ByteGAP, to process super large graphs such as Douyin and TikTok.

ByteGAP is built on a serverless engine atop KubeRay to provide flexible cluster resource management, automatic deployment in the cloud, elasticity for scalability, and fault tolerance. It eases the maintenance and deployment effort and lowers the number of machines and memory consumption but supports much larger graphs by a hierarchical out-of-core design of DRAM/PMEM/SSDs on Ray clusters. Thanks to Ray's powerful abstraction of tasks/actors/GCS, we have built a Ray-based control plane for rank management, agent (actor) rendezvous, and stateful fault tolerance as an infrastructure component. It handles failures at node level, agent level, and worker level end-to-end with synergy that MPI can not fully cover. The Ray agents of dynamically assigned ranks manage worker processes of different languages as well as checkpoints in PMEM. It can relaunch agents and workers of specific ranks, load mutable status of vertices from vertex table in PMEM, and ensure that they are automatically recoverable from any iteration for any workers of arbitrary ranks.

About Lin

Lin Ma is a senior researcher and software architect at ByteDance Technical Infrastructures-US-Applied Research Center. He has been working on end-to-end elastic and efficient big data and ML infrastructure, graph learning and computing infrastructure, as well as serverless computing paradigms by leveraging Ray as a substrate. He has published 15+ papers, 4 US patents, served on technical committees for 17 international conferences and 6 journals, and has chaired 2 conferences and workshops.

Lin Ma

Senior Researcher and Software Architect, ByteDance Inc.
Ray Summit 2022 horizontal logo

Ready to Register?

Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.

Save your spot

Join the Conversation

Ready to get involved in the Ray community before the conference? Ask a question in the forums. Open a pull request. Or share why you’re excited with the hashtag #RaySummit on Twitter.