Ray Summit 2022
As a distributed computing framework, Ray works best in cluster mode, where multiple Ray workers connect to the Ray head and execute tasks in parallel. Managing multiple Ray clusters and workloads in production is challenging, especially when the workloads follow different patterns and have different computational requirements and dependencies. It becomes harder still when the deployment must support multi-tenancy and perform at large scale in a cost-efficient way. KubeRay is an open source toolkit for running Ray applications on Kubernetes. It provides several tools that improve the experience of running Ray workloads on Kubernetes, extending the Kubernetes API and functionality so that a containerized Ray cluster can be created with a single command. In this talk, we will discuss the architectural decisions behind KubeRay and show how it manages heterogeneous resources, the job submission lifecycle, application dependencies, and autoscaling.
Ali Kanso is a principal software engineer in the autonomous systems group at Microsoft in Seattle. He is an expert in distributed systems and cloud computing, with a focus on Linux container technologies. Prior to Microsoft, he worked as a senior software engineer in cloud technologies at the IBM T.J. Watson Research Center, NY. He was a main contributor to the IBM Cloud Kubernetes Service (IKS) and twice received the IBM Research Division Award for his achievements. Dr. Kanso holds a PhD in computer engineering, has over 50 publications to his credit, and has dozens of patents filed or granted.
Jiaxin Shan is a software engineer focusing on serverless infrastructure and cloud-native adoption at ByteDance.