Ray Summit 2022
Traditionally, a workflow consists of a pipeline of tasks, executed and automated according to a set of procedural rules. Workflows enable coordination and monitoring across distributed people, organizations, and tasks, with strong durability, observability, and repeatability.
Recently, workflow-as-code has become a growing trend for application pipelines because of these properties, represented by workflow systems such as Airflow, Prefect, and Temporal. However, many workflows today are data workflows: application pipelines that may pass and process large amounts of data between steps. Examples include ETL workloads and ML pipelines. The workflow systems mentioned above are less efficient and flexible for data processing, while Ray offers both efficiency and flexibility for data-intensive workloads. Combining the advantages of Ray and a workflow system, we show that data pipelines can achieve efficiency, durability, and flexibility simultaneously by using durable Ray tasks via Ray Workflow.
This talk gives an introduction to Ray Workflow, how you can use Ray Workflow as durable Ray tasks, and how to program data pipelines with Ray Workflow. Ray Workflow will be available as alpha in Ray 2.0.
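To make the idea of a durable task concrete, here is a minimal conceptual sketch in plain Python. It does not use Ray or the Ray Workflow API; the helper `durable_step`, the three pipeline functions, and the JSON-file checkpoint layout are all hypothetical names chosen for illustration. The assumption is simply that each step's result is persisted once it completes, so a re-run of the same pipeline resumes from stored results instead of recomputing them.

```python
import json
import os
import tempfile


def durable_step(storage_dir, step_name, fn, *args):
    """Run fn(*args) once; later calls with the same step_name reuse the stored result.

    Hypothetical helper for illustration only, not part of Ray.
    """
    path = os.path.join(storage_dir, step_name + ".json")
    if os.path.exists(path):
        # The step already completed in an earlier run: load its checkpoint.
        with open(path) as f:
            return json.load(f)
    result = fn(*args)
    # Write to a temporary file first, then rename, so a crash mid-write
    # never leaves a partial checkpoint under the final name.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(result, f)
    os.replace(tmp_path, path)
    return result


calls = []  # records which steps actually executed


def extract():
    calls.append("extract")
    return [1, 2, 3, 4]


def transform(rows):
    calls.append("transform")
    return [r * 10 for r in rows]


def load(rows):
    calls.append("load")
    return sum(rows)


def run_pipeline(storage_dir):
    # A three-step ETL-style pipeline in which every step is checkpointed.
    rows = durable_step(storage_dir, "extract", extract)
    scaled = durable_step(storage_dir, "transform", transform, rows)
    return durable_step(storage_dir, "load", load, scaled)


storage = tempfile.mkdtemp()
first = run_pipeline(storage)   # executes all three steps
second = run_pipeline(storage)  # served entirely from checkpoints
```

After both runs, `calls` contains each step name exactly once, because the second run reads every result from storage. A real workflow system adds distributed execution, efficient passing of large data between steps, and recovery from failures partway through a run, which this sketch does not attempt.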
Siyuan Zhuang is a third-year PhD student at UC Berkeley RISELab / Sky Computing Lab, co-advised by Prof. Dawn Song and Ion Stoica. His research focuses on machine learning systems and distributed data processing systems. He is currently building a workflow system based on Ray to make data processing pipelines efficient, durable and flexible.