Lightning Talk

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Ray Summit 2022

Applying AI models to end-to-end data analysis pipelines plays a critical role in today's large-scale, intelligent applications. However, AI projects usually start with a Python notebook running on a single laptop or workstation, and scaling that notebook to handle larger datasets with high performance (for both large-scale experimentation and production deployment) is painful. It often requires data scientists to follow many manual, error-prone steps and even make intrusive code changes to take full advantage of the available hardware resources. To address these challenges, we have open sourced BigDL 2.0 (https:/ / under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects), which allows users to build end-to-end AI pipelines that are transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred nodes in real-world use cases). It automatically provisions Big Data and AI systems (such as Ray and Apache Spark) for distributed execution; on top of these underlying systems, it efficiently implements distributed, in-memory data pipelines (for Spark DataFrames, TensorFlow Dataset, PyTorch DataLoader, as well as arbitrary Python libraries), and transparently scales out deep learning (such as TensorFlow and PyTorch) training and inference on the distributed dataset (through scikit-learn style APIs). BigDL 2.0 has already been adopted in production by many real-world users (such as Mastercard, Burger King, and Inspur). In this session, we will demonstrate how to build an end-to-end AI pipeline using BigDL 2.0 on Ray, and showcase real-world BigDL use cases.
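To make the idea of a scikit-learn style API over a distributed dataset more concrete, here is a minimal, self-contained sketch. It is not BigDL's API: `ToyEstimator`, `shard`, and all parameter names are hypothetical, and a standard-library thread pool stands in for the Ray or Spark workers that a real deployment would use. It only illustrates the general pattern the abstract describes, in which the caller writes a single `fit` call while the data is partitioned and the gradient work is spread across workers.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(data, num_shards):
    """Split a list into num_shards roughly equal, contiguous partitions."""
    size, extra = divmod(len(data), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < extra else 0)
        shards.append(data[start:end])
        start = end
    return shards


class ToyEstimator:
    """Hypothetical scikit-learn style estimator; NOT BigDL's API.

    Fits y = w * x by data-parallel gradient descent: each worker computes
    the gradient on its own shard, and the driver averages the results.
    """

    def __init__(self, num_workers=4, lr=0.01):
        self.num_workers = num_workers
        self.lr = lr
        self.w = 0.0

    def _shard_gradient(self, part):
        # Sum of d/dw (w*x - y)^2 over one shard, plus the shard's size.
        grad = sum(2 * (self.w * x - y) * x for x, y in part)
        return grad, len(part)

    def fit(self, data, epochs=200):
        # Drop empty shards (possible when there are more workers than rows).
        parts = [p for p in shard(data, self.num_workers) if p]
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            for _ in range(epochs):
                # Workers read self.w; it is only updated after all finish.
                results = list(pool.map(self._shard_gradient, parts))
                total_grad = sum(g for g, _ in results)
                total_n = sum(n for _, n in results)
                self.w -= self.lr * total_grad / total_n
        return self

    def predict(self, xs):
        return [self.w * x for x in xs]


# Toy data generated from y = 3x; the fitted weight should approach 3.
data = [(float(x), 3.0 * x) for x in range(1, 11)]
model = ToyEstimator(num_workers=4, lr=0.01).fit(data)
```

The calling code stays the same whether `num_workers` is 1 or 4, which is the property the abstract refers to as seamless scaling; in a real system the shards would live in distributed memory rather than in a local list.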

About Jiao

Jiao (Jennie) Wang is an AI framework engineer on the Machine Learning Platform team at Intel, working in the area of big data analytics. She is a key contributor to the development and optimization of distributed ML/DL frameworks on big data systems, and she provides customer support for end-to-end AI solutions on big data platforms.

Jiao Wang

AI Framework Engineer, Intel