Ray Summit 2022
Applying AI models to end-to-end data analysis pipelines plays a critical role in today's large-scale, intelligent applications. However, AI projects usually start with a Python notebook running on a single laptop or workstation, and scaling that notebook to handle larger datasets with high performance (for both large-scale experimentation and production deployment) is painful. It often requires data scientists to follow many manual, error-prone steps and even make intrusive code changes in order to take full advantage of the available hardware resources. To address these challenges, we have open sourced BigDL 2.0 (https://github.com/intel-analytics/BigDL/) under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects), which allows users to build end-to-end AI pipelines that are transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred nodes in real-world use cases). It automatically provisions Big Data and AI systems (such as Ray and Apache Spark) for distributed execution; on top of the underlying systems, it efficiently implements distributed, in-memory data pipelines (for Spark DataFrames, TensorFlow Dataset, PyTorch DataLoader, as well as arbitrary Python libraries), and transparently scales out deep learning (such as TensorFlow and PyTorch) training and inference on the distributed dataset (through scikit-learn style APIs). BigDL 2.0 has already been adopted in production by many real-world users (such as Mastercard, Burger King, and Inspur). In this session, we will demonstrate how to build an end-to-end AI pipeline using BigDL 2.0 on Ray, and showcase real-world BigDL use cases.
Guoqiong Song is an AI framework engineer on the machine learning performance team at Intel. She holds a PhD in atmospheric and oceanic sciences from UCLA, with a focus on numerical modeling and optimization. She is interested in developing and optimizing distributed deep learning and reinforcement learning algorithms on Ray or Spark.
Jiao (Jennie) Wang is an AI framework engineer on the Machine Learning Platform team at Intel, working in the area of big data analytics. She is a key contributor in developing and optimizing distributed ML/DL frameworks on big data systems and provides customer support for end-to-end AI solutions on big data platforms.