The Road to AutoML: Hyperparameter Tuning and Neural Architecture Search

June 17, 2020

The videos and slides are now available for the second Ray Summit Connect, June 17, 2020.

For your convenience, there are separate videos for each talk and for the Q&A and panel discussion at the end.

  • Distributed Hyperparameter Tuning, Richard Liaw (Anyscale): video, slides.

  • Geometry-Aware Gradient Algorithms for Neural Architecture Search, Liam Li (Determined AI): video, slides.

  • Trading off Model Size and Accuracy for BERT with Ray and SigOpt, Meghana Ravikumar (SigOpt): video, slides.

  • Q&A and panel discussion with Richard Liaw, Liam Li, and Meghana Ravikumar, moderated by Dean Wampler: video.


The vision of AutoML is to remove as much manual effort and required expertise as possible when applying machine learning and artificial intelligence to real-world problems. This Ray Summit Connect explores three topics in AutoML where research is ongoing but many practical tools already exist. The first topic is hyperparameter tuning: techniques for determining the best model structure and training settings for your problem before the final model is trained. The second topic is an important subset of hyperparameter tuning, neural architecture search, which seeks the optimal architecture for your neural network. The third topic is the pragmatic question of how to balance model size against accuracy: larger models tend to be more accurate, but they also require more computation. You’ll hear from experts on the challenges of these topics and the current tools and techniques used for them.

Schedule

10:00 AM: Distributed Hyperparameter Tuning, Richard Liaw (Anyscale)

10:15 AM: Geometry-Aware Gradient Algorithms for Neural Architecture Search, Liam Li (Determined AI)

10:30 AM: Trading off Model Size and Accuracy for BERT with Ray and SigOpt, Meghana Ravikumar (SigOpt)

10:45 AM: Panel discussion moderated by Dean Wampler with audience Q&A 


Distributed Hyperparameter Tuning, Richard Liaw

The performance of modern deep learning models depends heavily on the choice of hyperparameters, and the tuning process is a major bottleneck in the machine learning pipeline. In this talk, we will first motivate the need for advances in hyperparameter tuning methods. We will then survey the standard methods for hyperparameter tuning: grid search, random search, and Bayesian optimization. Next, we will motivate and discuss cutting-edge methods: multi-fidelity Bayesian optimization, successive halving algorithms (HyperBand), and population-based training. Finally, we will present an overview of Tune (http://tune.io/), a scalable hyperparameter tuning system from the UC Berkeley RISELab, and demonstrate how users can leverage the cutting-edge tuning methods implemented in Tune to quickly improve the performance of standard deep learning models.
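To make two of the methods named in this abstract concrete, here is a minimal sketch of random search combined with successive halving, the building block behind HyperBand. It is an illustration under stated assumptions, not Tune's API: the search space, the `toy_loss` function (a stand-in for training a model for a given number of epochs), and all function names are hypothetical.

```python
import math
import random


def sample_config(rng):
    """Random search step: draw one configuration from a hypothetical search space."""
    return {"lr": 10 ** rng.uniform(-4, 0), "momentum": rng.uniform(0.0, 0.99)}


def toy_loss(config, budget):
    """Stand-in for 'train for `budget` epochs and return the validation loss'.

    Not a real model: the loss floor is lowest near lr=0.01, momentum=0.9,
    and the loss approaches that floor as the budget grows.
    """
    floor = (math.log10(config["lr"]) + 2) ** 2 + (config["momentum"] - 0.9) ** 2
    return floor + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=2):
    """Evaluate every config on a small budget, keep the best 1/eta, then repeat
    with eta times the budget until a single config remains."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: toy_loss(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], budget


rng = random.Random(0)  # fixed seed so the run is repeatable
candidates = [sample_config(rng) for _ in range(16)]
best, final_budget = successive_halving(candidates)
print(best, final_budget)
```

The point of the design is that most of the compute goes to configurations that already look promising: poor candidates are discarded after a cheap, low-budget evaluation instead of being trained to completion.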

Geometry-Aware Gradient Algorithms for Neural Architecture Search, Liam Li 

Deep learning offers the promise of bypassing manual feature engineering by learning representations in conjunction with statistical models in an end-to-end fashion. However, neural network architectures themselves are typically designed by experts in a painstaking, ad hoc fashion. Neural architecture search (NAS) presents a promising path for alleviating this pain by automatically identifying architectures that are superior to hand-designed ones. In this talk we will present our recent GAEA framework, which provides principled and computationally efficient algorithms for NAS that yield state-of-the-art performance on a wide range of leading NAS benchmarks in computer vision. We will also briefly discuss the practical infrastructure hurdles associated with large-scale NAS workflows, and how we tackle them with Determined AI’s open-source training platform.

Trading off Model Size and Accuracy for BERT with Ray and SigOpt, Meghana Ravikumar

With the publication of BERT, transfer learning suddenly became accessible for NLP, unlocking a plethora of model zoos and boosting performance on domain-specific problems. Although BERT has accelerated many modeling efforts, its size is limiting for federated learning, edge computing, and some production systems. In this talk, we will explore how to reduce the size of BERT while retaining its capacity in the context of question answering tasks.

Our approach encompasses fine-tuning, distillation, and hyperparameter optimization at scale. First, we fine-tune BERT on SQuAD 2.0 to obtain our teacher model, and use distillation to compress it into a smaller student model. Then, combining SigOpt and Ray, we use multimetric hyperparameter optimization at scale to find the optimal architecture for the student model. Finally, we explore the trade-offs of our hyperparameter decisions to draw insights for our student model’s architecture.
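When two metrics compete, as model size and accuracy do here, there is usually no single best configuration. One common way to read the results of a multimetric search is to keep only the trials that no other trial beats on both metrics (the Pareto front). The sketch below illustrates that idea; it is not SigOpt's or Ray's API, the `Trial` type and function names are hypothetical, and the numbers are made up for illustration rather than taken from the talk.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    name: str
    params_millions: float  # model size, lower is better
    accuracy: float         # task score, higher is better


def dominates(a, b):
    """True if `a` is no worse than `b` on both metrics and strictly better on one."""
    no_worse = a.params_millions <= b.params_millions and a.accuracy >= b.accuracy
    strictly_better = a.params_millions < b.params_millions or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(trials):
    """Keep every trial that is not dominated by any other trial."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


# Made-up numbers for illustration only, not results from the talk.
trials = [
    Trial("a", 110.0, 0.74),
    Trial("b", 66.0, 0.70),
    Trial("c", 66.0, 0.65),  # same size as b but less accurate
    Trial("d", 40.0, 0.66),
    Trial("e", 52.0, 0.64),  # larger and less accurate than d
]
front = pareto_front(trials)
print([t.name for t in front])
```

Each trial left on the front represents a different trade-off: picking among them is a decision about how much accuracy you are willing to give up for a smaller model.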

About Richard Liaw

Richard Liaw is a software engineer at Anyscale and works on Ray Tune.


About Liam Li

Liam Li recently completed a PhD in Machine Learning from Carnegie Mellon University, where he was advised by Ameet Talwalkar. His PhD research focused on developing efficient methods for automated machine learning. Since then, he has joined Determined AI as a machine learning engineer to continue making machine learning easier and more accessible.


About Meghana Ravikumar

Meghana has worked with machine learning in academia and in industry, and is happiest working on natural language processing. Prior to SigOpt, she worked in biotech, employing NLP to mine and classify biomedical literature. When she’s not reading papers, developing models/tools, or trying to explain complicated topics, she enjoys doing yoga, traveling, and hunting for the perfect chai latte.
