Deep reinforcement learning (Deep RL) has seen many successes, including learning to play Atari games, mastering the classical game of Go, and enabling robotic locomotion and manipulation. However, now that Deep RL has become fairly capable of optimizing reward, a new challenge has arisen: how should one choose the reward function to be optimized? Indeed, this often becomes the key engineering time sink for practitioners. In this talk, I will present some recent progress on human-in-the-loop reinforcement learning. The newly proposed algorithm, PEBBLE, empowers a human supervisor to directly teach an AI agent new skills without the usual extensive reward engineering or curriculum design efforts.
Pieter Abbeel is a Professor at UC Berkeley, where he is Director of the Berkeley Robot Learning Lab and Co-Director of the Berkeley Artificial Intelligence Research (BAIR) Lab. Abbeel’s research strives to build ever more intelligent systems, with a main emphasis on deep reinforcement learning and meta-learning. His lab also investigates how AI could advance other science and engineering disciplines. Abbeel has founded several companies, including Gradescope (AI to help instructors with grading homework and exams) and Covariant (AI for robotic automation of warehouses and factories). Abbeel is also the host of The Robot Brains Podcast. He has received many awards and honors, including the PECASE, NSF CAREER, ONR-YIP, DARPA-YFA, and TR35. His work is frequently featured in the press, including the New York Times, Wall Street Journal, BBC, Rolling Stone, Wired, and Tech Review.