Deep reinforcement learning (Deep RL) has seen many successes, including learning to play Atari games and the classic game of Go, as well as robotic locomotion and manipulation. However, now that Deep RL has become fairly capable of optimizing reward, a new challenge has arisen: how should we choose the reward function to be optimized? Indeed, this often becomes the key engineering time sink for practitioners. In this talk, I will present some recent progress on human-in-the-loop reinforcement learning. The newly proposed algorithm, PEBBLE, empowers a human supervisor to directly teach an AI agent new skills without the usual extensive reward engineering or curriculum design efforts.
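The core idea behind human-in-the-loop methods like PEBBLE is to learn a reward model from pairwise human preferences over trajectory segments, rather than hand-engineering a reward function. The sketch below illustrates this idea with a linear reward model and a Bradley-Terry preference loss; it is a minimal illustrative assumption on our part, not the actual PEBBLE implementation, and all names (`segment_return`, `preference_loss_grad`, the synthetic preference labeler) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def segment_return(w, segment):
    """Predicted return of a trajectory segment: sum of per-state rewards
    under a linear reward model r(s) = w . s (illustrative assumption)."""
    return float(np.sum(segment @ w))

def preference_loss_grad(w, seg_a, seg_b, pref_a):
    """Bradley-Terry model: P(a preferred over b) = sigmoid(R(a) - R(b)).
    Returns the cross-entropy loss and its gradient with respect to w."""
    diff = segment_return(w, seg_a) - segment_return(w, seg_b)
    p_a = 1.0 / (1.0 + np.exp(-diff))
    loss = -(pref_a * np.log(p_a + 1e-8) + (1 - pref_a) * np.log(1 - p_a + 1e-8))
    grad = (p_a - pref_a) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
    return loss, grad

# Hidden "true" reward only the simulated human knows: it values the
# first state feature and ignores the second.
true_w = np.array([1.0, 0.0])

w = np.zeros(2)  # learned reward weights, trained only from preferences
for _ in range(500):
    # Two candidate trajectory segments of 10 states with 2 features each.
    seg_a = rng.normal(size=(10, 2))
    seg_b = rng.normal(size=(10, 2))
    # Simulated preference label standing in for a human query.
    pref_a = 1.0 if segment_return(true_w, seg_a) > segment_return(true_w, seg_b) else 0.0
    _, grad = preference_loss_grad(w, seg_a, seg_b, pref_a)
    w -= 0.05 * grad  # gradient step on the preference loss

# After training, the learned reward should weight the first feature
# far more heavily than the second, matching the human's hidden reward.
```

An RL agent would then be trained against the learned reward model, with the human queried periodically for fresh preferences; PEBBLE additionally uses unsupervised pre-training and off-policy learning to keep the number of human queries small.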
Professor, UC Berkeley | Founder, Covariant | Host, The Robot Brains Podcast