In 2000 -- literally last century -- I was a grad student trying to finish my PhD on applying machine learning to time series. But running the experiments was taking way too long -- I worked out that they would take more than a year to compute, and the beefy machines that could run them faster were just too expensive. So we built what was cutting edge at the time: a Beowulf cluster -- a stack of 18 PCs linked together to tackle easily parallelizable tasks. It did let me finish my PhD 6 months sooner, but the work could only be parallelized at the coarsest level, and it was hard going.
Almost every project I’ve worked on since then has required dealing with the complexities of distributed computing. At Google, I used MapReduce to help synthesize hundreds of millions of map features into the data that underlies Google Maps, and later FlumeJava to help us process Wi-Fi surveys from tens of thousands of floors into a usable indoor location system deployed on billions of phones. At Uber, my teams used tools like Spark and Horovod to tackle problems as diverse as ray tracing for improved urban locations, real-time traffic computation, and accident detection.
But every time, it was a similar pattern: you would spend something like 30% of the time developing code, and 70% of the time parallelizing it to work at scale. You would have to spend days working out how to shoehorn your ideas into programming models that made it easy for the work to be computed across multiple machines. I still remember one incident trying to compute road speed limits. The core algorithm fit in about 50 lines, yet I spent the next two weeks working out how to slice and dice the problem so it fit in the MapReduce framework.
Twenty years after I struggled with my PhD, distributed computing tools have improved, but there hasn’t been a fundamental shift in the model. Distributed computing is built around the assumption of tradeoffs. You either have complex but flexible systems (like message passing) or simple but rigid “think about the problem our way” systems (like Hadoop).
Anyscale is challenging that assumption. I love the company’s ambition in tackling a problem that has been a thorn in my professional side for two decades -- making distributed computing both simple and flexible. There are no guarantees of success, but there is no doubt that the problem itself is worthy.
There are reasons to be confident. Anyscale’s library, Ray, demonstrates how it can be done. I’ve experienced it myself.
Recently, I was playing with the same decision trees that I had used in my PhD two decades earlier. Decision trees are generally trained on a single thread, because working out how to parallelize the training has been difficult -- it can be done, but it involves a lot of complexity.
With Ray, in the space of 20 minutes, I had parallelized the core algorithm for a 40% performance gain. In the space of 2 more hours, I parallelized one other part of the algorithm, and together, running on a decently powered machine, the two changes gave me a total 60% performance gain. Then, with literally a single line change, I was able to run the same code in the cloud for a whopping 15x performance gain. You can find the details here.
This is obviously a toy example and there are lots of simplifications and provisos, but it blew my mind -- 60% faster execution for 2 hours of work? Easy migration to the cloud for a 15x performance improvement? This was so unlike my past experiences with parallelization -- it was simple and flexible.
That’s why I’m joining Anyscale: they’re on an ambitious mission to make distributed computing simple and flexible. Ray is already showing immense promise as a demonstration of that, with a rapidly growing ecosystem around it that includes big companies like JP Morgan and Ant Group as well as many open-source libraries. There’s a lot more work to do, and I’d be the first to acknowledge that it isn’t always going to be as easy as in the above example; but it’s still amazing that this is possible for even a subset of problems.
If you’d like to join a team with an ambitious mission to pole-vault past the limitations of Moore’s law, come join us!