Runtime improvements to multi-dimensional array operations increasingly rely on parallelism, and Python's scientific computing community has a growing need to train models on ever-larger datasets. Existing SPMD distributed-memory solutions present Python users with an unfamiliar programming model, while block-partitioned array abstractions that rely on task-graph scheduling heuristics are tuned for general workloads rather than array operations. In this work, we present a novel approach to optimizing NumPy programs at runtime while maintaining an array abstraction faithful to the NumPy API, yielding a Ray library that is both performant and easy to use. We explicitly formulate scheduling as an optimization problem and empirically show that our runtime optimizer achieves near-optimal performance. Our library, called NumS, provides a 10-20x speedup on basic linear algebra operations over Dask, and a 3-6x speedup on logistic regression compared to Dask and Spark on terabyte-scale data.
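To make the block-partitioned abstraction concrete, here is a minimal plain-NumPy sketch of how such libraries decompose a matrix multiply into a grid of independently schedulable per-block tasks. The function names (`block_partition`, `block_matmul`) are illustrative only and are not NumS's actual API.

```python
import numpy as np

def block_partition(a, block_shape):
    # Split a 2-D array into a grid of blocks, a simplified view of how
    # block-partitioned libraries represent distributed arrays.
    br, bc = block_shape
    return [[a[i:i + br, j:j + bc]
             for j in range(0, a.shape[1], bc)]
            for i in range(0, a.shape[0], br)]

def block_matmul(A_blocks, B_blocks):
    # Blocked matrix multiply: each output block is a sum of per-block
    # products; in a distributed setting, each product is a separate task
    # whose placement a scheduler must decide.
    n_k = len(A_blocks[0])
    n_j = len(B_blocks[0])
    return [[sum(A_blocks[i][k] @ B_blocks[k][j] for k in range(n_k))
             for j in range(n_j)]
            for i in range(len(A_blocks))]

# Verify the blocked result against a direct NumPy matmul.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 8))
C_blocks = block_matmul(block_partition(A, (2, 3)), block_partition(B, (3, 4)))
C = np.block(C_blocks)
assert np.allclose(C, A @ B)
```

The scheduling question the abstract refers to is where each of these per-block products runs; heuristic schedulers place them generically, whereas NumS formulates the placement as an explicit optimization problem.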
Melih Elibol is pursuing his PhD as a researcher at U.C. Berkeley's RISE lab, where he is advised by Ion Stoica and Michael I. Jordan. He rewrote Ray's object communication protocol to support multi-threading, and has since been working on NumS, a distributed array abstraction optimized for Ray. Before joining the RISE lab, Melih worked as a research software engineer at Microsoft Research New England, and completed his undergraduate degree at Harvard.