Every day, 90% of the data we generate is in unstructured form. However, current solutions for storing the data we create - Databases, Data Lakes, and Data Warehouses (or the Data 1.0 minions), are unfit for storing unstructured data. As a result, data scientists today work with unstructured data like developers used to work in the pre-database era. This slows down ML cycles, bottlenecks access speed and data transfer, and forces data scientists to wrangle with data instead of training models.
Creating Software 2.0 requires a new way of working with unstructured data, which we explore in this session. We present Data 2.0 - a framework bringing together all types of data under one umbrella, representing them in a unified tensorial form which is native to deep neural networks. The streaming process of the method is used for training and deploying machine learning models for both compute and data-bottlenecked operations as if the data is local to the machine. In addition, it allows version-controlling and collaborating on petabyte-scale datasets, as single numpy-like arrays on the cloud or locally. Lastly, we use Ray to improve our workflows.
Founding CEO at Activeloop, PhD on leave from Princeton, Y Combinator S18, UCL 16’, Working on Data 2.0.
Davit was first featured by TechCrunch at the age of 18. After completing his CompSci degree at UCL, he started his Ph.D. at Princeton University at 20. Davit is a recipient of the Gordon Wu Fellowship and AWS Machine Learning Research Award. His research involved reconstructing the connectome of the mouse brain under the supervision of Sebastian Seung. Trying to solve hurdles he faced analyzing large datasets in the neuroscience lab, David became the founding CEO of Activeloop, a Y-Combinator alum startup. He has been named among 30 innovative leaders in tech by HIVE Ventures under 30.