For those in the EDT time zone, this is a joint meetup with the NYC ML/AI meetup. Join us if you are an ML practitioner and want to learn how you can:
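Forecast and detect anomalies across millions of time series at scale with Nixtla and Ray.
Conduct distributed data processing with Daft, a Python dataframe library, using Ray as its distributed compute engine.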
Both are exclusive Ray community user talks, and the Ray team is delighted to have the speakers share their Ray use cases and journeys with the community.
(The times are not strict; they may vary slightly.)
Talk 0: Welcome remarks & upcoming Ray announcements - Jules Damji, Anyscale
Talk 1 (30-35 mins): Forecasting at Scale with Nixtla and Ray - Max Mergenthaler & Federico Ramirez, Nixtla
Talk 2 (30-35 mins): Daft: The Ray-native Python dataframe for Complex Data - Jay Chia, Eventual
Talk 1: Forecasting at Scale with Nixtla and Ray
Abstract: Forecasting and anomaly detection at scale are important in many industries, including financial services, CPG/retail, and IoT/operational ML. Ray and Nixtla offer a robust open-source solution for processing time series at scale. The talk will cover best practices for working with time series data, various time series modeling techniques, and how Anyscale can efficiently train millions of time series models.
Bio(s): Max is the CEO and co-founder of Nixtla, an open-source time series research and deployment startup. He is also a seasoned entrepreneur with a proven track record as the founder of multiple technology startups. With a decade of experience in the ML industry, he has extensive expertise in building and leading international data teams. Max has also made notable contributions to the data science field as a co-author of papers on forecasting algorithms and decision theory. In addition, he is a co-maintainer of several open-source libraries in the Python ecosystem and has spoken at major data conferences in several countries. Max's passion lies at the intersection of business and technology.
Fede is a highly experienced ML engineer with a background in economics and mathematics. They are the CTO and co-founder of Nixtla. With over a decade of experience deploying ML models in production for large financial institutions, Fede has a proven track record of delivering end-to-end products. They are passionate about creating usable, scalable, open-source ML products and are a co-maintainer of several popular Python libraries. Fede's expertise has also earned them recognition as a speaker at multiple PyCons and as an author of peer-reviewed papers, solidifying their status as a leading expert in time series analysis.
Talk 2: Daft: The Ray-native Python dataframe for Complex Data
Abstract: Modern machine learning is data-driven; however, working interactively with “Complex Data” (data that doesn't usually fit in a SQL table, such as images, videos, and documents) is still extremely challenging. Daft is an open-source distributed dataframe library built for “Complex Data”. It provides a familiar tabular dataframe interface (similar to Pandas or PySpark) over your data, and its native integrations with Ray allow it to scale to terabytes of data on workloads such as interactive SQL-like queries, running models on GPUs, and interactive data exploration. In this talk, we will demo an interactive exploration of the COCO image dataset. We will then use Daft’s Ray Dataset integrations to train models, all end-to-end, out-of-core, and streaming on the same Ray cluster!
Bio: Jay Chia is a co-founder of Eventual, the company behind Daft (www.getdaft.io). He has worked on ML/data infrastructure in domains such as genomics and self-driving. He is frustrated that tabular data is so easy to work with using SQL and wishes the same were true for Complex Data!