Ready to move beyond memory limits and scale your LLM fine-tuning? Join us for a webinar where ML and platform engineers will explore how to fine-tune large language models (LLMs) across distributed GPU clusters using FSDP, DeepSpeed, and Ray. We will dive into the orchestration and memory management strategies required to train frontier-scale models efficiently.
In this virtual session, you will learn:
How to fine-tune an LLM at scale using Ray and PyTorch
How to save and resume checkpoints with Ray Train
How to configure DeepSpeed ZeRO for memory and performance (stages, mixed precision, CPU offload)
How to launch a distributed training job (a short code sketch follows below)
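As a preview, here is a minimal sketch of the Ray Train plus DeepSpeed ZeRO pattern we will walk through. It is illustrative rather than the exact webinar project: the model name, worker count, batch source, and ZeRO settings are placeholders.

import deepspeed
import ray.train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from transformers import AutoModelForCausalLM

def train_loop_per_worker(config):
    # Each Ray Train worker loads the model and hands it to DeepSpeed.
    model = AutoModelForCausalLM.from_pretrained(config["model_name"])

    # ZeRO stage, mixed precision, and CPU offload are the knobs covered in the session.
    ds_config = {
        "train_micro_batch_size_per_gpu": config["batch_size"],
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": "cpu"},  # optional CPU offload
        },
    }
    engine, _, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config
    )

    # Placeholder batch source; each batch is expected to contain input_ids,
    # attention_mask, and labels so the model returns a loss.
    for step, batch in enumerate(config["train_batches"]):
        loss = engine(**batch).loss
        engine.backward(loss)
        engine.step()
        # Report metrics back to Ray Train; passing checkpoint=... here enables resuming.
        ray.train.report({"loss": loss.item(), "step": step})

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"model_name": "<hf-model-id>", "batch_size": 1, "train_batches": []},
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),  # placeholder cluster size
)
result = trainer.fit()

During the webinar we will build out this skeleton with a real dataset, checkpoint saving and resuming through Ray Train, and a closer look at how ZeRO stages, mixed precision, and offloading trade memory for speed.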
This session is more than a demo. You’ll leave with a working understanding of Ray, a reusable project you can build on, and a clear view of how Ray and Anyscale work together to accelerate LLM development.
Seats are limited to keep the experience interactive. Reserve your spot today, and come ready to code!