Course curriculum
- Introduction
- Imports & Loading the Dataset
- Defining the Model
- Train Loop Per Worker
- Defining the Training Loop Configuration
- Configuring the Scaling Config
- Defining the Model Wrapper
- Building the Dataloader
- Reporting Metrics and Checkpointing
- Persistent Storage
- Putting it all together with TorchTrainer
- Inspecting the Training Results
- Inference with Your Trained Model
- Full Chapter Notebook

- Introduction
- Train Loop Using Ray Data
- Building a Ray Data-Backed Dataloader
- Preparing and Loading the Dataset for Ray Data
- Transformations with Ray Data
- Configuring TorchTrainer and Launching Training
- Full Chapter Notebook

- Introduction
- Checkpoint Loading for Fault Tolerance
- Saving Fault Tolerant Checkpoints
- Launching Fault Tolerant Training
- Manually Restoring from Checkpoints
- Cleaning up Cluster Storage and Conclusion
- Concluding the Intro Tutorials and Next Steps
- Full Chapter Notebook

- Introduction
- Imports and Dataloading
- Building the Dataloader and Preprocessing
- Distributing Data
- Setting up the Distributed Training Loop
- Running Distributed Training
- Analyzing the Training Results
- Demonstrating Fault Tolerance
- Predicting with a Trained Model
- Full Chapter Notebook

- Introduction
- Imports and the Dataset
- Distributing Training
- Evaluating the Trained Model and Inference
- Further Training From a Checkpoint
- Full Chapter Notebook

- Introduction
- Imports and the Dataset
- Defining the Model
- Distributed Training
- Demonstrating Fault Tolerance
- Inference and Conclusion
- Full Chapter Notebook

About this course
- Free
- 72 lessons
- 3 hours of video content