Course curriculum

    1. Introduction

    2. Imports & Loading the Dataset

    3. Defining the Model

    4. Train Loop Per Worker

    5. Defining the Training Loop Configuration

    6. Configuring the Scaling Config

    7. Defining the Model Wrapper

    8. Building the Dataloader

    9. Reporting Metrics and Checkpointing

    10. Persistent Storage

    11. Putting It All Together with TorchTrainer

    12. Inspecting the Training Results

    13. Inference with Your Trained Model

    14. Full Chapter Notebook

    1. Introduction

    2. Train Loop Using Ray Data

    3. Building a Ray Data-Backed Dataloader

    4. Preparing and Loading the Dataset for Ray Data

    5. Transformations with Ray Data

    6. Configuring TorchTrainer and Launching Training

    7. Full Chapter Notebook

    1. Introduction

    2. Checkpoint Loading for Fault Tolerance

    3. Saving Fault Tolerant Checkpoints

    4. Launching Fault Tolerant Training

    5. Manually Restoring from Checkpoints

    6. Cleaning up Cluster Storage and Conclusion

    7. Concluding the Intro Tutorials and Next Steps

    8. Full Chapter Notebook

    1. Introduction

    2. Imports and Dataloading

    3. Building the Dataloader and Preprocessing

    4. Distributing Data

    5. Setting up the Distributed Training Loop

    6. Running Distributed Training

    7. Analyzing the Training Results

    8. Demonstrating Fault Tolerance

    9. Predicting with a Trained Model

    10. Full Chapter Notebook

    1. Introduction

    2. Imports and the Dataset

    3. Distributing Training

    4. Evaluating the Trained Model and Inference

    5. Further Training From a Checkpoint

    6. Full Chapter Notebook

    1. Introduction

    2. Imports and the Dataset

    3. Defining the Model

    4. Distributed Training

    5. Demonstrating Fault Tolerance

    6. Inference and Conclusion

    7. Full Chapter Notebook

About this course

  • Free
  • 72 lessons
  • 3 hours of video content
