Local Training#
This guide covers how to train models on your local machine or server using Oumi’s command-line interface. Whether you’re working on a laptop or a multi-GPU server, this guide will help you get started with local training.
For cloud-based training options, see Running Jobs on Clusters.
Prerequisites#
Before starting local training, ensure you have:
Hardware Requirements
CUDA-capable GPU(s) recommended
Sufficient RAM (16GB minimum)
Adequate disk space for storing your models and datasets
Software Setup
Python environment configured and oumi installed
For detailed installation instructions, refer to our Installation guide.
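Before launching a run, it can be worth verifying that the tools this guide relies on are actually available. The following is a minimal sketch using only the Python standard library; `check_environment` is a hypothetical helper, not part of Oumi:

```python
import shutil

def check_environment() -> dict:
    """Report whether key executables are on PATH.

    `oumi` indicates the CLI is installed; `nvidia-smi` is a rough
    proxy for a working NVIDIA driver (and thus CUDA-capable GPUs).
    """
    return {
        "oumi_cli": shutil.which("oumi") is not None,
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }

status = check_environment()
print(status)
```

If `nvidia_smi` is `False`, training will fall back to CPU, which is workable for small experiments but slow for full training runs.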
Basic Usage#
Command Line Interface#
The main command for training is `oumi train`. The CLI provides a flexible way to configure your training runs through both YAML configs and command-line parameter overrides.
```shell
# Basic usage
oumi train -c path/to/config.yaml

# With parameter overrides
oumi train -c path/to/config.yaml \
  --training.learning_rate 1e-4 \
  --training.num_train_epochs 5
```
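For orientation, a minimal config might look like the sketch below. This is a hedged example: the exact schema is defined in the Training Configuration reference, and the model name, dataset name, and output path here are placeholders, not recommendations.

```yaml
# Minimal sketch of a training config (placeholder values).
model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"  # placeholder model

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"  # placeholder dataset

training:
  output_dir: "output/my_run"
  learning_rate: 1e-4
  num_train_epochs: 5
```

Any of these fields can then be overridden at the command line with the dotted syntax shown above (e.g. `--training.learning_rate 1e-4`).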
For a complete reference of configuration options, see Training Configuration.
Training with GPUs#
Oumi supports both single and multi-GPU training setups.
Single GPU Training#
For training on a specific GPU:
```shell
# Using CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0 oumi train -c config.yaml

# Using the device parameter
oumi train -c config.yaml --model.device_map cuda:0
```
Multi-GPU Training#
For distributed training across multiple GPUs:
```shell
# Using DDP
torchrun --standalone --nproc-per-node=<NUM_GPUS> oumi train -c config.yaml

# Using FSDP
oumi distributed torchrun -m oumi train -c config.yaml --fsdp.enable_fsdp true
```
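FSDP options can also be set in the config file instead of on the command line. The fragment below is a sketch: `enable_fsdp` follows from the `--fsdp.enable_fsdp` override above, while `sharding_strategy` is an assumed field name mirroring PyTorch FSDP terminology; verify both against the configuration reference.

```yaml
# Sketch of an FSDP config section (field names assumed; verify in the docs).
fsdp:
  enable_fsdp: true
  sharding_strategy: FULL_SHARD  # shard parameters, gradients, and optimizer state
```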
For more details on distributed training options, see Training.
Monitoring#
Effective monitoring is crucial for understanding your model’s training progress. Several options are available:
Terminal Output#
Monitor training progress directly in the terminal:
```shell
# Configure logging frequency
oumi train -c config.yaml --training.logging_steps 10
```
TensorBoard#
Monitor metrics with TensorBoard for rich visualizations:
First, add the following to your train.yaml config file:

```yaml
training:
  enable_tensorboard: true
  output_dir: oumi_output_dir
  logging_steps: 10
```
Then run the following command to start TensorBoard:
```shell
# Start TensorBoard
tensorboard --logdir oumi_output_dir
```
Weights & Biases#
You can also track experiments with W&B for collaborative projects:
```yaml
training:
  enable_wandb: true
  run_name: "experiment-1"
  logging_steps: 10
```
For more monitoring options and best practices, see Monitoring & Debugging.
Next Steps#
Set up monitoring tools for tracking progress
Check out configuration options for detailed settings