Training#
Overview#
Oumi provides an end-to-end training framework designed to handle everything from small fine-tuning experiments to large-scale pre-training runs.
Oumi enables you to start small, in a notebook or on a local machine, and easily scale up as your needs grow, while maintaining a consistent interface across different training scenarios and environments.
Key features include:
- **Multiple Training Methods**: Supervised Fine-Tuning (SFT) to adapt models to your specific tasks, Vision-Language SFT for multimodal models, Pretraining for training from scratch, and Direct Preference Optimization (DPO) for preference-based fine-tuning
- **Parameter-Efficient Fine-Tuning (PEFT) & Full Fine-Tuning (FFT)**: Support for multiple PEFT methods, including LoRA for efficient adapter training and QLoRA for quantized fine-tuning with 4-bit precision, as well as full fine-tuning for maximum performance
- **Flexible Environments**: Train on a local machine, with VSCode integration, in Jupyter notebooks, or in a cloud environment
- **Production-Ready**: Ensure reproducibility through YAML-based configurations and gain insights with comprehensive monitoring & debugging tools
- **Scalable Training**: Scale from single-GPU training to multi-node distributed training using Distributed Data Parallel (DDP) or Fully Sharded Data Parallel (FSDP)
Quick Start#
The fastest way to get started with training is using one of our pre-configured recipes.
For example, to train a small model (SmolLM-135M) on a sample dataset (tatsu-lab/alpaca), you can use the following command:
```bash
# Train a small model (SmolLM-135M)
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
```
Or, equivalently, using the Python API:

```python
from oumi import train
from oumi.core.configs import TrainingConfig

# Load config from file
config = TrainingConfig.from_yaml("configs/recipes/smollm/sft/135m/quickstart_train.yaml")

# Start training
train(config)
```
Running this config will:
- Download a small pre-trained model: SmolLM-135M
- Load a sample dataset: tatsu-lab/alpaca
- Run supervised fine-tuning using the `TRL_SFT` trainer
- Save the trained model to `config.output_dir`
Configuration Guide#
At the heart of Oumi’s training system is a YAML-based configuration framework. This allows you to define all aspects of your training run in a single, version-controlled file.
Here’s a basic example with key parameters explained:
```yaml
model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"  # Base model to fine-tune
  trust_remote_code: true  # Required for some model architectures
  dtype: "bfloat16"        # Training precision (float32, float16, or bfloat16)

data:
  train:  # Training dataset mixture
    datasets:
      - dataset_name: "tatsu-lab/alpaca"  # Training dataset
        split: "train"                    # Dataset split to use

training:
  output_dir: "output/my_training_run"  # Where to save outputs
  num_train_epochs: 3                   # Number of training epochs
  learning_rate: 5e-5                   # Learning rate
  save_steps: 100                       # Checkpoint frequency
```
You can override any value either through the CLI or programmatically:
```bash
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.learning_rate 1e-4 \
  --training.max_steps 30
```
```python
from oumi import train
from oumi.core.configs import TrainingConfig

# Load base config
config = TrainingConfig.from_yaml("configs/recipes/smollm/sft/135m/quickstart_train.yaml")

# Override specific values
config.training.learning_rate = 1e-4
config.training.max_steps = 30

# Start training
train(config)
```
Common Workflows#
In the following sections, we’ll cover some common workflows for training.
Fine-tuning a Pre-trained Model#
The simplest workflow is to fine-tune a pre-trained model on a dataset. The following config will fully fine-tune the model using SFT (supervised fine-tuning).
```yaml
model:
  model_name: "meta-llama/Llama-3.2-3B-Instruct"  # Replace with your model
  trust_remote_code: true
  dtype: "bfloat16"

data:
  train:  # Training dataset mixture; can be a single dataset or a list of datasets
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"  # Replace with your dataset, or add more datasets
        split: "train"

training:
  output_dir: "output/llama-finetuned"  # Where to save outputs
  optimizer: "adamw_torch_fused"
  learning_rate: 2e-5
  max_steps: 10  # Number of training steps
```
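To launch the run, save the config to a file (here we use the hypothetical name `my_finetune.yaml`) and point `oumi train` at it, just like the quickstart recipe:

```bash
oumi train -c my_finetune.yaml
```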
Using Parameter-Efficient Fine-tuning (PEFT)#
Excellent results can be achieved at a fraction of the computational cost by fine-tuning your network with Low-Rank Adaptation (LoRA) adapters instead of updating all of the original parameters. The following adaptation enables parameter-efficient fine-tuning with only a few additions:
```yaml
model:
  model_name: "meta-llama/Llama-3.2-3B-Instruct"  # Replace with your model
  trust_remote_code: true
  dtype: "bfloat16"

data:
  train:  # Training dataset mixture; can be a single dataset or a list of datasets
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"  # Replace with your dataset, or add more datasets
        split: "train"

training:
  output_dir: "output/llama-finetuned"  # Where to save outputs
  optimizer: "adamw_torch_fused"
  learning_rate: 2e-5
  max_steps: 10    # Number of training steps
  use_peft: True   # Activate Parameter-Efficient Fine-Tuning

peft:  # Control key hyper-parameters of the PEFT training process
  lora_r: 64
  lora_alpha: 128
  lora_target_modules:  # Select the modules for which adapters will be added
    - "q_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
```
Fine-tuning a Vision-Language Model#
Multimodal support in Oumi works much like text-only training, with only a few config changes (e.g., for data collation). You can find more details in Vision-Language SFT, VL SFT Datasets, Multi-modal Inference, and Multi-modal Benchmarks.
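As a rough sketch of what those changes look like, the config below swaps in a vision-language model, a multimodal dataset, and a vision-aware collator. The specific model, dataset, and collator names are illustrative assumptions; see the linked pages for tested recipes:

```yaml
model:
  model_name: "Qwen/Qwen2-VL-2B-Instruct"  # Illustrative vision-language model
  trust_remote_code: true
  dtype: "bfloat16"

data:
  train:
    collator_name: "vision_language_with_padding"  # Multimodal data collation (name assumed)
    datasets:
      - dataset_name: "merve/vqav2-small"  # Illustrative VQA dataset
        split: "validation"

training:
  output_dir: "output/vlm-finetuned"
```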
Multi-GPU Training#
To train with multiple GPUs, we can extend the same configuration to distributed training, using either DDP or FSDP:
```bash
# Using DDP (DistributedDataParallel)
oumi distributed torchrun \
  -m oumi train \
  -c configs/recipes/llama3_2/sft/3b_full/train.yaml
```

```bash
# Using FSDP (Fully Sharded Data Parallel)
oumi distributed torchrun \
  -m oumi train \
  -c configs/recipes/llama3_2/sft/3b_full/train.yaml \
  --fsdp.enable_fsdp true \
  --fsdp.sharding_strategy FULL_SHARD
```
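The FSDP settings can also live in the YAML config rather than being passed as CLI overrides. The snippet below mirrors the two flags used above, assuming they map to a top-level `fsdp` section (the same convention as the `--training.*` overrides earlier):

```yaml
fsdp:
  enable_fsdp: true
  sharding_strategy: "FULL_SHARD"  # Shard parameters, gradients, and optimizer state
```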
Launch Remote Training#
To kick off a training run in a cloud environment, you can use the launcher system.
This will create a GCP job with the specified configuration and start training:
```bash
oumi launch up -c configs/recipes/llama3_2/sft/3b_full/gcp_job.yaml --cluster llama3b-sft
```
Thanks to the SkyPilot integration, most cloud providers are supported; make sure to check out Running Jobs on Clusters for more details.
Multi-node Training#
To train with multiple nodes using the Oumi launcher, set `num_nodes` to your desired number of nodes.
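For example, a hypothetical excerpt of a job config (such as the GCP job config referenced above) might request two nodes like this; the exact field placement may differ in your config:

```yaml
num_nodes: 2  # Number of nodes to launch for the training job

resources:
  cloud: gcp
  accelerators: "A100:4"  # Illustrative per-node accelerator request
```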
Using Custom Datasets#
To use your own datasets, you can specify the path to the dataset in the configuration.
```yaml
data:
  train:
    datasets:
      - dataset_name: "text_sft"
        dataset_path: "/path/to/dataset.jsonl"
```
In this case, the dataset is expected to be in the `conversation` format. See Chat Formats for all the supported formats.
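For reference, each line of such a JSONL file typically holds one conversation. A minimal illustrative record (the exact schema is described in Chat Formats) could look like:

```json
{"messages": [{"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "The capital of France is Paris."}]}
```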
Training Output#
Throughout the training process, we generate logs and artifacts in the `config.output_dir` directory to help you track progress and debug issues.
This includes model checkpoints for resuming training, detailed training logs, TensorBoard events for visualization, and a backup of the training configuration.
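As a rough illustration only (the exact files depend on the trainer and your settings), the output directory might contain something like:

```text
output/my_training_run/
├── checkpoint-100/       # Intermediate checkpoint (saved every `save_steps`)
├── logs/                 # Detailed training logs
├── tensorboard/          # TensorBoard event files for visualization
└── training_config.yaml  # Backup of the training configuration
```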
Next Steps#
Now that we've covered the basics, as a next step you can:

- Learn about the different training methods
- Set up your training environment and get started with training
- Explore the configuration options