Core Concepts#

This guide walks you through the fundamental concepts in Oumi. By the end, you’ll understand how Oumi’s components fit together and be ready to train your first model.

Prerequisites#

Before diving in, you should have:

  • Oumi installed — See the installation guide if you haven’t set it up yet

  • Basic Python knowledge — Familiarity with running scripts and using pip

  • ML fundamentals — Understanding of what model training and fine-tuning mean at a high level

Key Terminology#

Oumi uses standard machine learning terminology. Here are the key terms you’ll encounter:

| Term | What it means |
| --- | --- |
| Pretraining | Training a model from scratch on large amounts of text data |
| Fine-tuning | Adapting a pretrained model to a specific task or domain |
| SFT | Supervised Fine-Tuning: teaching a model to follow instructions using input-output examples |
| DPO | Direct Preference Optimization: aligning a model using pairs of preferred/rejected responses |
| GRPO | Group Relative Policy Optimization: a reinforcement learning method for model alignment |
| LoRA | Low-Rank Adaptation: a memory-efficient fine-tuning technique that trains small adapter layers |
| Inference | Using a trained model to generate predictions or text |

The Oumi Workflow#

The diagram below shows a typical workflow in Oumi. You can start from scratch with pretraining, or begin with an existing model and fine-tune it using SFT, DPO, or GRPO:

%%{init: {'theme': 'base', 'themeVariables': { 'background': '#f5f5f5'}}}%%
graph LR
    %% Data stage connections
    DS[Datasets] --> |Existing Datasets| TR[Training]
    DS --> |Data Synthesis| TR

    %% Training methods
    TR --> |Pretraining| EV[Evaluation]
    TR --> |SFT| EV
    TR --> |DPO| EV
    TR --> |GRPO| EV

    %% Evaluation methods spread horizontally
    EV --> |Generative| INF[Inference]
    EV --> |Multi-choice| INF
    EV --> |LLM Judge| INF

    %% Style for core workflow
    style DS fill:#1565c0,color:#ffffff
    style TR fill:#1565c0,color:#ffffff
    style EV fill:#1565c0,color:#ffffff
    style INF fill:#1565c0,color:#ffffff
    

Using Oumi#

Oumi provides two ways to run workflows: the command-line interface (CLI) and the Python API. Most users start with the CLI for its simplicity, then move to the Python API when they need more control.

Command-Line Interface (CLI)#

The CLI is the quickest way to get started. All commands follow this pattern:

oumi <command> [options]

For detailed help on any command, you can use the --help option:

oumi --help            # for general help
oumi <command> --help  # for command-specific help

The available commands are:

| Command | Purpose |
| --- | --- |
| train | Train or fine-tune a model |
| evaluate | Evaluate a model on benchmarks |
| infer | Run inference on a model |
| launch | Launch jobs on cloud platforms |
| judge | Use LLM-as-a-Judge for evaluation |
| synth | Generate synthetic training data |
| analyze | Analyze and profile datasets |
| tune | Hyperparameter tuning with Optuna |
| quantize | Quantize models for efficient deployment |
| distributed | Distributed training wrapper for torchrun/accelerate |
| env | Display environment information |
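
For example, a typical end-to-end run chains a few of these commands together. The config file names below are placeholders; substitute your own configs or a recipe, and note that the -c flag matches the train example in the next paragraph:

oumi train -c my_train_config.yaml       # fine-tune a model
oumi evaluate -c my_eval_config.yaml     # benchmark the trained model
oumi infer -c my_infer_config.yaml       # generate text with the result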

Any Oumi command that takes a config path as an argument (train, evaluate, infer, etc.) also lets you override individual config parameters from the command line. For example:

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.max_steps 20 \
  --training.learning_rate 1e-4 \
  --data.train.datasets[0].shuffle true \
  --training.output_dir output/smollm-135m-sft

See CLI Reference for full details about every command, including how CLI overrides work.

Python API#

The Python API gives you programmatic control over Oumi. Use it when you need to:

  • Integrate training into a larger pipeline

  • Modify configurations dynamically

  • Run experiments in Jupyter notebooks

Here’s a complete example that loads a recipe, modifies it, and runs training:

from oumi.train import train
from oumi.core.configs import TrainingConfig

# Load a predefined recipe
config = TrainingConfig.from_yaml(
    "configs/recipes/smollm/sft/135m/quickstart_train.yaml"
)

# Optionally modify settings programmatically
config.training.max_steps = 100
config.training.output_dir = "output/my_experiment"

# Run training
train(config)

When you run this, you’ll see output like:

Loading model: HuggingFaceTB/SmolLM2-135M-Instruct
Starting training for 100 steps...
Step 10/100 | Loss: 2.45 | LR: 5.0e-05
Step 20/100 | Loss: 2.12 | LR: 5.0e-05
...
Training complete. Model saved to output/my_experiment
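
The same load-modify-run pattern extends to other workflows. Below is a minimal sketch for evaluation; it assumes oumi exposes an evaluate entry point and EvaluationConfig analogous to train and TrainingConfig, and the config path is a placeholder, so confirm the exact imports in the API reference:

from oumi.core.configs import EvaluationConfig
from oumi.evaluate import evaluate  # assumed module path, mirroring oumi.train

# Load an evaluation config (placeholder path)
config = EvaluationConfig.from_yaml("path/to/eval_config.yaml")

# Run the benchmarks defined in the config
evaluate(config)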

See the oumi API reference for complete details on the Python API.

Configuration Files#

Every Oumi workflow is defined by a YAML configuration file. This makes experiments reproducible—you can share a config file and someone else can run the exact same workflow.

Oumi has four types of configs:

| Config Type | What it controls | Learn more |
| --- | --- | --- |
| Training | Model, data, hyperparameters, and training settings | Training Configuration |
| Evaluation | Benchmarks and metrics to run | Evaluation |
| Inference | How to generate text from a model | Inference |
| Launcher | Where and how to run jobs (local, cloud, etc.) | Running Jobs on Clusters |

Here’s what a training config looks like:

# The model to train
model:
  model_name: meta-llama/Llama-3.1-70B-Instruct
  trust_remote_code: true

# Training data
data:
  train:
    datasets:
      - dataset_name: text_sft
        dataset_path: path/to/data
    stream: true

# Training hyperparameters
training:
  trainer_type: TRL_SFT
  learning_rate: 1e-4
  max_steps: 1000

Oumi comes with many ready-to-use configs called recipes. Browse them at Recipes.

Key Components#

Now that you understand how to run Oumi, let’s look at the main components you’ll work with.

Recipes#

A recipe is a complete, ready-to-run configuration file. Oumi includes recipes for common workflows like fine-tuning Llama or training SmolLM. Think of recipes as starting points—you can use them as-is or customize them for your needs.
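
To run a recipe as-is, pass its path to the corresponding command, for example the quickstart SFT recipe used earlier:

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml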

Browse available recipes: Recipes

Models#

Oumi works with most models from HuggingFace’s transformers library. You specify a model by its HuggingFace name (like meta-llama/Llama-3.1-8B) in your config file. You can also define custom model architectures.
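
In a config, that is just the model block from the training example above; the model ID here is only an illustration:

model:
  model_name: meta-llama/Llama-3.1-8B  # any Hugging Face model ID works here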

Learn more: Custom Models

Datasets#

Oumi provides a unified interface for loading and preprocessing training data. You can use datasets from HuggingFace, load local files, or create custom dataset classes.

Data mixtures let you combine multiple datasets with different weights—useful when you want to train on diverse data sources simultaneously.
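
A sketch of a two-dataset mixture, following the data block from the training config above; the dataset names are placeholders and the mixture_proportion field name is an assumption, so check the Datasets guide for the exact schema:

data:
  train:
    datasets:
      - dataset_name: dataset_a   # placeholder
        mixture_proportion: 0.7   # assumed field name for the mixture weight
      - dataset_name: dataset_b   # placeholder
        mixture_proportion: 0.3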

Learn more: Datasets

Training Methods#

Oumi supports multiple training approaches through different trainers:

  • SFT trainers for supervised fine-tuning

  • DPO trainers for preference-based alignment

  • GRPO trainers for reinforcement learning

Each trainer handles the optimization loop, gradient updates, and checkpointing for its respective method.
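
You choose a trainer through trainer_type in the training block. TRL_SFT appears in the config earlier on this page; the DPO and GRPO values below are assumptions, so check the Training Methods guide for the exact names:

training:
  trainer_type: TRL_SFT      # supervised fine-tuning (as in the config above)
  # trainer_type: TRL_DPO    # preference alignment (value assumed)
  # trainer_type: TRL_GRPO   # reinforcement learning (value assumed)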

Learn more: Training Methods

Oumi Judge#

Oumi Judge uses an LLM to evaluate model outputs on attributes like helpfulness, honesty, and safety. It’s useful for automated quality assessment when you don’t have ground-truth labels.

Learn more: LLM Judge

Launcher#

The launcher lets you run Oumi jobs on different platforms—your local machine, a GPU cluster, or cloud providers like AWS and GCP. You define where to run in a launcher config, keeping your training config portable.
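
A rough sketch of what a launcher (job) config might contain; every field name here is an assumption modeled on typical cloud launcher configs, so see Running Jobs on Clusters for the real schema:

# All field names below are assumptions for illustration only
name: my-training-job
resources:
  cloud: gcp            # assumed: which cloud to run on
  accelerators: A100:4  # assumed: GPU type and count
run: |
  oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml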

Learn more: Running Jobs on Clusters

Next Steps#

  1. Get started with Oumi: First install Oumi, then follow the Quickstart guide to run your first training job.

  2. Explore example recipes: Check out the Recipes page and try running a few examples.

  3. Dive deeper with tutorials: The Tutorials provide step-by-step guidance on specific tasks and workflows.

  4. Learn more about key functionalities: Explore detailed guides on training, inference, evaluation, and model judging.