Core Concepts#

Overview#

Oumi combines enterprise-grade reliability with research-friendly flexibility, supporting the complete foundation model lifecycle from pretraining to deployment.

This guide introduces the core concepts and terminology used throughout Oumi, as well as its architecture and guiding design principles. Understanding these terms will help you navigate Oumi’s documentation and features effectively.

The following diagram illustrates a typical Oumi workflow. You can either start from scratch (pre-training) or continue from a state-of-the-art (SOTA) model (post-training or continued pre-training).

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'background': '#f5f5f5'}}}%%
graph LR
    %% Data stage connections
    DS[Datasets] --> |Existing Datasets| TR[Training]
    DS --> |Data Synthesis| TR

    %% Training methods
    TR --> |Pre-training| EV[Evaluation]
    TR --> |SFT| EV
    TR --> |DPO| EV

    %% Evaluation methods spread horizontally
    EV --> |Generative| INF[Inference]
    EV --> |Multi-choice| INF
    EV --> |LLM Judge| INF

    %% Style for core workflow
    style DS fill:#1565c0,color:#ffffff
    style TR fill:#1565c0,color:#ffffff
    style EV fill:#1565c0,color:#ffffff
    style INF fill:#1565c0,color:#ffffff
```

Core Concepts#

Oumi CLI#

The CLI is the entry point for all Oumi commands.

```shell
oumi <command> [options]
```

For detailed help on any command, you can use the --help option:

```shell
oumi --help            # for general help
oumi <command> --help  # for command-specific help
```

The available commands are:

| Command | Purpose |
|---|---|
| `train` | Train a model. |
| `evaluate` | Evaluate a model. |
| `infer` | Run inference on a model. |
| `launch` | Launch jobs remotely. |
| `judge` | Judge datasets, models, or conversations. |
| `env` | Print information about the current environment. |
| `distributed` | A wrapper for torchrun/accelerate with reasonable defaults for distributed training. |

See CLI Reference for full CLI details.

Python API#

The Python API allows you to use Oumi to train, evaluate, infer, judge, and more. You can use it in a notebook, a script, or any custom workflow.

For example, to train a model, you can use the train function:

```python
from oumi.train import train
from oumi.core.configs import TrainingConfig

config = TrainingConfig.from_yaml("path/to/config.yaml")
train(config)
```

See oumi for full API details.

Configs#

To make common workflows recordable and reproducible, Oumi uses exhaustive configs that define all the parameters for each step.

| Config Type | Purpose | Documentation |
|---|---|---|
| Training | Model training workflows | Training Configuration |
| Evaluation | Benchmark configurations | Evaluation |
| Inference | Inference settings | Inference |
| Launcher | Deployment settings | Running Jobs on Clusters |

Example config structure:

```yaml
# Example training recipe
model:
  name: meta-llama/Llama-3.1-70B-Instruct
  trust_remote_code: true

data:
  train:
    datasets:
      - dataset_name: text_sft
        dataset_path: path/to/data
    stream: true

training:
  trainer_type: TRL_SFT
  learning_rate: 1e-4
```

For a full list of recipes, explore the Recipes page.
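Because every parameter lives in the config, any change to a run is a recorded, reproducible edit rather than a hidden default. The sketch below illustrates this pattern with plain Python dataclasses; the class and field names are illustrative stand-ins mirroring the YAML structure above, not Oumi’s actual config classes.

```python
from dataclasses import dataclass, field


# Illustrative stand-ins (hypothetical, not Oumi's real API) mirroring the YAML above.
@dataclass
class ModelParams:
    name: str
    trust_remote_code: bool = False


@dataclass
class TrainingParams:
    trainer_type: str = "TRL_SFT"
    learning_rate: float = 1e-4


@dataclass
class TrainingConfigSketch:
    model: ModelParams
    training: TrainingParams = field(default_factory=TrainingParams)


config = TrainingConfigSketch(
    model=ModelParams(name="meta-llama/Llama-3.1-70B-Instruct", trust_remote_code=True),
)

# An override touches a named field of the config, so it can be logged and replayed:
config.training.learning_rate = 2e-5
print(config.training.learning_rate)
```

In Oumi itself, the same idea applies: the YAML file loaded by `TrainingConfig.from_yaml` captures the full parameter set, so rerunning with the same file reproduces the run.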

Other Key Concepts#

| Term | Description | Documentation |
|---|---|---|
| Recipe | Predefined configurations in Oumi for common model training, evaluation, and inference workflows | Recipes |
| Launcher | Oumi’s job orchestration system for running workloads across different cloud providers | Running Jobs on Clusters |
| Models | Model architectures and implementations. Oumi supports most models from HuggingFace’s transformers library, as well as custom models. | Custom Models |
| Datasets | Data loading and preprocessing pipelines | Datasets |
| Trainers | Orchestrate the training process and optimization. Oumi supports custom trainers, as well as trainers from HuggingFace’s transformers and TRL, with more planned. | Training Methods |
| Data Mixtures | Oumi’s system for combining and weighting multiple datasets during training | Datasets |
| Oumi Judge | Built-in system for evaluating model outputs against customizable attributes (e.g. helpfulness, honesty, and safety) | LLM Judge |
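A data mixture can be pictured as weighted sampling across several datasets: each batch draws from a source dataset in proportion to its weight. The sketch below is a generic illustration of that idea, not Oumi’s implementation; the dataset names and weights are made up.

```python
import random

# Toy datasets and hypothetical mixture weights (illustrative only).
datasets = {
    "code_sft": ["code example"] * 3,
    "chat_sft": ["chat example"] * 3,
}
weights = {"code_sft": 0.8, "chat_sft": 0.2}

random.seed(0)


def sample_batch(n: int) -> list[str]:
    """Draw n examples, choosing a source dataset in proportion to its weight."""
    names = list(datasets)
    picks = random.choices(names, weights=[weights[k] for k in names], k=n)
    return [random.choice(datasets[name]) for name in picks]


batch = sample_batch(4)
print(len(batch))
```

In practice, a framework's mixture system would also handle shuffling, streaming, and exhaustion policies (what happens when one dataset runs out); the weighting principle is the same.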

Next Steps#

  1. Get started with Oumi: First install Oumi, then follow the Quickstart guide to run your first training job.

  2. Explore example recipes: Check out the Recipes page and try running a few examples.

  3. Dive deeper with tutorials: The Tutorials provide step-by-step guidance on specific tasks and workflows.

  4. Learn more about key functionalities: Explore detailed guides on training, inference, evaluation, and model judging.