Models#
Oumi provides a unified interface for working with foundation models from multiple providers, including Hugging Face, Meta, and NanoGPT, as well as your own custom models. Whether you're performing inference, fine-tuning, pre-training, or evaluation, Oumi simplifies the process with seamless integrations.
Out of the box, Oumi supports popular causal LLMs and large vision-language models, with optimized implementations for efficient use. For a comprehensive list of supported models, configuration examples, and best practices, see the Recipes page.
This guide provides a quick overview of Oumi's unified interface: how to instantiate models, customize their parameters, configure the underlying tokenizers, and integrate them into your applications.
Main Model Interface#
Using the functions oumi.builders.build_model() and oumi.builders.build_tokenizer(), you can instantiate models and tokenizers regardless of their architecture. To further configure and customize a model, you can use the oumi.core.configs.ModelParams class.
# Example using Oumi's main model interface
import torch

from oumi.builders import build_model, build_tokenizer
from oumi.core.configs import ModelParams

# Specify parameters to customize your model
model_params = ModelParams(
    model_name="HuggingFaceTB/SmolLM-135M",
    tokenizer_kwargs={"pad_token": "<|endoftext|>"},
)

# Build the model
device = torch.device("cpu")  # or "cuda", "mps", etc.
model = build_model(model_params).to(device)

# Build a corresponding tokenizer
tokenizer = build_tokenizer(model_params)
input_data = tokenizer("What are the benefits of open source coding?", return_tensors="pt")

# Use the same interface regardless of model type for generation
outputs = model.generate(
    input_data["input_ids"].to(device),
    attention_mask=input_data["attention_mask"].to(device),
    max_length=64,
)
print(tokenizer.decode(outputs[0]))
Hugging Face Hub Integration#
Oumi integrates directly with the Hugging Face Hub and the Hugging Face transformers library, allowing you to use any model available on the Hugging Face Hub:
from oumi.builders import build_model, build_tokenizer
from oumi.core.configs import ModelParams
# Configure model parameters
model_params = ModelParams(model_name="meta-llama/Llama-3.2-3B-Instruct")
# Build model and tokenizer
model = build_model(model_params)
tokenizer = build_tokenizer(model_params)
Custom Models#
You can also easily create custom models by extending our base classes:
from oumi.core.models import BaseModel


class MyCustomModel(BaseModel):
    """Create your own model architecture."""

    def __init__(self, config):
        super().__init__(config)
        # Define your architecture
For detailed implementation guidance, see the Custom Models documentation.
Advanced Topics#
Tokenizer Integration#
Oumi ensures consistent tokenizer handling through the core.tokenizers module. Tokenizers can be configured independently of models while maintaining full compatibility.
from oumi.builders import build_tokenizer
from oumi.core.configs import ModelParams

# Configure tokenizer with model
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    tokenizer_name="meta-llama/Llama-3.2-3B-Instruct",  # Optional: use a different tokenizer
    model_max_length=4096,  # Set custom max length
    chat_template="llama3-instruct",  # Specify chat template
)

# Build tokenizer with settings
tokenizer = build_tokenizer(model_params)
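The configured maximum length is then respected by standard Hugging Face tokenizer calls. Here is a minimal sketch using the tokenizer built above; it assumes that build_tokenizer propagates the model_max_length setting to the tokenizer, so verify against your Oumi version:

# Truncate long inputs to the configured maximum length (4096 above).
# Assumes model_max_length was propagated to tokenizer.model_max_length.
encoded = tokenizer(
    "A very long prompt " * 1000,
    truncation=True,
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # sequence dimension capped at 4096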
For details on handling special tokens, refer to core.tokenizers.get_default_special_tokens().
Parameter Adapters and Quantization#
Oumi supports loading models with PEFT adapters and quantization for efficiency purposes. You can configure these through ModelParams
:
from oumi.builders import build_model
from oumi.core.configs import ModelParams, PeftParams

# Load a model with a PEFT adapter
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    adapter_model="path/to/adapter",  # Load PEFT adapter
)

# Load a model with 8-bit quantization
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    peft_params=PeftParams(
        q_lora=True,  # Enable quantization
        q_lora_bits=8,
        lora_r=16,
        lora_alpha=32,
        lora_dropout=0.1,
    ),
)

# Build the model with adapter/quantization
model = build_model(model_params)
The framework supports:
PEFT Adapters: Load trained LoRA or other PEFT adapters using the adapter_model parameter
Quantization: Enable 8-bit (or 4-bit) quantization through PeftParams with q_lora and q_lora_bits
Mixed Precision: Control model precision using the torch_dtype parameter (see the sketch below)
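As an illustration of the mixed-precision option, the following sketch loads a model in bfloat16. The exact field name is assumed here to be torch_dtype_str (the string form of the torch_dtype parameter mentioned above); check the ModelParams reference for your installed Oumi version if it differs.

from oumi.builders import build_model
from oumi.core.configs import ModelParams

# Load the model weights in bfloat16 (roughly half the memory of float32).
# torch_dtype_str is an assumed field name; verify it against the
# ModelParams documentation for your Oumi version.
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype_str="bfloat16",
)
model = build_model(model_params)
print(next(model.parameters()).dtype)  # expect torch.bfloat16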
For more details on training with adapters and quantization, see Training Configuration.
Chat Templates#
Oumi uses Jinja2 templates to format conversations for different model architectures. Oumi’s default templates ensure that messages are formatted correctly for each model’s expected input format.
Available templates include:
default - Basic template without special tokens
llama3-instruct - For Llama 3 instruction models
llava - For LLaVA multimodal models
phi3-instruct - For Phi-3 instruction models
qwen2-vl-instruct - For Qwen2-VL instruction models
zephyr - For Zephyr models
All the templates expect a messages list, where each message is a dictionary with role and content keys in Oumi format.
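For instance, a conversation in this format can be rendered with the tokenizer built earlier via Hugging Face's apply_chat_template. This is a minimal sketch; it assumes build_tokenizer attached the llama3-instruct template configured above:

# Messages follow the role/content structure expected by the templates.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the benefits of open source coding?"},
]

# Render the conversation into the model's expected prompt string.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant header for generation
)
print(prompt)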
Here’s an example of the Llama3 template:
src/oumi/datasets/chat_templates/llama3-instruct.jinja
{% set role_prefix = '<|start_header_id|>' %}
{% set role_suffix = '<|end_header_id|>\n\n' %}
{% set turn_suffix = '<|eot_id|>' %}
{% set image_token = '<|image|>' %}
{{ bos_token }}
{%- for message in messages -%}
{{ role_prefix + message['role'] + role_suffix }}
{%- if message['content'] is string -%}
{{ message['content'] | trim }}
{%- elif message['content'] is iterable -%}
{%- for content in message['content'] -%}
{%- if content['type'] == 'text' -%}
{{ content['content'] | trim }}
{%- elif content['type'].startswith('image') -%}
{{ image_token }}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{{ turn_suffix + '\n' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{ role_prefix + 'assistant' + role_suffix }}
{%- endif -%}
You can find all supported templates in the src/oumi/datasets/chat_templates directory. Each template is designed to match the training format of its corresponding model architecture.
Next Steps#
For more detailed information about working with models, see:
Recipes - Detailed configuration examples
Training - Model fine-tuning guide
Evaluation - Model evaluation and benchmarking
Inference - Inference guide