Models#

Oumi OSS provides a unified interface for working with foundation models from multiple providers, including HuggingFace, Meta, and NanoGPT, as well as your own custom models. Whether you’re performing inference, fine-tuning, pre-training, or evaluation, Oumi OSS simplifies the process with seamless integrations.

Out-of-the-box, Oumi OSS supports popular causal LLMs and large vision-language models, with optimized implementations available for efficient use. For a comprehensive list of supported models, configuration examples, and best practices, see the Recipes page.

This guide provides a quick overview of Oumi OSS’s unified interface: how to instantiate models, customize their parameters, configure the underlying tokenizers, and more, so you can integrate them smoothly into your applications.

Main Model Interface#

Using the functions oumi.builders.build_model() and oumi.builders.build_tokenizer(), you can instantiate models and tokenizers, regardless of their architecture. To further configure and customize a model, you can use the oumi.core.configs.ModelParams class.

# Example using Oumi OSS's main model interface
import torch
from oumi.builders import build_model, build_tokenizer
from oumi.core.configs import ModelParams

# Specify parameters to customize your model
model_params = ModelParams(
    model_name="HuggingFaceTB/SmolLM-135M",
    tokenizer_kwargs={'pad_token': '<|endoftext|>'},
)

# Build the model
device = torch.device("cpu")  # or "cuda", "mps", etc.
model = build_model(model_params).to(device)

# Build a corresponding tokenizer
tokenizer = build_tokenizer(model_params)
input_data = tokenizer("What are the benefits of open source coding?", return_tensors="pt")

# Use the same interface regardless of model type for generation
outputs = model.generate(
    input_data['input_ids'].to(device),
    attention_mask=input_data['attention_mask'].to(device),
    max_length=64,
)
print(tokenizer.decode(outputs[0]))

Hugging Face Hub Integration#

Oumi OSS integrates directly with the Hugging Face Hub and Hugging Face transformers library, allowing you to use any model available on Hugging Face Hub:

from oumi.builders import build_model, build_tokenizer
from oumi.core.configs import ModelParams

# Configure model parameters
model_params = ModelParams(model_name="meta-llama/Llama-3.2-3B-Instruct")

# Build model and tokenizer
model = build_model(model_params)
tokenizer = build_tokenizer(model_params)

Custom Models#

You can also easily create custom models by extending our base classes:

from oumi.core.models import BaseModel

class MyCustomModel(BaseModel):
    """Create your own model architecture."""

    def __init__(self, config):
        super().__init__(config)
        # Define your layers here

    def forward(self, **kwargs):
        # Implement your model's forward pass here
        ...

For detailed implementation guidance on this subject, see the Custom Models documentation.

Advanced Topics#

Tokenizer Integration#

Oumi OSS ensures consistent tokenizer handling through the oumi.core.tokenizers module. Tokenizers can be configured independently of models while maintaining full compatibility.

from oumi.builders import build_tokenizer
from oumi.core.configs import ModelParams

# Configure tokenizer with model
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    tokenizer_name="meta-llama/Llama-3.2-3B-Instruct",   # Optional: use different tokenizer
    model_max_length=4096,                               # Set custom max length
    chat_template="llama3-instruct"                      # Specify chat template
)

# Build tokenizer with settings
tokenizer = build_tokenizer(model_params)

For details on handling special tokens, refer to oumi.core.tokenizers.get_default_special_tokens().

Parameter Adapters and Quantization#

Oumi OSS supports loading models with PEFT adapters and quantization to reduce memory and compute requirements. You can configure these through ModelParams:

from oumi.core.configs import ModelParams, PeftParams

# Load a model with a PEFT adapter
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    adapter_model="path/to/adapter",  # Load PEFT adapter
)

# Load a model with 8-bit quantization
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    peft_params=PeftParams(
        q_lora=True,  # Enable quantization
        q_lora_bits=8,
        lora_r=16,
        lora_alpha=32,
        lora_dropout=0.1
    )
)

# Build the model with adapter/quantization
model = build_model(model_params)

The framework supports:

  • PEFT Adapters: Load trained LoRA or other PEFT adapters using the adapter_model parameter

  • Quantization: Enable 8-bit (or 4-bit) quantization through PeftParams with q_lora and q_lora_bits

  • Mixed Precision: Control model precision using the torch_dtype parameter
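As a concrete illustration of the mixed-precision option, the sketch below loads a model’s weights in bfloat16 to roughly halve memory use. It assumes ModelParams exposes the dtype as a torch_dtype_str string field; check your Oumi version’s ModelParams reference for the exact field name.

from oumi.builders import build_model
from oumi.core.configs import ModelParams

# Minimal sketch: load weights in bfloat16 to reduce memory use.
# Assumes ModelParams accepts the dtype as a `torch_dtype_str` string.
model_params = ModelParams(
    model_name="HuggingFaceTB/SmolLM-135M",
    torch_dtype_str="bfloat16",
)
model = build_model(model_params)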

For more details on training with adapters and quantization, see Training Configuration.

Chat Templates#

Oumi OSS uses Jinja2 templates to format conversations for different model architectures. Oumi OSS’s default templates ensure that messages are formatted correctly for each model’s expected input format.

Available templates include:

  • default - Basic template without special tokens

  • llama3-instruct - For Llama 3 instruction models

  • llava - For LLaVA multimodal models

  • phi3-instruct - For Phi-3 instruction models

  • qwen2-vl-instruct - For Qwen2-VL instruction models

  • qwen3-vl-instruct - For Qwen3-VL instruction models

  • zephyr - For Zephyr models

All templates expect a messages list, where each message is a dictionary with role and content keys, following the Oumi message format.
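To make the message format and the rendered output concrete, the snippet below mimics in plain Python what the llama3-instruct template (shown next) produces for string-content messages. This is an illustration only; in practice the actual Jinja2 template is applied through the tokenizer, and the image-content branch is omitted here.

# Simplified sketch of the llama3-instruct rendering for string contents.
ROLE_PREFIX = '<|start_header_id|>'
ROLE_SUFFIX = '<|end_header_id|>\n\n'
TURN_SUFFIX = '<|eot_id|>'
BOS_TOKEN = '<|begin_of_text|>'  # Llama 3's BOS token

def render_llama3(messages, add_generation_prompt=True):
    out = BOS_TOKEN
    for message in messages:
        out += ROLE_PREFIX + message['role'] + ROLE_SUFFIX
        out += message['content'].strip() + TURN_SUFFIX
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        out += ROLE_PREFIX + 'assistant' + ROLE_SUFFIX
    return out

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'What is open source?'},
]
print(render_llama3(messages))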

Here’s an example of the llama3-instruct template:

src/oumi/datasets/chat_templates/llama3-instruct.jinja
{% set role_prefix = '<|start_header_id|>' %}
{% set role_suffix = '<|end_header_id|>\n\n' %}
{% set turn_suffix = '<|eot_id|>' %}
{% set image_token = '<|image|>' %}

{{ bos_token }}
{%- for message in messages -%}
    {{ role_prefix + message['role'] + role_suffix }}

    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'text' -%}
                {{  (item['text'] if 'text' in item else item['content']) | trim }}
            {%- elif item['type'].startswith('image') -%}
                {{  image_token }}
            {%- endif -%}
        {%- endfor -%}
    {%- endif -%}
    {{ turn_suffix }}
{%- endfor -%}

{%- if add_generation_prompt -%}
    {{ role_prefix + 'assistant' + role_suffix }}
{%- endif -%}

You can find all supported templates in the src/oumi/datasets/chat_templates directory. Each template is designed to match the training format of its corresponding model architecture.

Next Steps#

For more detailed information about working with models, see: