Models#
Oumi OSS provides a unified interface for working with foundation models from multiple providers, including Hugging Face, Meta, and NanoGPT, as well as your own custom models. Whether you're performing inference, fine-tuning, pre-training, or evaluation, Oumi OSS simplifies the process with seamless integrations.
Out of the box, Oumi OSS supports popular causal LLMs and large vision-language models, with optimized implementations for efficient training and inference. For a comprehensive list of supported models, configuration examples, and best practices, see the Recipes page.
This guide provides a quick overview of Oumi OSS’s unified interface, demonstrating how to instantiate models, customize their parameters, configure underlying tokenizers, and more, enabling seamless integration with your applications.
Main Model Interface#
Using the functions oumi.builders.build_model() and oumi.builders.build_tokenizer(), you can instantiate models and tokenizers, regardless of their architecture. To further configure and customize a model, you can use the oumi.core.configs.ModelParams class.
# Example using Oumi OSS's main model interface
import torch
from oumi.builders import build_model, build_tokenizer
from oumi.core.configs import ModelParams
# Specify parameters to customize your model
model_params = ModelParams(
    model_name="HuggingFaceTB/SmolLM-135M",
    tokenizer_kwargs={"pad_token": "<|endoftext|>"},
)
# Build the model
device = torch.device("cpu")  # or "cuda", "mps", etc.
model = build_model(model_params).to(device)
# Build a corresponding tokenizer
tokenizer = build_tokenizer(model_params)
input_data = tokenizer("What are the benefits of open source coding?", return_tensors="pt")
# Use the same interface regardless of model type for generation
outputs = model.generate(
    input_data["input_ids"].to(device),
    attention_mask=input_data["attention_mask"].to(device),
    max_length=64,
)
print(tokenizer.decode(outputs[0]))
Hugging Face Hub Integration#
Oumi OSS integrates directly with the Hugging Face Hub and Hugging Face transformers library, allowing you to use any model available on Hugging Face Hub:
from oumi.builders import build_model, build_tokenizer
from oumi.core.configs import ModelParams
# Configure model parameters
model_params = ModelParams(model_name="meta-llama/Llama-3.2-3B-Instruct")
# Build model and tokenizer
model = build_model(model_params)
tokenizer = build_tokenizer(model_params)
Custom Models#
You can also easily create custom models by extending our base classes:
from oumi.core.models import BaseModel

class MyCustomModel(BaseModel):
    """Create your own model architecture."""

    def __init__(self, config):
        super().__init__(config)
        # Define your architecture
For detailed implementation guidance on this subject, see the Custom Models documentation.
Advanced Topics#
Tokenizer Integration#
Oumi OSS ensures consistent tokenizer handling through the oumi.core.tokenizers module. Tokenizers can be configured independently of models while maintaining full compatibility.
from oumi.builders import build_tokenizer
from oumi.core.configs import ModelParams
# Configure tokenizer with model
model_params = ModelParams(
model_name="meta-llama/Llama-3.2-3B-Instruct",
tokenizer_name="meta-llama/Llama-3.2-3B-Instruct", # Optional: use different tokenizer
model_max_length=4096, # Set custom max length
chat_template="llama3-instruct" # Specify chat template
)
# Build tokenizer with settings
tokenizer = build_tokenizer(model_params)
For details on handling special tokens, refer to oumi.core.tokenizers.get_default_special_tokens().
Parameter Adapters and Quantization#
Oumi OSS supports loading models with PEFT adapters and quantization to reduce memory and compute requirements. You can configure these through ModelParams:
from oumi.core.configs import ModelParams, PeftParams
# Load a model with a PEFT adapter
model_params = ModelParams(
model_name="meta-llama/Llama-3.2-3B-Instruct",
adapter_model="path/to/adapter", # Load PEFT adapter
)
# Load a model with 8-bit quantization
model_params = ModelParams(
model_name="meta-llama/Llama-3.2-3B-Instruct",
peft_params=PeftParams(
q_lora=True, # Enable quantization
q_lora_bits=8,
lora_r=16,
lora_alpha=32,
lora_dropout=0.1
)
)
# Build the model with adapter/quantization
model = build_model(model_params)
The framework supports:
PEFT Adapters: Load trained LoRA or other PEFT adapters using the adapter_model parameter
Quantization: Enable 8-bit (or 4-bit) quantization through PeftParams with q_lora and q_lora_bits
Mixed Precision: Control model precision using the torch_dtype parameter
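The mixed-precision option can be sketched as a ModelParams setting. This is a minimal sketch: the torch_dtype_str field name is an assumption based on recent Oumi releases, so check it against your installed version:

```python
from oumi.core.configs import ModelParams

# Load model weights in bfloat16 half precision.
# NOTE: torch_dtype_str is an assumed field name; verify it in your Oumi version.
model_params = ModelParams(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype_str="bfloat16",
)
```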
For more details on training with adapters and quantization, see Training Configuration.
Chat Templates#
Oumi OSS uses Jinja2 templates to format conversations for different model architectures. Oumi OSS’s default templates ensure that messages are formatted correctly for each model’s expected input format.
Available templates include:
default - Basic template without special tokens
llama3-instruct - For Llama 3 instruction models
llava - For LLaVA multimodal models
phi3-instruct - For Phi-3 instruction models
qwen2-vl-instruct - For Qwen2-VL instruction models
qwen3-vl-instruct - For Qwen3-VL instruction models
zephyr - For Zephyr models
All templates expect a messages list, where each message is a dictionary with role and content keys in the Oumi format.
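For instance, a short conversation in this format is simply a list of dictionaries (the content values here are illustrative):

```python
# A minimal conversation in the expected messages format.
# Each entry carries exactly a "role" and a "content" key.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What are the benefits of open source coding?"},
]
```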
Here’s an example of the Llama3 template:
src/oumi/datasets/chat_templates/llama3-instruct.jinja
{% set role_prefix = '<|start_header_id|>' %}
{% set role_suffix = '<|end_header_id|>\n\n' %}
{% set turn_suffix = '<|eot_id|>' %}
{% set image_token = '<|image|>' %}
{{ bos_token }}
{%- for message in messages -%}
{{ role_prefix + message['role'] + role_suffix }}
{%- if message['content'] is string -%}
{{ message['content'] | trim }}
{%- elif message['content'] is iterable -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'text' -%}
{{ (item['text'] if 'text' in item else item['content']) | trim }}
{%- elif item['type'].startswith('image') -%}
{{ image_token }}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{{ turn_suffix }}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{ role_prefix + 'assistant' + role_suffix }}
{%- endif -%}
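To make the rendering concrete, here is a plain-Python sketch of what the template above produces for text-only messages. The bos_token default below is an assumption for illustration; in practice the value comes from the model's tokenizer:

```python
def format_llama3(messages, bos_token="<|begin_of_text|>", add_generation_prompt=True):
    """Mimic the llama3-instruct template above for text-only messages.

    NOTE: the bos_token default is an assumption; the tokenizer supplies the real value.
    """
    role_prefix = "<|start_header_id|>"
    role_suffix = "<|end_header_id|>\n\n"
    turn_suffix = "<|eot_id|>"
    out = bos_token
    for message in messages:
        # Emit the role header, the trimmed content, then the end-of-turn token.
        out += role_prefix + message["role"] + role_suffix
        out += message["content"].strip() + turn_suffix
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        out += role_prefix + "assistant" + role_suffix
    return out

prompt = format_llama3([{"role": "user", "content": "Hi!"}])
```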
You can find all supported templates in the src/oumi/datasets/chat_templates directory. Each template is designed to match the training format of its corresponding model architecture.
Next Steps#
For more detailed information about working with models, see:
Recipes - Detailed configuration examples
Training - Model fine-tuning guide
Evaluation - Model evaluation and benchmarking
Inference - Inference guide