Chat Formats#
Oumi supports multiple popular input formats for representing conversations and messages for AI applications. For example, you can use the Oumi Conversation format, which is an extension of OpenAI's JSON format, or the Alpaca Instruction format, which is a simplified format that is easier to work with for single-turn instruction models.
The chat data representation is designed to be:
Type-Safe: Built on pydantic models for strong type checking and validation
Flexible: Supports various content types including text, images, and multimodal conversations
Extensible: Easy to add metadata and custom attributes
Standardized: Follows common conventions for AI conversation formats
In this guide, we will look at examples of each supported format and how to use each format in Oumi. This is essential for using your own chat data with the oumi train, infer, and judge commands.
Using Your Own Chat Data#
In general, to use your own data with Oumi, you need to convert it into a format that can be loaded by one of the existing oumi.datasets classes.
For chat datasets in particular, which are used for Supervised Fine-tuning, Preference Tuning, Vision-Language SFT training, and for all inference tasks, we recommend using TextSftJsonLinesDataset (registered as text_sft) for text-only conversations, or VLJsonlinesDataset (registered as vl_sft) for multimodal conversations.
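For instance, here is a minimal sketch of converting your own records into the conversation format and writing them as a .jsonl file that text_sft can load. The raw_examples records and their field names are hypothetical placeholders for your own data.

import json

# Hypothetical source records; replace with your own data and field names.
raw_examples = [
    {
        "question": "What is the capital of France?",
        "answer": "The capital of France is Paris.",
    },
]

with open("conversations.jsonl", "w") as f:
    for example in raw_examples:
        # One JSON object per line, in the conversation ("messages") format.
        record = {
            "messages": [
                {"role": "user", "content": example["question"]},
                {"role": "assistant", "content": example["answer"]},
            ]
        }
        f.write(json.dumps(record) + "\n")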
Individual Example Formats#
The conversation format is used internally by all the SFT, Preference Tuning, and Vision-Language dataset classes. Each example is a JSON object with a list of messages:
{
"messages": [
{
"role": "user",
"content": "What's the weather like in Seattle today?"
},
{
"role": "assistant",
"content": "I apologize, but I don't have access to real-time weather information for Seattle."
}
]
}
This format:
Is used internally by Oumi for most tasks; we recommend it by default for your chat data.
Supports multi-turn conversations with system messages
Can handle multimodal content (text + images)
Allows for additional metadata
The Alpaca format can be used as an alternative to the conversation format. Each example is a JSON object with instruction/input/output fields:
{
"instruction": "What's the weather like in Seattle today?",
"input": "",
"output": "I apologize, but I don't have access to real-time weather information for Seattle."
}
This format:
Is simple and straightforward for single-turn interactions
Does not support system messages or multi-turn conversations
Loading Your Data#
Once you have converted each individual example into a Conversation or Alpaca object and saved it to a file (with the appropriate extension: .jsonl for jsonlines, or .json for json), you can easily load it for training or inference.
For example, let’s look at a training config:
data:
  train:
    datasets:
      - dataset_name: text_sft
        dataset_path: path/to/conversations.jsonl
This dataset class:
Handles both conversation (similar to OpenAI’s format) and instruction (alpaca) formats
Supports system messages and multi-turn conversations
data:
  train:
    datasets:
      - dataset_name: vl_sft
        dataset_path: path/to/multimodal.jsonl
This dataset class:
Handles conversations with both text and images (similar to OpenAI’s format)
Supports multiple image formats (URL, local path, base64)
Can process image-text pairs and multi-turn visual conversations
You can also load it with the Python API:
from oumi.datasets import TextSftJsonLinesDataset, VLJsonlinesDataset
text_dataset = TextSftJsonLinesDataset(dataset_path="path/to/conversations.jsonl")
print(text_dataset.conversation(0)) # prints the first conversation
vl_dataset = VLJsonlinesDataset(dataset_path="path/to/multimodal.jsonl")
print(vl_dataset.conversation(0)) # prints the first conversation
Or build examples from scratch:
from oumi.core.types.conversation import Conversation, Message, Role
conversation = Conversation(
messages=[Message(role=Role.USER, content="Hi there!")],
metadata={"timestamp": "2025-01-01"}
)
Examples#
Let's look at some examples of how these formats look in practice. You can also directly inspect small sample datasets in these formats, available here.
Multi-turn with System Message#
Example with System Message
{
"messages": [
{
"role": "system",
"content": "You are a helpful AI assistant with knowledge about architecture."
},
{
"role": "user",
"content": "Tell me about the Golden Gate Bridge."
},
{
"role": "assistant",
"content": "The Golden Gate Bridge is an iconic suspension bridge in San Francisco."
}
]
}
Multimodal Conversation#
Example with Image and Text
{
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"content": "https://example.com/image_of_dog.jpg"
},
{
"type": "text",
"content": "What breed is this dog?"
}
]
},
{
"role": "assistant",
"content": "This appears to be a Shih Tzu puppy."
}
]
}
Conversation with Metadata#
Example with Additional Metadata
{
"messages": [
{
"role": "user",
"content": "How can I make good espresso at home?"
},
{
"role": "assistant",
"content": "Here are some key tips for making espresso at home:\n1. Use freshly roasted beans\n2. Grind just before brewing\n3. Use the right pressure\n4. Maintain proper temperature"
}
],
"metadata": {
"category": "coffee_brewing",
"timestamp": "2025-01-11T11:22:00Z"
}
}
Core Data Structures#
Oumi uses structured data formats implemented with pydantic models for robust type checking and validation:
Message Format#
The basic unit of conversation is the Message class, which represents a single message in a conversation. Key attributes include:
id: Optional unique identifier for the message
role: The role of the entity sending the message
content: Text content of the message for text messages, or a list of ContentItem objects for multimodal messages, e.g., an image and text content items.
from oumi.core.types.conversation import Message, Role
message = Message(
role=Role.USER,
content="Hello, how can I help you?"
)
Available roles:
Role.SYSTEM: System instructions
Role.USER: User messages
Role.ASSISTANT: AI assistant responses
Role.TOOL: Tool/function calls
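As a small illustration, using only the Message constructor shown above, each role is passed as the role field; the content strings here are illustrative placeholders:

from oumi.core.types.conversation import Message, Role

# One message per available role.
system_msg = Message(role=Role.SYSTEM, content="You are a concise assistant.")
user_msg = Message(role=Role.USER, content="What's the weather in Seattle?")
assistant_msg = Message(role=Role.ASSISTANT, content="I don't have live weather data.")
tool_msg = Message(role=Role.TOOL, content='{"temperature_f": 54}')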
Conversation#
The Conversation class represents a sequence of messages. Key attributes include:
conversation_id: Optional unique identifier for the conversation
messages: List of Message objects that make up the conversation
metadata: Optional dictionary for storing additional information about the conversation
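For example, a Conversation can be constructed with all three attributes; the conversation_id value below is just an illustrative placeholder:

from oumi.core.types.conversation import Conversation, Message, Role

# conversation_id and metadata are both optional.
conversation = Conversation(
    conversation_id="support-0001",
    messages=[
        Message(role=Role.USER, content="Hi there!"),
        Message(role=Role.ASSISTANT, content="Hello! How can I help?"),
    ],
    metadata={"source": "customer_support"},
)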
Content Types#
For multimodal content, use ContentItem with appropriate types:
from oumi.core.types.conversation import ContentItem, Type
# Text content
text_content = ContentItem(
type=Type.TEXT,
content="What's in this image?"
)
# Image content
image_content = ContentItem(
type=Type.IMAGE_URL,
content="https://example.com/image.jpg"
)
Available types:
Type.TEXT: Text content
Type.IMAGE_PATH: Local image path
Type.IMAGE_URL: Remote image URL
Type.IMAGE_BINARY: Raw image data
ContentItem#
The ContentItem class represents a single typed part of the content used in multimodal messages in a conversation. Key attributes include:
type: The type of the content
content: Optional text content (used for text content items, or to store the image URL or path for IMAGE_URL and IMAGE_PATH content items, respectively)
binary: Optional binary data for the content item (used for images)
Either content or binary must be provided when creating a ContentItem instance.
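Here is a brief sketch of the two options; the file path and bytes below are placeholders:

from oumi.core.types.conversation import ContentItem, Type

# Text, URL, and path items carry their data in `content`...
path_item = ContentItem(type=Type.IMAGE_PATH, content="images/dog.jpg")

# ...while raw image data goes in `binary`.
binary_item = ContentItem(type=Type.IMAGE_BINARY, binary=b"raw image bytes")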
Working with Conversations#
Creating Conversations#
from oumi.core.types.conversation import Conversation, Message, Role
conversation = Conversation(
messages=[
Message(role=Role.USER, content="Hi there!"),
Message(role=Role.ASSISTANT, content="Hello! How can I help?")
],
metadata={"source": "customer_support"}
)
>>> from oumi.core.types.conversation import ContentItem, Message, Role, Type
>>> # Create a simple text message
>>> text_message = Message(role=Role.USER, content="Hello, world!")
>>> text_message.role
<Role.USER: 'user'>
>>> text_message.content
'Hello, world!'
>>> # Create an image message
>>> image_message = Message(role=Role.USER, content=[ContentItem(type=Type.IMAGE_BINARY, binary=b"image_bytes")])
>>> image_message.content[0].type
<Type.IMAGE_BINARY: 'image_binary'>
Conversation Methods#
# Get first message of a specific role
first_user = conversation.first_message(role=Role.USER)
# Get all messages from a role
assistant_msgs = conversation.filter_messages(role=Role.ASSISTANT)
# Get the last message
last_msg = conversation.last_message()
>>> from oumi.core.types.conversation import ContentItem, Conversation, Message, Role
>>> # Create a conversation with multiple messages
>>> conversation = Conversation(
... messages=[
... Message(role=Role.USER, content="Hi there!"),
... Message(role=Role.ASSISTANT, content="Hello! How can I help?"),
... Message(role=Role.USER, content="What's the weather?")
... ],
... metadata={"source": "customer_support"}
... )
>>> # Get the first user message
>>> first_user = conversation.first_message(role=Role.USER)
>>> first_user.content
'Hi there!'
>>> # Get all assistant messages
>>> assistant_msgs = conversation.filter_messages(role=Role.ASSISTANT)
>>> len(assistant_msgs)
1
>>> assistant_msgs[0].content
'Hello! How can I help?'
>>> # Get the last message
>>> last_msg = conversation.last_message()
>>> last_msg.content
"What's the weather?"
Serialization#
# Convert to JSON
json_data = conversation.to_json()
# Load from JSON
restored = Conversation.from_json(json_data)
>>> from oumi.core.types.conversation import ContentItem, Conversation, Message, Role
>>> # Serialize to JSON
>>> conversation = Conversation(
... messages=[Message(role=Role.USER, content="Hello!")],
... metadata={"timestamp": "2025-01-01"}
... )
>>> json_data = conversation.to_json()
>>> print(json_data)
{"messages":[{"content":"Hello!","role":"user"}],"metadata":{"timestamp":"2025-01-01"}}
>>> # Deserialize from JSON
>>> restored = Conversation.from_json(json_data)
>>> restored.messages[0].content
'Hello!'
>>> restored.metadata["timestamp"]
'2025-01-01'
Data Validation#
Oumi uses pydantic models to automatically validate:
Message role values
Content type consistency
Required fields presence
Data type correctness
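For example, constructing a Message with a role value that is not one of the Role members raises a pydantic ValidationError. This is a minimal sketch; the invalid value is arbitrary:

from pydantic import ValidationError

from oumi.core.types.conversation import Message

try:
    # "moderator" is not a valid Role, so pydantic rejects it at construction time.
    Message(role="moderator", content="Hello")
except ValidationError as error:
    print(error)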