oumi
Oumi (Open Universal Machine Intelligence) library.
This library provides tools and utilities for training, evaluating, and running inference with machine learning models, with a particular focus on language tasks.
- Modules:
models: Contains model architectures and related utilities.
evaluate: Functions for evaluating models.
evaluate_async: Asynchronous evaluation functionality.
infer: Functions for model inference, including interactive mode.
train: Training utilities for machine learning models.
utils: Utility functions, including logging configuration.
judges: Functions for judging datasets and model responses.
- Functions:
train(): Train a machine learning model.
evaluate_async(): Asynchronously evaluate a model.
evaluate(): Evaluate a model using LM Harness.
infer(): Perform inference with a trained model.
infer_interactive(): Run interactive inference with a model.
judge_dataset(): Judge a dataset using a model.
Examples
Training a model:
>>> from oumi import train
>>> from oumi.core.configs import TrainingConfig
>>> config = TrainingConfig(...)
>>> train(config)
Evaluating a model:
>>> from oumi import evaluate
>>> from oumi.core.configs import EvaluationConfig
>>> config = EvaluationConfig(...)
>>> results = evaluate(config)
Performing inference:
>>> from oumi import infer
>>> from oumi.core.configs import InferenceConfig
>>> config = InferenceConfig(...)
>>> outputs = infer(config)
Judging a dataset:
>>> from oumi import judge_dataset
>>> from oumi.core.configs import JudgeConfig
>>> config = JudgeConfig(...)
>>> judge_dataset(config, dataset)
See also
oumi.core.configs: For configuration classes used in Oumi.
- oumi.evaluate(config: EvaluationConfig) → list[dict[str, Any]]
Evaluates a model using the provided configuration.
- Parameters:
config – The desired configuration for evaluation.
- Returns:
A list of evaluation results (one for each task). Each evaluation result is a dictionary of metric names and their corresponding values.
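The returned structure can be inspected directly. A minimal sketch, continuing from the evaluation example above, that prints every metric for every task:
>>> results = evaluate(config)
>>> for task_result in results:
...     # Each entry is a dictionary of metric names and values for one task.
...     for metric_name, value in task_result.items():
...         print(metric_name, value)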
- oumi.evaluate_async(config: AsyncEvaluationConfig) → None
Runs an async evaluation for a model using the provided configuration.
- Overview:
This is a utility method for running evaluations iteratively over a series of checkpoints. This method can be run in parallel with a training job to compute metrics per checkpoint without wasting valuable time in the main training loop.
- Parameters:
config – The desired configuration for evaluation.
- Returns:
None.
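The Examples section above does not cover asynchronous evaluation, so here is a minimal sketch, assuming AsyncEvaluationConfig is exposed from oumi.core.configs like the other configuration classes:
>>> from oumi import evaluate_async
>>> from oumi.core.configs import AsyncEvaluationConfig
>>> config = AsyncEvaluationConfig(...)
>>> # Iteratively evaluates checkpoints as training produces them; returns None.
>>> evaluate_async(config)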
- oumi.infer(config: InferenceConfig, inputs: list[str] | None = None, inference_engine: BaseInferenceEngine | None = None, *, input_image_bytes: bytes | None = None) → list[Conversation]
Runs batch inference for a model using the provided configuration.
- Parameters:
config – The configuration to use for inference.
inputs – A list of inputs for inference.
inference_engine – The engine to use for inference. If unspecified, the engine will be inferred from config.
input_image_bytes – PNG image bytes to be used with image+text VLLMs. Only used in interactive mode.
- Returns:
A list of model responses.
- Return type:
list[Conversation]
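A minimal sketch of batch inference over explicit prompts via the inputs parameter; the prompt strings are illustrative placeholders:
>>> from oumi import infer
>>> from oumi.core.configs import InferenceConfig
>>> config = InferenceConfig(...)
>>> prompts = ["Hello, how are you?", "Summarize the benefits of unit tests."]
>>> # The returned list contains the model responses as Conversation objects.
>>> conversations = infer(config, inputs=prompts)
>>> for conversation in conversations:
...     print(conversation)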
- oumi.infer_interactive(config: InferenceConfig, *, input_image_bytes: bytes | None = None) → None
Runs interactive inference, generating model responses for user-provided inputs.
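A minimal sketch of interactive inference with an optional PNG image attached for image+text models; the image path is a hypothetical placeholder:
>>> from oumi import infer_interactive
>>> from oumi.core.configs import InferenceConfig
>>> config = InferenceConfig(...)
>>> with open("example.png", "rb") as f:  # hypothetical local file
...     image_bytes = f.read()
>>> infer_interactive(config, input_image_bytes=image_bytes)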
- oumi.judge_conversations(config: JudgeConfig, judge_inputs: list[Conversation]) → list[dict[str, Any]]
Judge a list of conversations.
This function evaluates a list of conversations using the specified Judge.
The function performs the following steps:
1. Initializes the Judge with the provided configuration.
2. Uses the Judge to evaluate each conversation input.
3. Collects and returns the judged outputs.
- Parameters:
config – The configuration for the judge.
judge_inputs – A list of Conversation objects to be judged.
- Returns:
A list of judgement results for each conversation.
Example output: [{'helpful': True, 'safe': False}, {'helpful': True, 'safe': True}]
- Return type:
List[Dict[str, Any]]
Example
>>> config = JudgeConfig(...)
>>> judge_inputs = [Conversation(...), Conversation(...)]
>>> judged_outputs = judge_conversations(config, judge_inputs)
>>> for output in judged_outputs:
...     print(output)
- oumi.judge_dataset(config: JudgeConfig, dataset: BaseSftDataset) → list[dict[str, Any]]
Judge a dataset.
This function evaluates a given dataset using a specified Judge configuration.
The function performs the following steps:
1. Initializes the Judge with the provided configuration.
2. Iterates through the dataset to extract conversation inputs.
3. Uses the Judge to evaluate each conversation input.
4. Collects and returns the judged outputs.
- Parameters:
config – The configuration for the judge.
dataset – The dataset to be judged. It should be compatible with the supervised fine-tuning dataset class (BaseSftDataset).
- Returns:
A list of judgement results for each conversation.
Example output: [{'helpful': True, 'safe': False}, {'helpful': True, 'safe': True}]
- Return type:
List[Dict[str, Any]]
Example
>>> config = JudgeConfig(...)
>>> dataset = SomeDataset(...)
>>> judged_outputs = judge_dataset(config, dataset)
>>> for output in judged_outputs:
...     print(output)
- oumi.train(config: TrainingConfig, **kwargs) → None
Trains a model using the provided configuration.
Subpackages
- oumi.builders
build_chat_template()
build_collator_from_config()
build_data_collator()
build_dataset()
build_dataset_from_params()
build_dataset_mixture()
build_metrics_function()
build_model()
build_optimizer()
build_peft_model()
build_processor()
build_tokenizer()
build_trainer()
build_training_callbacks()
is_image_text_llm()
- oumi.cli
- oumi.core
- oumi.datasets
AlpacaDataset
AlpacaEvalDataset
ArgillaDollyDataset
ArgillaMagpieUltraDataset
AyaDataset
C4Dataset
COCOCaptionsDataset
ChatRAGBenchDataset
ChatqaDataset
ChatqaTatqaDataset
DebugClassificationDataset
DebugPretrainingDataset
DebugSftDataset
DolmaDataset
FalconRefinedWebDataset
FineWebEduDataset
Flickr30kDataset
LlavaInstructMixVsftDataset
MagpieProDataset
OpenO1SFTDataset
OrpoDpoMix40kDataset
PileV1Dataset
PromptResponseDataset
RedPajamaDataV1Dataset
RedPajamaDataV2Dataset
SlimPajamaDataset
StarCoderDataset
TextSftJsonLinesDataset
TheStackDataset
TinyStoriesDataset
TinyTextbooksDataset
UltrachatH4Dataset
VLJsonlinesDataset
WikiTextDataset
WikipediaDataset
YouTubeCommonsDataset
- Subpackages
- oumi.evaluation
- oumi.inference
AnthropicInferenceEngine
DeepSeekInferenceEngine
GoogleGeminiInferenceEngine
GoogleVertexInferenceEngine
LlamaCppInferenceEngine
NativeTextInferenceEngine
OpenAIInferenceEngine
ParasailInferenceEngine
RemoteInferenceEngine
RemoteVLLMInferenceEngine
SGLangInferenceEngine
TogetherInferenceEngine
VLLMInferenceEngine
- oumi.judges
- oumi.launcher
- oumi.models
- oumi.performance
- oumi.utils
- Submodules
- oumi.utils.batching module
- oumi.utils.conversation_utils module
- oumi.utils.device_utils module
- oumi.utils.distributed_utils module
- oumi.utils.git_utils module
- oumi.utils.hf_datasets_utils module
- oumi.utils.image_utils module
- oumi.utils.io_utils module
- oumi.utils.logging module
- oumi.utils.model_caching module
- oumi.utils.packaging module
- oumi.utils.peft_utils module
- oumi.utils.saver module
- oumi.utils.serialization_utils module
- oumi.utils.str_utils module
- oumi.utils.torch_naming_heuristics module
- oumi.utils.torch_utils module
- oumi.utils.version_utils module