oumi#

Oumi (Open Universal Machine Intelligence) library.

This library provides tools and utilities for training, evaluating, and running inference with machine learning models, with a particular focus on language tasks.

Modules:
  • models: Contains model architectures and related utilities.

  • evaluate: Functions for evaluating models.

  • evaluate_async: Asynchronous evaluation functionality.

  • infer: Functions for model inference, including interactive mode.

  • train: Training utilities for machine learning models.

  • utils: Utility functions, including logging configuration.

  • judges: Functions for judging datasets and model responses.

Functions:
  • train(): Train a machine learning model.

  • evaluate_async(): Asynchronously evaluate a model.

  • evaluate(): Evaluate a model using LM Harness.

  • infer(): Perform inference with a trained model.

  • infer_interactive(): Run interactive inference with a model.

  • judge_dataset(): Judge a dataset using a model.

Examples

Training a model:

>>> from oumi import train
>>> from oumi.core.configs import TrainingConfig
>>> config = TrainingConfig(...)
>>> train(config)

Evaluating a model:

>>> from oumi import evaluate
>>> from oumi.core.configs import EvaluationConfig
>>> config = EvaluationConfig(...)
>>> results = evaluate(config)

Performing inference:

>>> from oumi import infer
>>> from oumi.core.configs import InferenceConfig
>>> config = InferenceConfig(...)
>>> outputs = infer(config)

Judging a dataset:

>>> from oumi import judge_dataset
>>> from oumi.core.configs import JudgeConfig
>>> config = JudgeConfig(...)
>>> judge_dataset(config, dataset)

oumi.evaluate(config: EvaluationConfig) → list[dict[str, Any]][source]#

Evaluates a model using the provided configuration.

Parameters:

config – The desired configuration for evaluation.

Returns:

A list of evaluation results (one for each task). Each evaluation result is a dictionary of metric names and their corresponding values.
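
For example, since each entry in the returned list is a plain dictionary of metric names and values, the per-task results can be iterated directly (a minimal sketch; the actual metric names depend on the tasks selected in the EvaluationConfig):

>>> from oumi import evaluate
>>> from oumi.core.configs import EvaluationConfig
>>> config = EvaluationConfig(...)
>>> results = evaluate(config)
>>> for task_result in results:  # one dictionary per evaluated task
...     for metric, value in task_result.items():
...         print(metric, value)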

oumi.evaluate_async(config: AsyncEvaluationConfig) → None[source]#

Runs an async evaluation for a model using the provided configuration.

Overview:

This is a utility method for running evaluations iteratively over a series of checkpoints. This method can be run in parallel with a training job to compute metrics per checkpoint without wasting valuable time in the main training loop.

Parameters:

config – The desired configuration for evaluation.

Returns:

None.
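
A minimal usage sketch, assuming AsyncEvaluationConfig is importable from oumi.core.configs like the other config classes shown on this page:

>>> from oumi import evaluate_async
>>> from oumi.core.configs import AsyncEvaluationConfig  # assumed import path
>>> config = AsyncEvaluationConfig(...)
>>> evaluate_async(config)  # evaluates checkpoints as a parallel training job produces them; returns None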

oumi.infer(config: InferenceConfig, inputs: list[str] | None = None, inference_engine: BaseInferenceEngine | None = None, *, input_image_bytes: bytes | None = None) → list[Conversation][source]#

Runs batch inference for a model using the provided configuration.

Parameters:
  • config – The configuration to use for inference.

  • inputs – A list of inputs for inference.

  • inference_engine – The engine to use for inference. If unspecified, the engine will be inferred from config.

  • input_image_bytes – Input PNG image bytes to be used with image+text VLLMs. Only used in interactive mode.

Returns:

A list of model responses.

Return type:

list[Conversation]
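
A batch-inference sketch based on the parameters above; the prompt strings are placeholders, and one Conversation is assumed per input:

>>> from oumi import infer
>>> from oumi.core.configs import InferenceConfig
>>> config = InferenceConfig(...)
>>> prompts = ["What is machine learning?", "Summarize the plot of Hamlet."]
>>> conversations = infer(config, inputs=prompts)
>>> for conversation in conversations:  # list[Conversation], per the signature above
...     print(conversation)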

oumi.infer_interactive(config: InferenceConfig, *, input_image_bytes: bytes | None = None) → None[source]#

Interactively provides the model's response for each user-provided input.
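
A minimal sketch; the commented lines show how the optional input_image_bytes keyword could be supplied for image+text models, using a hypothetical local file photo.png:

>>> from oumi import infer_interactive
>>> from oumi.core.configs import InferenceConfig
>>> config = InferenceConfig(...)
>>> infer_interactive(config)  # reads input interactively and prints the model response
>>> # For image+text models (hypothetical file path):
>>> # with open("photo.png", "rb") as f:
>>> #     infer_interactive(config, input_image_bytes=f.read())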

oumi.judge_conversations(config: JudgeConfig, judge_inputs: list[Conversation]) → list[dict[str, Any]][source]#

Judge a list of conversations.

This function evaluates a list of conversations using the specified Judge.

The function performs the following steps:

  1. Initializes the Judge with the provided configuration.

  2. Uses the Judge to evaluate each conversation input.

  3. Collects and returns the judged outputs.

Parameters:
  • config – The configuration for the judge.

  • judge_inputs – A list of Conversation objects to be judged.

Returns:

A list of judgement results for each conversation.

>>> # Example output:
[
    {'helpful': True, 'safe': False},
    {'helpful': True, 'safe': True},
]

Return type:

list[dict[str, Any]]

Example

>>> config = JudgeConfig(...) 
>>> judge_inputs = [Conversation(...), Conversation(...)] 
>>> judged_outputs = judge_conversations(config, judge_inputs) 
>>> for output in judged_outputs: 
...     print(output)

oumi.judge_dataset(config: JudgeConfig, dataset: BaseSftDataset) → list[dict[str, Any]][source]#

Judge a dataset.

This function evaluates a given dataset using a specified Judge configuration.

The function performs the following steps:

  1. Initializes the Judge with the provided configuration.

  2. Iterates through the dataset to extract conversation inputs.

  3. Uses the Judge to evaluate each conversation input.

  4. Collects and returns the judged outputs.

Parameters:
  • config – The configuration for the judge.

  • dataset – The dataset to be judged. This dataset should be compatible with the Supervised Finetuning Dataset class.

Returns:

A list of judgement results for each conversation.

>>> # Example output:
[
    {'helpful': True, 'safe': False},
    {'helpful': True, 'safe': True},
]

Return type:

list[dict[str, Any]]

Example

>>> config = JudgeConfig(...) 
>>> dataset = SomeDataset(...) 
>>> judged_outputs = judge_dataset(config, dataset) 
>>> for output in judged_outputs: 
...     print(output)

oumi.train(config: TrainingConfig, **kwargs) → None[source]#

Trains a model using the provided configuration.

Subpackages#