oumi.analyze#
Analyzer framework for dataset analysis.
- class oumi.analyze.AnalysisPipeline(analyzers: list[MessageAnalyzer[Any] | ConversationAnalyzer[Any] | DatasetAnalyzer[Any] | PreferenceAnalyzer[Any]], cache_dir: str | Path | None = None, tokenizer: Any | None = None, tiktoken_encoding: str = 'cl100k_base')[source]#
Bases: object
Pipeline for orchestrating multiple analyzers on a dataset.
The AnalysisPipeline manages running multiple analyzers on conversations, handling different analyzer scopes appropriately, and providing unified access to results.
The pipeline can inject shared resources (like tokenizers) into analyzers that need them, ensuring consistent configuration across the analysis.
Note
PreferenceAnalyzers are not run by run(). Use run_preference() separately to analyze preference pairs (chosen/rejected conversations).
Example
>>> from oumi.analyze import AnalysisPipeline, LengthAnalyzer
>>>
>>> pipeline = AnalysisPipeline(
...     analyzers=[LengthAnalyzer()],
...     cache_dir="./analysis_cache",
... )
>>> results = pipeline.run(conversations)
- Parameters:
analyzers – List of analyzer instances to run.
cache_dir – Optional directory for caching results.
tokenizer – Optional tokenizer to inject into analyzers that need one. If None, uses tiktoken with the specified encoding as default.
tiktoken_encoding – Tiktoken encoding to use when no tokenizer is provided. Defaults to “cl100k_base” (GPT-4 encoding).
- property conversations: list[Conversation]#
Get the analyzed conversations.
- Returns:
List of conversations that were analyzed.
- get_analyzer(name: str) MessageAnalyzer[Any] | ConversationAnalyzer[Any] | DatasetAnalyzer[Any] | PreferenceAnalyzer[Any] | None[source]#
Get an analyzer by name.
- Parameters:
name – Name of the analyzer to find.
- Returns:
Analyzer instance or None if not found.
- load_cache() bool[source]#
Load results from cache directory.
Note
Loaded results are raw dictionaries, not Pydantic model instances. Use get_cached_result() to reconstruct typed results if needed, or access raw data directly via self.results.
- Returns:
True if cache was loaded successfully, False otherwise.
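A minimal cache-reuse sketch, assuming conversations is a list of Conversation objects prepared by the caller:
>>> pipeline = AnalysisPipeline(
...     analyzers=[LengthAnalyzer()],
...     cache_dir="./analysis_cache",
... )
>>> if pipeline.load_cache():
...     raw_results = pipeline.results  # raw dictionaries, not Pydantic models
... else:
...     raw_results = pipeline.run(conversations)  # typed results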
- property message_to_conversation_idx: list[int]#
Get the mapping from message index to conversation index.
- property results: dict[str, list[BaseModel] | BaseModel]#
Get the cached analysis results.
- Returns:
Dictionary mapping analyzer names to results.
- run(conversations: list[Conversation]) dict[str, list[BaseModel] | BaseModel][source]#
Run all analyzers on the provided conversations.
Note
PreferenceAnalyzers are not run by this method. Use run_preference() separately to analyze preference pairs.
- Parameters:
conversations – List of conversations to analyze.
- Returns:
Dictionary mapping analyzer names to their results.
- For ConversationAnalyzer: list of results (one per conversation)
- For MessageAnalyzer: list of results (one per message)
- For DatasetAnalyzer: single result for the entire dataset
- run_preference(pairs: list[tuple[Conversation, Conversation]]) dict[str, list[BaseModel] | BaseModel][source]#
Run preference analyzers on conversation pairs.
- Parameters:
pairs – List of (chosen, rejected) conversation tuples.
- Returns:
Dictionary mapping analyzer names to their results.
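A minimal preference-analysis sketch, assuming chosen_conv and rejected_conv are existing Conversation objects and the pipeline contains at least one PreferenceAnalyzer:
>>> pairs = [(chosen_conv, rejected_conv)]
>>> preference_results = pipeline.run_preference(pairs)
>>> # Non-preference analyzers still run via pipeline.run(); see the note above.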
- class oumi.analyze.AnalyzerConfig(id: str, instance_id: str | None = None, params: dict[str, Any] = <factory>)[source]#
Bases: object
Configuration for a single analyzer.
- Variables:
id (str) – Analyzer type identifier (e.g., “length”, “quality”).
instance_id (str | None) – Optional unique instance ID for multiple analyzers of same type.
params (dict[str, Any]) – Analyzer-specific parameters.
- id: str#
- instance_id: str | None = None#
- params: dict[str, Any]#
- class oumi.analyze.BaseAnalyzer[source]#
Bases: ABC, Generic[TResult]
Base class for all analyzer types.
Provides common metadata methods for inspecting the result type and schema of an analyzer, enabling introspection of available metrics.
All concrete analyzer types (MessageAnalyzer, ConversationAnalyzer, etc.) inherit from this class.
- Variables:
analyzer_id (str | None) – Optional custom identifier for this analyzer instance. If not set, the class name is used as the identifier.
- analyzer_id: str | None = None#
- classmethod get_metric_descriptions() dict[str, str][source]#
Get descriptions for each metric field.
- Returns:
Dictionary mapping field names to descriptions.
- Raises:
TypeError – If the analyzer doesn’t have a valid result type.
- classmethod get_metric_names() list[str][source]#
Get the list of metric field names this analyzer produces.
- Returns:
List of metric field names.
- Raises:
TypeError – If the analyzer doesn’t have a valid result type.
- classmethod get_result_schema() dict[source]#
Get the JSON schema for this analyzer’s result model.
This allows users to discover what metrics the analyzer produces before running analysis. Useful for documentation, UI generation, and config validation.
- Returns:
JSON schema dictionary for the result model.
- Raises:
TypeError – If the analyzer doesn’t have a valid result type.
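A minimal introspection sketch using the bundled LengthAnalyzer; the returned values depend on the analyzer's result model:
>>> from oumi.analyze import LengthAnalyzer
>>>
>>> metric_names = LengthAnalyzer.get_metric_names()
>>> descriptions = LengthAnalyzer.get_metric_descriptions()
>>> schema = LengthAnalyzer.get_result_schema()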
- class oumi.analyze.ConversationAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on complete conversations.
- __call__(conversation: Conversation) TResult[source]#
Allow analyzer to be called directly.
- Parameters:
conversation – The conversation to analyze.
- Returns:
Typed result model.
- abstractmethod analyze(conversation: Conversation) TResult[source]#
Analyze a complete conversation and return typed results.
- Parameters:
conversation – The conversation to analyze.
- Returns:
Typed result model containing analysis metrics.
- analyze_batch(conversations: list[Conversation]) list[TResult][source]#
Analyze multiple conversations and return results for each.
Override this method to implement batched processing for better performance, especially for analyzers that benefit from batching (e.g., those using ML models).
- Parameters:
conversations – List of conversations to analyze.
- Returns:
List of typed results, one per conversation.
- static get_conversation_text(conversation: Conversation, tokenizer: PreTrainedTokenizerBase) str[source]#
Get the full text of a conversation using a tokenizer’s chat template.
- Parameters:
conversation – The conversation to extract text from.
tokenizer – Tokenizer with a chat template for formatting.
- Returns:
Full conversation text as a single string.
- Raises:
ValueError – If the tokenizer doesn’t have a chat template.
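A minimal subclass sketch; GreetingMetrics and GreetingAnalyzer are hypothetical names used only for illustration:
>>> from pydantic import BaseModel
>>> from oumi.analyze import ConversationAnalyzer
>>> from oumi.core.types.conversation import Conversation
>>>
>>> class GreetingMetrics(BaseModel):  # hypothetical result model
...     has_greeting: bool
>>>
>>> class GreetingAnalyzer(ConversationAnalyzer[GreetingMetrics]):  # hypothetical analyzer
...     def analyze(self, conversation: Conversation) -> GreetingMetrics:
...         # Concatenate all message contents and look for a greeting.
...         text = " ".join(str(m.content) for m in conversation.messages)
...         return GreetingMetrics(has_greeting="hello" in text.lower())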
- class oumi.analyze.DatasetAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on entire datasets.
- __call__(conversations: list[Conversation]) TResult[source]#
Allow analyzer to be called directly.
- Parameters:
conversations – All conversations to analyze.
- Returns:
Typed result model.
- abstractmethod analyze(conversations: list[Conversation]) TResult[source]#
Analyze an entire dataset and return typed results.
This method receives all conversations at once, enabling cross-sample operations that require global context.
- Parameters:
conversations – All conversations in the dataset.
- Returns:
Typed result model containing dataset-level analysis.
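A minimal dataset-level sketch; DatasetSizeMetrics and DatasetSizeAnalyzer are hypothetical names used only for illustration:
>>> from pydantic import BaseModel
>>> from oumi.analyze import DatasetAnalyzer
>>> from oumi.core.types.conversation import Conversation
>>>
>>> class DatasetSizeMetrics(BaseModel):  # hypothetical result model
...     num_conversations: int
>>>
>>> class DatasetSizeAnalyzer(DatasetAnalyzer[DatasetSizeMetrics]):  # hypothetical analyzer
...     def analyze(self, conversations: list[Conversation]) -> DatasetSizeMetrics:
...         # Receives the whole dataset at once, so cross-sample statistics are possible.
...         return DatasetSizeMetrics(num_conversations=len(conversations))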
- class oumi.analyze.LengthAnalyzer(tokenizer: Tokenizer | None = None)[source]#
Bases: ConversationAnalyzer[LengthMetrics]
Analyzer for computing token length metrics of conversations.
Computes token counts for conversations using a provided tokenizer. Provides both conversation-level totals and per-message breakdowns.
Example
>>> from oumi.analyze.analyzers.length import LengthAnalyzer, default_tokenizer
>>> from oumi.core.types.conversation import Conversation, Message, Role
>>>
>>> analyzer = LengthAnalyzer(tokenizer=default_tokenizer())
>>> conversation = Conversation(messages=[
...     Message(role=Role.USER, content="Hello, how are you?"),
...     Message(role=Role.ASSISTANT, content="I'm doing well, thanks!"),
... ])
>>> result = analyzer.analyze(conversation)
>>> print(f"Total tokens: {result.total_tokens}")
Total tokens: 12
- Parameters:
tokenizer – Tokenizer instance for token counting. Must have an encode(text) -> list method. Use default_tokenizer() for tiktoken, or pass a HuggingFace tokenizer for model-specific counts.
- analyze(conversation: Conversation) LengthMetrics[source]#
Analyze token length metrics for a conversation.
- Parameters:
conversation – The conversation to analyze.
- Returns:
LengthMetrics containing token counts.
- analyze_text(text: str) LengthMetrics[source]#
Analyze token length metrics for a single text string.
Convenience method for analyzing text without creating a Conversation.
- Parameters:
text – The text to analyze.
- Returns:
LengthMetrics for the text (treated as a single message).
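A minimal text-only sketch, reusing the analyzer from the example above:
>>> text_result = analyzer.analyze_text("Hello, how are you?")
>>> print(text_result.num_messages)  # the text is treated as a single message
1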
- class oumi.analyze.LengthMetrics(*, total_tokens: int, rendered_tokens: int | None = None, avg_tokens_per_message: float, message_token_counts: list[int], num_messages: int, user_total_tokens: int = 0, assistant_total_tokens: int = 0, system_total_tokens: int = 0, tool_total_tokens: int = 0)[source]#
Bases: BaseModel
Result model for length analysis of conversations.
Example
>>> result = LengthMetrics(
...     total_tokens=25,
...     avg_tokens_per_message=12.5,
...     message_token_counts=[10, 15],
...     num_messages=2,
... )
>>> print(result.total_tokens)
25
- assistant_total_tokens: int#
- avg_tokens_per_message: float#
- message_token_counts: list[int]#
- model_config = {}#
Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
- num_messages: int#
- rendered_tokens: int | None#
- system_total_tokens: int#
- tool_total_tokens: int#
- total_tokens: int#
- user_total_tokens: int#
- class oumi.analyze.MessageAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on individual messages.
- __call__(message: Message) TResult[source]#
Allow analyzer to be called directly.
- Parameters:
message – The message to analyze.
- Returns:
Typed result model.
- abstractmethod analyze(message: Message) TResult[source]#
Analyze a single message and return typed results.
- Parameters:
message – The message to analyze.
- Returns:
Typed result model containing analysis metrics.
- analyze_batch(messages: list[Message]) list[TResult][source]#
Analyze multiple messages and return results for each.
Override this method to implement vectorized/batched processing for better performance with large datasets.
- Parameters:
messages – List of messages to analyze.
- Returns:
List of typed results, one per message.
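A minimal subclass sketch; CharCountMetrics and CharCountAnalyzer are hypothetical names used only for illustration:
>>> from pydantic import BaseModel
>>> from oumi.analyze import MessageAnalyzer
>>> from oumi.core.types.conversation import Message
>>>
>>> class CharCountMetrics(BaseModel):  # hypothetical result model
...     num_chars: int
>>>
>>> class CharCountAnalyzer(MessageAnalyzer[CharCountMetrics]):  # hypothetical analyzer
...     def analyze(self, message: Message) -> CharCountMetrics:
...         # Count characters in the message content.
...         return CharCountMetrics(num_chars=len(str(message.content)))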
- class oumi.analyze.PreferenceAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on preference pairs.
- __call__(chosen: Conversation, rejected: Conversation) TResult[source]#
Allow analyzer to be called directly.
- Parameters:
chosen – The preferred conversation.
rejected – The rejected conversation.
- Returns:
Typed result model.
- abstractmethod analyze(chosen: Conversation, rejected: Conversation) TResult[source]#
Analyze a preference pair and return typed results.
- Parameters:
chosen – The preferred/chosen conversation.
rejected – The rejected/dispreferred conversation.
- Returns:
Typed result model containing preference analysis.
- analyze_batch(pairs: list[tuple[Conversation, Conversation]]) list[TResult][source]#
Analyze multiple preference pairs.
- Parameters:
pairs – List of (chosen, rejected) conversation tuples.
- Returns:
List of typed results, one per pair.
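A minimal preference-pair sketch; LengthGapMetrics and LengthGapAnalyzer are hypothetical names used only for illustration:
>>> from pydantic import BaseModel
>>> from oumi.analyze import PreferenceAnalyzer
>>> from oumi.core.types.conversation import Conversation
>>>
>>> class LengthGapMetrics(BaseModel):  # hypothetical result model
...     chosen_is_longer: bool
>>>
>>> class LengthGapAnalyzer(PreferenceAnalyzer[LengthGapMetrics]):  # hypothetical analyzer
...     def analyze(self, chosen: Conversation, rejected: Conversation) -> LengthGapMetrics:
...         def total_chars(conv: Conversation) -> int:
...             return sum(len(str(m.content)) for m in conv.messages)
...         return LengthGapMetrics(chosen_is_longer=total_chars(chosen) > total_chars(rejected))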
- class oumi.analyze.TestEngine(tests: list[TestParams])[source]#
Bases: object
Engine for running tests on typed analysis results.
Tests operate on typed Pydantic results, not DataFrames. This ensures tests are pure validation with no computation; all metrics must be pre-computed by analyzers.
Example
>>> from oumi.analyze.testing import TestEngine, TestParams, TestType
>>> from oumi.core.configs.params.test_params import TestSeverity
>>>
>>> tests = [
...     TestParams(
...         id="max_words",
...         type=TestType.THRESHOLD,
...         metric="LengthAnalyzer.total_words",
...         operator=">",
...         value=10000,
...         max_percentage=5.0,
...         severity=TestSeverity.MEDIUM,
...     ),
... ]
>>> engine = TestEngine(tests)
>>> summary = engine.run(results)
>>> print(f"Pass rate: {summary.pass_rate}%")
- Parameters:
tests – List of test configurations.
- run(results: dict[str, list[BaseModel] | BaseModel]) TestSummary[source]#
Run all tests on the analysis results.
- Parameters:
results – Dictionary mapping analyzer names to results.
- Returns:
TestSummary containing all test results.
- class oumi.analyze.TestResult(*, test_id: str, passed: bool, severity: TestSeverity = TestSeverity.MEDIUM, title: str = '', description: str = '', metric: str = '', affected_count: int = 0, total_count: int = 0, affected_percentage: float = 0.0, threshold: float | None = None, actual_value: float | None = None, sample_indices: list[int] = <factory>, error: str | None = None, details: dict[str, Any] = <factory>)[source]#
Bases: BaseModel
Result of a single test execution.
- Variables:
test_id (str) – Unique identifier for the test.
passed (bool) – Whether the test passed.
severity (oumi.core.configs.params.test_params.TestSeverity) – Severity level of the test.
title (str) – Human-readable title.
description (str) – Description of what the test checks.
metric (str) – The metric being tested (e.g., “analyzer_name.field”).
affected_count (int) – Number of samples that failed the test.
total_count (int) – Total number of samples tested.
affected_percentage (float) – Percentage of samples affected.
threshold (float | None) – The configured threshold for the test.
actual_value (float | None) – The actual computed value (for threshold tests).
sample_indices (list[int]) – Indices of affected samples (limited).
error (str | None) – Error message if test execution failed.
details (dict[str, Any]) – Additional details about the test result.
- actual_value: float | None#
- affected_count: int#
- affected_percentage: float#
- description: str#
- details: dict[str, Any]#
- error: str | None#
- metric: str#
- model_config = {}#
Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
- passed: bool#
- sample_indices: list[int]#
- severity: TestSeverity#
- test_id: str#
- threshold: float | None#
- title: str#
- total_count: int#
- class oumi.analyze.TestSummary(*, results: list[TestResult] = <factory>, total_tests: int = 0, passed_tests: int = 0, failed_tests: int = 0, error_tests: int = 0, pass_rate: float = 0.0, high_severity_failures: int = 0, medium_severity_failures: int = 0, low_severity_failures: int = 0)[source]#
Bases: BaseModel
Summary of all test results.
- Variables:
results (list[oumi.analyze.testing.results.TestResult]) – List of individual test results.
total_tests (int) – Total number of tests run.
passed_tests (int) – Number of tests that passed.
failed_tests (int) – Number of tests that failed.
error_tests (int) – Number of tests that had errors.
pass_rate (float) – Percentage of tests that passed.
high_severity_failures (int) – Number of high severity failures.
medium_severity_failures (int) – Number of medium severity failures.
low_severity_failures (int) – Number of low severity failures.
- error_tests: int#
- failed_tests: int#
- classmethod from_results(results: list[TestResult]) TestSummary[source]#
Create a summary from a list of test results.
- Parameters:
results – List of test results.
- Returns:
TestSummary with computed statistics.
- get_error_results() list[TestResult][source]#
Get all test results with errors.
- get_failed_results() list[TestResult][source]#
Get all failed test results.
- get_passed_results() list[TestResult][source]#
Get all passed test results.
- high_severity_failures: int#
- low_severity_failures: int#
- medium_severity_failures: int#
- model_config = {}#
Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
- pass_rate: float#
- passed_tests: int#
- results: list[TestResult]#
- total_tests: int#
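A minimal summary sketch, assuming test_results is a list of TestResult instances:
>>> summary = TestSummary.from_results(test_results)
>>> print(f"{summary.passed_tests}/{summary.total_tests} passed ({summary.pass_rate}%)")
>>> for failure in summary.get_failed_results():
...     print(failure.test_id, failure.severity, failure.affected_percentage)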
- class oumi.analyze.TurnStatsAnalyzer[source]#
Bases: ConversationAnalyzer[TurnStatsMetrics]
Analyzer for computing turn statistics of conversations.
Computes turn counts and per-role statistics to help understand conversation structure and balance.
Example
>>> from oumi.analyze.analyzers.turn_stats import TurnStatsAnalyzer
>>> from oumi.core.types.conversation import Conversation, Message, Role
>>>
>>> analyzer = TurnStatsAnalyzer()
>>> conversation = Conversation(messages=[
...     Message(role=Role.USER, content="What is Python?"),
...     Message(
...         role=Role.ASSISTANT,
...         content="Python is a programming language.",
...     ),
... ])
>>> result = analyzer.analyze(conversation)
>>> print(f"Turns: {result.num_turns}")
Turns: 2
- analyze(conversation: Conversation) TurnStatsMetrics[source]#
Analyze turn statistics for a conversation.
- Parameters:
conversation – The conversation to analyze.
- Returns:
TurnStatsMetrics containing turn counts and statistics.
- class oumi.analyze.TurnStatsMetrics(*, num_turns: int, num_user_turns: int, num_assistant_turns: int, num_tool_turns: int = 0, has_system_message: bool, first_turn_role: str | None = None, last_turn_role: str | None = None)[source]#
Bases: BaseModel
Result model for turn statistics analysis of conversations.
Example
>>> result = TurnStatsMetrics(
...     num_turns=4,
...     num_user_turns=2,
...     num_assistant_turns=2,
...     has_system_message=False,
...     first_turn_role="user",
...     last_turn_role="assistant",
... )
>>> print(result.num_turns)
4
- first_turn_role: str | None#
- has_system_message: bool#
- last_turn_role: str | None#
- model_config = {}#
Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
- num_assistant_turns: int#
- num_tool_turns: int#
- num_turns: int#
- num_user_turns: int#
- class oumi.analyze.TypedAnalyzeConfig(eval_name: str | None = None, parent_eval_id: str | None = None, dataset_name: str | None = None, dataset_path: str | None = None, split: str = 'train', subset: str | None = None, sample_count: int | None = None, output_path: str = '.', analyzers: list[AnalyzerConfig] = <factory>, custom_metrics: list[CustomMetricConfig] = <factory>, tests: list[TestParams] = <factory>, tokenizer_name: str | None = None, tokenizer_kwargs: dict[str, Any] = <factory>, generate_report: bool = False, report_title: str | None = None)[source]#
Bases: object
Configuration for the typed analyzer pipeline.
This is the main configuration class for the new typed analyzer architecture. It supports both programmatic construction and loading from YAML files.
Example YAML:
dataset_path: /path/to/data.jsonl
sample_count: 1000
output_path: ./analysis_output
analyzers:
  - id: length
    params:
      count_tokens: true
  - id: quality
custom_metrics:
  - id: turn_pattern
    scope: conversation
    function: |
      def compute(conversation):
          ...
tests:
  - id: max_words
    type: threshold
    metric: LengthAnalyzer.total_words
    operator: ">"
    value: 10000
    max_percentage: 5.0
- Variables:
dataset_name (str | None) – Name of the dataset (HuggingFace identifier).
dataset_path (str | None) – Path to local dataset file.
split (str) – Dataset split to use.
sample_count (int | None) – Number of samples to analyze.
output_path (str) – Directory for output artifacts.
analyzers (list[oumi.analyze.config.AnalyzerConfig]) – List of analyzer configurations.
custom_metrics (list[oumi.analyze.config.CustomMetricConfig]) – List of custom metric configurations.
tests (list[oumi.core.configs.params.test_params.TestParams]) – List of test configurations.
tokenizer_name (str | None) – Tokenizer for token counting.
generate_report (bool) – Whether to generate HTML report.
report_title (str | None) – Custom title for the report.
- analyzers: list[AnalyzerConfig]#
- custom_metrics: list[CustomMetricConfig]#
- dataset_name: str | None = None#
- dataset_path: str | None = None#
- eval_name: str | None = None#
- classmethod from_dict(data: dict[str, Any], allow_custom_code: bool = False) TypedAnalyzeConfig[source]#
Create configuration from a dictionary.
- Parameters:
data – Configuration dictionary.
allow_custom_code – If True, allow custom_metrics with function code. If False (default) and the config contains custom metrics with code, raises ValueError.
- Returns:
TypedAnalyzeConfig instance.
- Raises:
ValueError – If config contains custom code but allow_custom_code=False.
- classmethod from_yaml(path: str | Path, allow_custom_code: bool = False) TypedAnalyzeConfig[source]#
Load configuration from a YAML file.
Warning
Security Warning: If the YAML file contains custom_metrics with function fields, arbitrary Python code will be loaded. Only load configurations from trusted sources. Set allow_custom_code=True to explicitly acknowledge this risk.
- Parameters:
path – Path to YAML configuration file.
allow_custom_code – If True, allow loading custom_metrics with function code. If False (default) and the config contains custom metrics with code, raises ValueError.
- Returns:
TypedAnalyzeConfig instance.
- Raises:
ValueError – If config contains custom code but allow_custom_code=False.
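A minimal loading sketch; the file name analyze.yaml is illustrative:
>>> from oumi.analyze import TypedAnalyzeConfig
>>>
>>> config = TypedAnalyzeConfig.from_yaml("analyze.yaml")  # custom code disallowed by default
>>> tests = config.get_test_configs()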
- generate_report: bool = False#
- get_test_configs() list[TestParams][source]#
Get test configurations for the test engine.
- Returns:
List of TestParams instances.
- output_path: str = '.'#
- parent_eval_id: str | None = None#
- report_title: str | None = None#
- sample_count: int | None = None#
- split: str = 'train'#
- subset: str | None = None#
- tests: list[TestParams]#
- to_dict() dict[str, Any][source]#
Convert configuration to a dictionary.
- Returns:
Configuration as dictionary.
- tokenizer_kwargs: dict[str, Any]#
- tokenizer_name: str | None = None#
- oumi.analyze.create_analyzer_from_config(analyzer_id: str, params: dict) MessageAnalyzer | ConversationAnalyzer | DatasetAnalyzer | None[source]#
Create an analyzer instance from configuration.
- Parameters:
analyzer_id – Analyzer type identifier.
params – Analyzer-specific parameters.
- Returns:
Analyzer instance or None if not found.
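A minimal sketch using the "length" identifier from the config examples above; the empty params dict is illustrative:
>>> from oumi.analyze import create_analyzer_from_config
>>>
>>> analyzer = create_analyzer_from_config("length", {})
>>> if analyzer is None:
...     raise ValueError("Unknown analyzer id")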
- oumi.analyze.describe_analyzer(analyzer_class: type) str[source]#
Get a human-readable description of an analyzer’s metrics.
- oumi.analyze.get_analyzer_class(name: str) type | None[source]#
Get an analyzer class by name.
- Parameters:
name – Name of the analyzer.
- Returns:
The analyzer class or None if not found.
- oumi.analyze.get_analyzer_info(analyzer_class: type) dict[str, Any][source]#
Get detailed information about an analyzer’s output metrics.
- oumi.analyze.list_available_metrics(include_duplicates: bool = False) dict[str, dict[str, Any]][source]#
List all available metrics from registered analyzers.
- oumi.analyze.print_analyzer_metrics(analyzer_name: str | None = None) None[source]#
Pretty print available metrics for analyzers.
- Parameters:
analyzer_name – Optional specific analyzer to show. If None, shows all.
- oumi.analyze.register_analyzer(registry_name: str) Callable#
Returns function to register a sample analyzer in the Oumi global registry.
- Parameters:
registry_name – The name that the sample analyzer should be registered with.
- Returns:
Decorator function to register the target sample analyzer.
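A minimal registration sketch; the registry name "greeting" and the GreetingMetrics/GreetingAnalyzer classes are hypothetical, as in the ConversationAnalyzer sketch above:
>>> from pydantic import BaseModel
>>> from oumi.analyze import ConversationAnalyzer, register_analyzer
>>> from oumi.core.types.conversation import Conversation
>>>
>>> class GreetingMetrics(BaseModel):  # hypothetical result model
...     has_greeting: bool
>>>
>>> @register_analyzer("greeting")  # hypothetical registry name
... class GreetingAnalyzer(ConversationAnalyzer[GreetingMetrics]):  # hypothetical analyzer
...     def analyze(self, conversation: Conversation) -> GreetingMetrics:
...         text = " ".join(str(m.content) for m in conversation.messages)
...         return GreetingMetrics(has_greeting="hello" in text.lower())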
- oumi.analyze.to_analysis_dataframe(conversations: list[Conversation], results: Mapping[str, Sequence[BaseModel] | BaseModel], message_to_conversation_idx: list[int] | None = None) DataFrame[source]#
Convert typed analysis results to a pandas DataFrame.
Creates a DataFrame with one row per conversation, with columns for conversation metadata and all analyzer metrics. Analyzer field names are prefixed with the analyzer name to avoid collisions.
Example
>>> results = {"LengthAnalyzer": [LengthMetrics(...), LengthMetrics(...)]}
>>> df = to_analysis_dataframe(conversations, results)
>>> print(df.columns.tolist())
['conversation_id', 'conversation_index', 'num_messages',
 'length__total_chars', 'length__total_words', ...]
- Parameters:
conversations – List of conversations that were analyzed.
results – Dictionary mapping analyzer names to results.
- For per-conversation results: list of BaseModel (len = num conversations)
- For message-level results: list of BaseModel (len = num messages)
- For dataset-level results: single BaseModel (will be repeated)
message_to_conversation_idx – Optional mapping from message index to conversation index. Required for proper aggregation of message-level results. If provided, message-level results will be aggregated per conversation.
- Returns:
DataFrame with conversation metadata and all metrics as columns.