oumi.analyze.testing#

Test engine for validating analysis results.

class oumi.analyze.testing.TestEngine(tests: list[TestParams])[source]#

Bases: object

Engine for running tests on typed analysis results.

Tests operate on typed Pydantic results, not DataFrames. This ensures tests are pure validation with no computation: all metrics must be pre-computed by analyzers.

Example

>>> from oumi.analyze.testing import TestEngine, TestParams, TestSeverity, TestType
>>>
>>> tests = [
...     TestParams(
...         id="max_words",
...         type=TestType.THRESHOLD,
...         metric="LengthAnalyzer.total_words",
...         operator=">",
...         value=10000,
...         max_percentage=5.0,
...         severity=TestSeverity.MEDIUM,
...     ),
... ]
>>> engine = TestEngine(tests)
>>> summary = engine.run(results)
>>> print(f"Pass rate: {summary.pass_rate}%")
Parameters:

tests – List of test configurations.

run(results: dict[str, list[BaseModel] | BaseModel]) TestSummary[source]#

Run all tests on the analysis results.

Parameters:

results – Dictionary mapping analyzer names to results.

Returns:

TestSummary containing all test results.
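Building on the class-level example, a minimal sketch of assembling the results mapping and inspecting failures; the LengthAnalyzer name and the length_results list are assumptions for illustration:

>>> results = {"LengthAnalyzer": length_results}  # pre-computed typed results (assumed)
>>> summary = engine.run(results)
>>> for failed in summary.get_failed_results():
...     print(failed.test_id, failed.affected_percentage)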

class oumi.analyze.testing.TestParams(id: str = '', type: str = '', severity: str = 'medium', title: str | None = None, description: str | None = None, scope: str = 'message', negate: bool = False, metric: str | None = None, operator: str | None = None, value: float | int | str | None = None, condition: str | None = None, max_percentage: float | None = None, min_percentage: float | None = None, std_threshold: float = 3.0, text_field: str | None = None, pattern: str | None = None, values: list[str] | None = None, case_sensitive: bool = False, check: str | None = None, threshold: float | None = None, expression: str | None = None, tests: list[dict[str, Any]] = <factory>, composite_operator: str = 'any', function: str | None = None)[source]#

Bases: BaseParams

Configuration for a single test on analysis results.

This is a flexible dataclass that supports all test types. Fields are optional based on the test type being configured. Validation is performed in __finalize_and_validate__ based on the test type.
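
A hedged sketch of configuring a threshold test; the metric name "LengthAnalyzer.total_words" mirrors the TestEngine example and is illustrative only:

>>> from oumi.analyze.testing import TestParams, TestSeverity, TestType
>>>
>>> params = TestParams(
...     id="max_words",
...     type=TestType.THRESHOLD,
...     severity=TestSeverity.MEDIUM,
...     title="No more than 5% of messages exceed 10k words",
...     metric="LengthAnalyzer.total_words",  # illustrative metric name (assumed)
...     operator=">",
...     value=10000,
...     max_percentage=5.0,
... )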

Variables:
  • id (str) – Unique identifier for this test.

  • type (str) – The type of test (threshold, percentage, regex, etc.).

  • severity (str) – How severe a failure of this test is (high, medium, low).

  • title (str | None) – Human-readable title for the test (shown in reports).

  • description (str | None) – Detailed description of what this test checks.

  • scope (str) – Whether to run on message or conversation DataFrame.

  • negate (bool) – If True, invert the test logic (pass becomes fail).

  • metric (str | None) – Column name to check (e.g., “length__token_count”).

  • operator (str | None) – Comparison operator for threshold tests (<, >, <=, >=, ==, !=).

  • value (float | int | str | None) – Value to compare against for threshold tests.

  • condition (str | None) – Condition string for percentage tests (e.g., “== True”, “> 0.5”).

  • max_percentage (float | None) – Maximum percentage of samples that can match/fail.

  • min_percentage (float | None) – Minimum percentage of samples that must match.

  • std_threshold (float) – Standard deviations for outlier detection.

  • text_field (str | None) – Column name containing the text to search (e.g., “text_content”).

  • pattern (str | None) – Regex pattern for regex tests.

  • values (list[str] | None) – List of substrings for contains-any/contains-all tests.

  • case_sensitive (bool) – Whether text matching is case-sensitive.

  • check (str | None) – Type of distribution check (max_fraction, entropy, etc.).

  • threshold (float | None) – Threshold value for distribution checks.

  • expression (str | None) – Pandas query expression string.

  • tests (list[dict[str, Any]]) – List of sub-test configurations for composite tests.

  • composite_operator (str) – How to combine sub-tests (any, all, or min count).

  • function (str | None) – Python function code as a string.

__finalize_and_validate__() None[source]#

Validate test configuration based on test type.

case_sensitive: bool = False#
check: str | None = None#
composite_operator: str = 'any'#
condition: str | None = None#
description: str | None = None#
expression: str | None = None#
function: str | None = None#
get_description() str[source]#

Get the description for this test.

get_title() str[source]#

Get the display title for this test.

id: str = ''#
max_percentage: float | None = None#
metric: str | None = None#
min_percentage: float | None = None#
negate: bool = False#
operator: str | None = None#
pattern: str | None = None#
scope: str = 'message'#
severity: str = 'medium'#
std_threshold: float = 3.0#
tests: list[dict[str, Any]]#
text_field: str | None = None#
threshold: float | None = None#
title: str | None = None#
type: str = ''#
value: float | int | str | None = None#
values: list[str] | None = None#
class oumi.analyze.testing.TestResult(*, test_id: str, passed: bool, severity: TestSeverity = TestSeverity.MEDIUM, title: str = '', description: str = '', metric: str = '', affected_count: int = 0, total_count: int = 0, affected_percentage: float = 0.0, threshold: float | None = None, actual_value: float | None = None, sample_indices: list[int] = <factory>, error: str | None = None, details: dict[str, Any] = <factory>)[source]#

Bases: BaseModel

Result of a single test execution.
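
A minimal sketch of inspecting a result produced by TestEngine.run; the summary object is assumed to come from the TestEngine example above:

>>> result = summary.results[0]
>>> if not result.passed:
...     print(result.test_id, result.affected_percentage, result.severity)
>>> as_dict = result.to_dict()  # plain-dict form, e.g. for JSON serialization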

Variables:
  • test_id (str) – Unique identifier for the test.

  • passed (bool) – Whether the test passed.

  • severity (oumi.core.configs.params.test_params.TestSeverity) – Severity level of the test.

  • title (str) – Human-readable title.

  • description (str) – Description of what the test checks.

  • metric (str) – The metric being tested (e.g., “analyzer_name.field”).

  • affected_count (int) – Number of samples that failed the test.

  • total_count (int) – Total number of samples tested.

  • affected_percentage (float) – Percentage of samples affected.

  • threshold (float | None) – The configured threshold for the test.

  • actual_value (float | None) – The actual computed value (for threshold tests).

  • sample_indices (list[int]) – Indices of affected samples (limited).

  • error (str | None) – Error message if test execution failed.

  • details (dict[str, Any]) – Additional details about the test result.

actual_value: float | None#
affected_count: int#
affected_percentage: float#
description: str#
details: dict[str, Any]#
error: str | None#
metric: str#
model_config = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

passed: bool#
sample_indices: list[int]#
severity: TestSeverity#
test_id: str#
threshold: float | None#
title: str#
to_dict() dict[str, Any][source]#

Convert to dictionary representation.

total_count: int#
class oumi.analyze.testing.TestSeverity(value)[source]#

Bases: str, Enum

Severity levels for test failures.
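
Because TestSeverity mixes in str, members compare equal to their plain string values, so either form can be used when configuring tests (a minimal illustration):

>>> from oumi.analyze.testing import TestSeverity
>>> TestSeverity("medium") is TestSeverity.MEDIUM
True
>>> TestSeverity.HIGH == "high"
True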

HIGH = 'high'#
LOW = 'low'#
MEDIUM = 'medium'#
class oumi.analyze.testing.TestSummary(*, results: list[TestResult] = <factory>, total_tests: int = 0, passed_tests: int = 0, failed_tests: int = 0, error_tests: int = 0, pass_rate: float = 0.0, high_severity_failures: int = 0, medium_severity_failures: int = 0, low_severity_failures: int = 0)[source]#

Bases: BaseModel

Summary of all test results.
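
A minimal sketch of building and inspecting a summary; test_results is assumed to be a list of TestResult objects collected from earlier test runs:

>>> summary = TestSummary.from_results(test_results)
>>> print(f"{summary.passed_tests}/{summary.total_tests} passed ({summary.pass_rate}%)")
>>> if summary.high_severity_failures:
...     for result in summary.get_failed_results():
...         print(result.test_id, result.severity)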

Variables:
  • results (list[oumi.analyze.testing.results.TestResult]) – List of individual test results.

  • total_tests (int) – Total number of tests run.

  • passed_tests (int) – Number of tests that passed.

  • failed_tests (int) – Number of tests that failed.

  • error_tests (int) – Number of tests that had errors.

  • pass_rate (float) – Percentage of tests that passed.

  • high_severity_failures (int) – Number of high severity failures.

  • medium_severity_failures (int) – Number of medium severity failures.

  • low_severity_failures (int) – Number of low severity failures.

error_tests: int#
failed_tests: int#
classmethod from_results(results: list[TestResult]) TestSummary[source]#

Create a summary from a list of test results.

Parameters:

results – List of test results.

Returns:

TestSummary with computed statistics.

get_error_results() list[TestResult][source]#

Get all test results with errors.

get_failed_results() list[TestResult][source]#

Get all failed test results.

get_passed_results() list[TestResult][source]#

Get all passed test results.

high_severity_failures: int#
low_severity_failures: int#
medium_severity_failures: int#
model_config = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

pass_rate: float#
passed_tests: int#
results: list[TestResult]#
to_dict() dict[str, Any][source]#

Convert to dictionary representation.

total_tests: int#
class oumi.analyze.testing.TestType(value)[source]#

Bases: str, Enum

Types of tests that can be run on analysis results.

Currently implemented:
  • THRESHOLD: Numeric comparisons with optional percentage tolerance

Not yet implemented (planned for future):
  • REGEX: Pattern matching on text fields

  • CONTAINS: Text containment checks (supports match_mode: any/all/exact)

  • OUTLIERS: Anomaly detection using standard deviation

  • COMPOSITE: Combine multiple tests with AND/OR logic
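
TestType likewise subclasses str, so a test type written as a plain string resolves to the corresponding member (a minimal illustration):

>>> from oumi.analyze.testing import TestType
>>> TestType("threshold") is TestType.THRESHOLD
True
>>> TestType.THRESHOLD == "threshold"
True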

COMPOSITE = 'composite'#
CONTAINS = 'contains'#
OUTLIERS = 'outliers'#
REGEX = 'regex'#
THRESHOLD = 'threshold'#