oumi.analyze.testing#

Test engine for validating analysis results.

class oumi.analyze.testing.TestEngine(tests: list[TestParams])[source]#

Bases: object

Engine for running tests on typed analysis results.

Tests operate on typed Pydantic results, not DataFrames. This ensures tests are pure validation with no computation: all metrics must be pre-computed by analyzers.

Example

>>> from oumi.analyze.testing import TestEngine, TestParams, TestSeverity, TestType
>>>
>>> tests = [
...     TestParams(
...         id="max_words",
...         type=TestType.THRESHOLD,
...         metric="LengthAnalyzer.total_words",
...         operator=">",
...         value=10000,
...         max_percentage=5.0,
...         severity=TestSeverity.MEDIUM,
...     ),
... ]
>>> engine = TestEngine(tests)
>>> summary = engine.run(results)
>>> print(f"Pass rate: {summary.pass_rate}%")
Parameters:

tests – List of test configurations.

run(results: dict[str, list[BaseModel] | BaseModel]) TestSummary[source]#

Run all tests on the analysis results.

Parameters:

results – Dictionary mapping analyzer names to results.

Returns:

TestSummary containing all test results.
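Building on the class-level example, a minimal sketch of assembling the results mapping and inspecting failures; the LengthAnalyzer name and the length_results list are assumptions for illustration:

>>> results = {"LengthAnalyzer": length_results}  # pre-computed typed results (assumed)
>>> summary = engine.run(results)
>>> for failed in summary.get_failed_results():
...     print(failed.test_id, failed.affected_percentage)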

class oumi.analyze.testing.TestParams(id: str = '', type: str = '', severity: str = 'medium', title: str | None = None, description: str | None = None, scope: str = 'message', negate: bool = False, metric: str | None = None, operator: str | None = None, value: float | int | str | None = None, condition: str | None = None, max_percentage: float | None = None, min_percentage: float | None = None, std_threshold: float = 3.0, text_field: str | None = None, pattern: str | None = None, values: list[str] | None = None, case_sensitive: bool = False, check: str | None = None, threshold: float | None = None, expression: str | None = None, tests: list[dict[str, Any]] = <factory>, composite_operator: str = 'any', function: str | None = None)[source]#

Bases: BaseParams

Configuration for a single test on analysis results.

This is a flexible dataclass that supports all test types. Fields are optional based on the test type being configured. Validation is performed in __finalize_and_validate__ based on the test type.
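
A hedged sketch of configuring a threshold test; the metric name "LengthAnalyzer.total_words" mirrors the TestEngine example and is illustrative only:

>>> from oumi.analyze.testing import TestParams, TestSeverity, TestType
>>>
>>> params = TestParams(
...     id="max_words",
...     type=TestType.THRESHOLD,
...     severity=TestSeverity.MEDIUM,
...     title="No more than 5% of messages exceed 10k words",
...     metric="LengthAnalyzer.total_words",  # illustrative metric name (assumed)
...     operator=">",
...     value=10000,
...     max_percentage=5.0,
... )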

Variables:
  • id (str) – Unique identifier for this test.

  • type (str) – The type of test (threshold, percentage, regex, etc.).

  • severity (str) – How severe a failure of this test is (high, medium, low).

  • title (str | None) – Human-readable title for the test (shown in reports).

  • description (str | None) – Detailed description of what this test checks.

  • scope (str) – Whether to run on message or conversation DataFrame.

  • negate (bool) – If True, invert the test logic (pass becomes fail).

  • metric (str | None) – Column name to check (e.g., “length__token_count”).

  • operator (str | None) – Comparison operator for threshold tests (<, >, <=, >=, ==, !=).

  • value (float | int | str | None) – Value to compare against for threshold tests.

  • condition (str | None) – Condition string for percentage tests (e.g., “== True”, “> 0.5”).

  • max_percentage (float | None) – Maximum percentage of samples that can match/fail.

  • min_percentage (float | None) – Minimum percentage of samples that must match.

  • std_threshold (float) – Standard deviations for outlier detection.

  • text_field (str | None) – Column name containing the text to search (e.g., “text_content”).

  • pattern (str | None) – Regex pattern for regex tests.

  • values (list[str] | None) – List of substrings for contains-any/contains-all tests.

  • case_sensitive (bool) – Whether text matching is case-sensitive.

  • check (str | None) – Type of distribution check (max_fraction, entropy, etc.).

  • threshold (float | None) – Threshold value for distribution checks.

  • expression (str | None) – Pandas query expression string.

  • tests (list[dict[str, Any]]) – List of sub-test configurations for composite tests.

  • composite_operator (str) – How to combine sub-tests (any, all, or min count).

  • function (str | None) – Python function code as a string.

__finalize_and_validate__() None[source]#

Validate test configuration based on test type.

case_sensitive: bool = False#
check: str | None = None#
composite_operator: str = 'any'#
condition: str | None = None#
description: str | None = None#
expression: str | None = None#
function: str | None = None#
get_description() str[source]#

Get the description for this test.

get_title() str[source]#

Get the display title for this test.

id: str = ''#
max_percentage: float | None = None#
metric: str | None = None#
min_percentage: float | None = None#
negate: bool = False#
operator: str | None = None#
pattern: str | None = None#
scope: str = 'message'#
severity: str = 'medium'#
std_threshold: float = 3.0#
tests: list[dict[str, Any]]#
text_field: str | None = None#
threshold: float | None = None#
title: str | None = None#
type: str = ''#
value: float | int | str | None = None#
values: list[str] | None = None#
class oumi.analyze.testing.TestResult(*, test_id: str, passed: bool, severity: TestSeverity = TestSeverity.MEDIUM, title: str = '', description: str = '', metric: str = '', affected_count: int = 0, total_count: int = 0, affected_percentage: float = 0.0, threshold: float | None = None, actual_value: float | None = None, sample_indices: list[int] = <factory>, error: str | None = None, details: dict[str, Any] = <factory>)[source]#

Bases: BaseModel

Result of a single test execution.
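
A minimal sketch of inspecting a result produced by TestEngine.run; the summary object is assumed to come from the TestEngine example above:

>>> result = summary.results[0]
>>> if not result.passed:
...     print(result.test_id, result.affected_percentage, result.severity)
>>> as_dict = result.to_dict()  # plain-dict form, e.g. for JSON serialization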

Variables:
  • test_id (str) – Unique identifier for the test.

  • passed (bool) – Whether the test passed.

  • severity (oumi.core.configs.params.test_params.TestSeverity) – Severity level of the test.

  • title (str) – Human-readable title.

  • description (str) – Description of what the test checks.

  • metric (str) – The metric being tested (e.g., “analyzer_name.field”).

  • affected_count (int) – Number of samples that failed the test.

  • total_count (int) – Total number of samples tested.

  • affected_percentage (float) – Percentage of samples affected.

  • threshold (float | None) – The configured threshold for the test.

  • actual_value (float | None) – The actual computed value (for threshold tests).

  • sample_indices (list[int]) – Indices of affected samples (limited).

  • error (str | None) – Error message if test execution failed.

  • details (dict[str, Any]) – Additional details about the test result.

actual_value: float | None#
affected_count: int#
affected_percentage: float#
description: str#
details: dict[str, Any]#
error: str | None#
metric: str#
model_config = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

passed: bool#
sample_indices: list[int]#
severity: TestSeverity#
test_id: str#
threshold: float | None#
title: str#
to_dict() dict[str, Any][source]#

Convert to dictionary representation.

total_count: int#
class oumi.analyze.testing.TestSeverity(value)[source]#

Bases: str, Enum

Severity levels for test failures.
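
Because TestSeverity mixes in str, members compare equal to their plain string values, so either form can be used when configuring tests (a minimal illustration):

>>> from oumi.analyze.testing import TestSeverity
>>> TestSeverity("medium") is TestSeverity.MEDIUM
True
>>> TestSeverity.HIGH == "high"
True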

HIGH = 'high'#
LOW = 'low'#
MEDIUM = 'medium'#
class oumi.analyze.testing.TestSummary(*, results: list[TestResult] = <factory>, total_tests: int = 0, passed_tests: int = 0, failed_tests: int = 0, error_tests: int = 0, pass_rate: float = 0.0, high_severity_failures: int = 0, medium_severity_failures: int = 0, low_severity_failures: int = 0)[source]#

Bases: BaseModel

Summary of all test results.
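
A minimal sketch of building and inspecting a summary; test_results is assumed to be a list of TestResult objects collected from earlier test runs:

>>> summary = TestSummary.from_results(test_results)
>>> print(f"{summary.passed_tests}/{summary.total_tests} passed ({summary.pass_rate}%)")
>>> if summary.high_severity_failures:
...     for result in summary.get_failed_results():
...         print(result.test_id, result.severity)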

Variables:
  • results (list[oumi.analyze.testing.results.TestResult]) – List of individual test results.

  • total_tests (int) – Total number of tests run.

  • passed_tests (int) – Number of tests that passed.

  • failed_tests (int) – Number of tests that failed.

  • error_tests (int) – Number of tests that had errors.

  • pass_rate (float) – Percentage of tests that passed.

  • high_severity_failures (int) – Number of high severity failures.

  • medium_severity_failures (int) – Number of medium severity failures.

  • low_severity_failures (int) – Number of low severity failures.

error_tests: int#
failed_tests: int#
classmethod from_results(results: list[TestResult]) TestSummary[source]#

Create a summary from a list of test results.

Parameters:

results – List of test results.

Returns:

TestSummary with computed statistics.

get_error_results() list[TestResult][source]#

Get all test results with errors.

get_failed_results() list[TestResult][source]#

Get all failed test results.

get_passed_results() list[TestResult][source]#

Get all passed test results.

high_severity_failures: int#
low_severity_failures: int#
medium_severity_failures: int#
model_config = {}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

pass_rate: float#
passed_tests: int#
results: list[TestResult]#
to_dict() dict[str, Any][source]#

Convert to dictionary representation.

total_tests: int#
class oumi.analyze.testing.TestType(value)[source]#

Bases: str, Enum

Types of tests that can be run on analysis results.

Currently implemented:
  • THRESHOLD: Numeric comparisons with optional percentage tolerance

Not yet implemented (planned for future):
  • REGEX: Pattern matching on text fields

  • CONTAINS: Text containment checks (supports match_mode: any/all/exact)

  • OUTLIERS: Anomaly detection using standard deviation

  • COMPOSITE: Combine multiple tests with AND/OR logic
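
TestType likewise subclasses str, so a test type written as a plain string resolves to the corresponding member (a minimal illustration):

>>> from oumi.analyze.testing import TestType
>>> TestType("threshold") is TestType.THRESHOLD
True
>>> TestType.THRESHOLD == "threshold"
True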

COMPOSITE = 'composite'#
CONTAINS = 'contains'#
OUTLIERS = 'outliers'#
REGEX = 'regex'#
THRESHOLD = 'threshold'#