oumi.analyze.testing#
Test engine for validating analysis results.
- class oumi.analyze.testing.TestEngine(tests: list[TestParams])[source]#
Bases: object
Engine for running tests on typed analysis results.
Tests operate on typed Pydantic results, not DataFrames. This ensures tests are pure validation with no computation: all metrics must be pre-computed by analyzers.
Example
>>> from oumi.analyze.testing import TestEngine, TestParams, TestType >>> >>> tests = [ ... TestParams( ... id="max_words", ... type=TestType.THRESHOLD, ... metric="LengthAnalyzer.total_words", ... operator=">", ... value=10000, ... max_percentage=5.0, ... severity=TestSeverity.MEDIUM, ... ), ... ] >>> engine = TestEngine(tests) >>> summary = engine.run(results) >>> print(f"Pass rate: {summary.pass_rate}%")
- Parameters:
tests – List of test configurations.
- run(results: dict[str, list[BaseModel] | BaseModel]) → TestSummary[source]#
Run all tests on the analysis results.
- Parameters:
results – Dictionary mapping analyzer names to results.
- Returns:
TestSummary containing all test results.
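A minimal end-to-end sketch of building the results dictionary and running the engine. LengthResult is a hypothetical stand-in for an analyzer's output model; any Pydantic model exposing the tested field works the same way.

from pydantic import BaseModel

from oumi.analyze.testing import TestEngine, TestParams, TestSeverity, TestType

class LengthResult(BaseModel):
    # Hypothetical analyzer output; real result types are produced by analyzers.
    total_words: int

tests = [
    TestParams(
        id="max_words",
        type=TestType.THRESHOLD,
        metric="LengthAnalyzer.total_words",
        operator=">",
        value=10000,
        max_percentage=5.0,
        severity=TestSeverity.MEDIUM,
    ),
]
engine = TestEngine(tests)

# Keys are analyzer names; values are per-sample results (or a single result).
results = {"LengthAnalyzer": [LengthResult(total_words=n) for n in (120, 85, 15000)]}
summary = engine.run(results)
print(f"{summary.passed_tests}/{summary.total_tests} tests passed")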
- class oumi.analyze.testing.TestParams(id: str = '', type: str = '', severity: str = 'medium', title: str | None = None, description: str | None = None, scope: str = 'message', negate: bool = False, metric: str | None = None, operator: str | None = None, value: float | int | str | None = None, condition: str | None = None, max_percentage: float | None = None, min_percentage: float | None = None, std_threshold: float = 3.0, text_field: str | None = None, pattern: str | None = None, values: list[str] | None = None, case_sensitive: bool = False, check: str | None = None, threshold: float | None = None, expression: str | None = None, tests: list[dict[str, Any]] = <factory>, composite_operator: str = 'any', function: str | None = None)[source]#
Bases: BaseParams
Configuration for a single test on analysis results.
This is a flexible dataclass that supports all test types; which fields are required depends on the test type being configured. Validation is performed in __finalize_and_validate__ based on the test type.
- Variables:
id (str) – Unique identifier for this test.
type (str) – The type of test (threshold, percentage, regex, etc.).
severity (str) – How severe a failure of this test is (high, medium, low).
title (str | None) – Human-readable title for the test (shown in reports).
description (str | None) – Detailed description of what this test checks.
scope (str) – Whether to run on message or conversation DataFrame.
negate (bool) – If True, invert the test logic (pass becomes fail).
Threshold / percentage / outlier test fields:
metric (str | None) – Column name to check (e.g., “length__token_count”).
operator (str | None) – Comparison operator for threshold tests (<, >, <=, >=, ==, !=).
value (float | int | str | None) – Value to compare against for threshold tests.
condition (str | None) – Condition string for percentage tests (e.g., “== True”, “> 0.5”).
max_percentage (float | None) – Maximum percentage of samples that can match/fail.
min_percentage (float | None) – Minimum percentage of samples that must match.
std_threshold (float) – Standard deviations for outlier detection.
Text test fields:
text_field (str | None) – Column name containing text to search (e.g., “text_content”).
pattern (str | None) – Regex pattern for regex tests.
values (list[str] | None) – List of substrings for contains-any/contains-all tests.
case_sensitive (bool) – Whether text matching is case-sensitive.
Distribution test fields:
check (str | None) – Type of distribution check (max_fraction, entropy, etc.).
threshold (float | None) – Threshold value for distribution checks.
Expression test fields:
expression (str | None) – Pandas query expression string.
Composite test fields:
tests (list[dict[str, Any]]) – List of sub-test configurations for composite tests.
composite_operator (str) – How to combine sub-tests (any, all, or min count).
Python test fields:
function (str | None) – Python function code as a string.
- case_sensitive: bool = False#
- check: str | None = None#
- composite_operator: str = 'any'#
- condition: str | None = None#
- description: str | None = None#
- expression: str | None = None#
- function: str | None = None#
- id: str = ''#
- max_percentage: float | None = None#
- metric: str | None = None#
- min_percentage: float | None = None#
- negate: bool = False#
- operator: str | None = None#
- pattern: str | None = None#
- scope: str = 'message'#
- severity: str = 'medium'#
- std_threshold: float = 3.0#
- tests: list[dict[str, Any]]#
- text_field: str | None = None#
- threshold: float | None = None#
- title: str | None = None#
- type: str = ''#
- value: float | int | str | None = None#
- values: list[str] | None = None#
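A sketch of how the per-type fields combine. Note that only THRESHOLD is currently implemented (see TestType below); the regex example is hypothetical, with field usage inferred from the variable documentation above.

from oumi.analyze.testing import TestParams, TestType

# Threshold test: a sample fails when the metric exceeds the value; the test
# as a whole still passes if at most max_percentage of samples fail.
long_docs = TestParams(
    id="max_words",
    type=TestType.THRESHOLD,
    metric="LengthAnalyzer.total_words",
    operator=">",
    value=10000,
    max_percentage=5.0,
)

# Regex test (not yet implemented; shown only to illustrate the text fields).
no_emails = TestParams(
    id="no_emails",
    type=TestType.REGEX,
    text_field="text_content",
    pattern=r"[\w.+-]+@[\w-]+\.[\w.]+",
    case_sensitive=False,
)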
- class oumi.analyze.testing.TestResult(*, test_id: str, passed: bool, severity: TestSeverity = TestSeverity.MEDIUM, title: str = '', description: str = '', metric: str = '', affected_count: int = 0, total_count: int = 0, affected_percentage: float = 0.0, threshold: float | None = None, actual_value: float | None = None, sample_indices: list[int] = <factory>, error: str | None = None, details: dict[str, Any] = <factory>)[source]#
Bases: BaseModel
Result of a single test execution.
- Variables:
test_id (str) – Unique identifier for the test.
passed (bool) – Whether the test passed.
severity (oumi.core.configs.params.test_params.TestSeverity) – Severity level of the test.
title (str) – Human-readable title.
description (str) – Description of what the test checks.
metric (str) – The metric being tested (e.g., “analyzer_name.field”).
affected_count (int) – Number of samples that failed the test.
total_count (int) – Total number of samples tested.
affected_percentage (float) – Percentage of samples affected.
threshold (float | None) – The configured threshold for the test.
actual_value (float | None) – The actual computed value (for threshold tests).
sample_indices (list[int]) – Indices of affected samples (limited).
error (str | None) – Error message if test execution failed.
details (dict[str, Any]) – Additional details about the test result.
- actual_value: float | None#
- affected_count: int#
- affected_percentage: float#
- description: str#
- details: dict[str, Any]#
- error: str | None#
- metric: str#
- model_config = {}#
Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
- passed: bool#
- sample_indices: list[int]#
- severity: TestSeverity#
- test_id: str#
- threshold: float | None#
- title: str#
- total_count: int#
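A sketch of inspecting a result; the values below are illustrative rather than produced by a real run.

from oumi.analyze.testing import TestResult, TestSeverity

result = TestResult(
    test_id="max_words",
    passed=False,
    severity=TestSeverity.MEDIUM,
    metric="LengthAnalyzer.total_words",
    affected_count=3,
    total_count=60,
    affected_percentage=5.0,
    sample_indices=[4, 17, 42],
)

if result.error is not None:
    # Execution errors are reported separately from pass/fail outcomes.
    print(f"{result.test_id} errored: {result.error}")
elif not result.passed:
    print(
        f"[{result.severity.value}] {result.test_id}: "
        f"{result.affected_count}/{result.total_count} samples "
        f"({result.affected_percentage:.1f}%) affected"
    )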
- class oumi.analyze.testing.TestSeverity(value)[source]#
Bases: str, Enum
Severity levels for test failures.
- HIGH = 'high'#
- LOW = 'low'#
- MEDIUM = 'medium'#
- class oumi.analyze.testing.TestSummary(*, results: list[TestResult] = <factory>, total_tests: int = 0, passed_tests: int = 0, failed_tests: int = 0, error_tests: int = 0, pass_rate: float = 0.0, high_severity_failures: int = 0, medium_severity_failures: int = 0, low_severity_failures: int = 0)[source]#
Bases: BaseModel
Summary of all test results.
- Variables:
results (list[oumi.analyze.testing.results.TestResult]) – List of individual test results.
total_tests (int) – Total number of tests run.
passed_tests (int) – Number of tests that passed.
failed_tests (int) – Number of tests that failed.
error_tests (int) – Number of tests that had errors.
pass_rate (float) – Percentage of tests that passed.
high_severity_failures (int) – Number of high severity failures.
medium_severity_failures (int) – Number of medium severity failures.
low_severity_failures (int) – Number of low severity failures.
- error_tests: int#
- failed_tests: int#
- classmethod from_results(results: list[TestResult]) → TestSummary[source]#
Create a summary from a list of test results.
- Parameters:
results – List of test results.
- Returns:
TestSummary with computed statistics.
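A small sketch; the two results are placeholders, and the expected output assumes pass_rate is reported as a percentage (as documented above).

from oumi.analyze.testing import TestResult, TestSummary

summary = TestSummary.from_results(
    [
        TestResult(test_id="max_words", passed=True),
        TestResult(test_id="no_empty", passed=False),
    ]
)
print(summary.total_tests, summary.failed_tests, summary.pass_rate)  # expected: 2 1 50.0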
- get_error_results() → list[TestResult][source]#
Get all test results with errors.
- get_failed_results() → list[TestResult][source]#
Get all failed test results.
- get_passed_results() → list[TestResult][source]#
Get all passed test results.
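For triage, the getters pair naturally with severity filtering. A self-contained sketch with a single placeholder failure:

from oumi.analyze.testing import TestResult, TestSeverity, TestSummary

summary = TestSummary.from_results(
    [TestResult(test_id="no_pii", passed=False, severity=TestSeverity.HIGH)]
)

# Surface only high-severity failures, e.g. to gate a pipeline.
blocking = [
    r for r in summary.get_failed_results() if r.severity == TestSeverity.HIGH
]
for r in blocking:
    print(f"{r.test_id}: {r.title or r.metric}")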
- high_severity_failures: int#
- low_severity_failures: int#
- medium_severity_failures: int#
- model_config = {}#
Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
- pass_rate: float#
- passed_tests: int#
- results: list[TestResult]#
- total_tests: int#
- class oumi.analyze.testing.TestType(value)[source]#
Bases: str, Enum
Types of tests that can be run on analysis results.
- Currently implemented:
THRESHOLD: Numeric comparisons with optional percentage tolerance
- Not yet implemented (planned for future):
REGEX: Pattern matching on text fields
CONTAINS: Text containment checks (supports match_mode: any/all/exact)
OUTLIERS: Anomaly detection using standard deviation
COMPOSITE: Combine multiple tests with AND/OR logic
- COMPOSITE = 'composite'#
- CONTAINS = 'contains'#
- OUTLIERS = 'outliers'#
- REGEX = 'regex'#
- THRESHOLD = 'threshold'#
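Because TestType (like TestSeverity) subclasses str, members compare equal to their plain string values, so configs loaded from YAML/JSON can specify either form:

from oumi.analyze.testing import TestType

# str-Enum members compare equal to their string values.
assert TestType.THRESHOLD == "threshold"
assert TestType("regex") is TestType.REGEX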