oumi.judges_v2
This module provides access to various judge configurations for the Oumi project.
Judges evaluate the quality of AI-generated responses against criteria such as helpfulness, honesty, and safety.
- class oumi.judges_v2.BaseJudge(prompt_template: str, system_instruction: str | None, example_field_values: list[dict[str, str]], response_format: JudgeResponseFormat, output_fields: list[JudgeOutputField], inference_engine: BaseInferenceEngine)
Bases: object
Base class for implementing judges that evaluate model outputs.
A judge takes structured inputs, formats them using a prompt template, runs inference to get judgments, and parses the results into structured outputs.
- judge(inputs: list[dict[str, str]]) → list[JudgeOutput]
Evaluate a batch of inputs and return structured judgments.
- Parameters:
inputs – List of dictionaries containing input data for evaluation. Each dict must contain values for all prompt_template placeholders.
- Returns:
List of structured judge outputs containing the parsed results.
- Raises:
ValueError – If inference returns an unexpected number of conversations.
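The batch flow that `judge()` describes (format every input with the prompt template, run inference once over the batch, and reject a result whose length does not match the input count) can be sketched in plain Python. This is a minimal illustration, not Oumi's implementation: it assumes `{name}`-style placeholders filled with `str.format`, and it replaces the `BaseInferenceEngine` with a hypothetical `infer` callable that maps prompts to raw completions. `sketch_judge` is likewise a hypothetical name, and parsing the completions into `JudgeOutput` objects is left out.

```python
import re
from typing import Callable

# Hypothetical stand-in for an inference engine: prompts in, raw completions out.
InferFn = Callable[[list[str]], list[str]]

# Assumed placeholder syntax: {field_name}.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def sketch_judge(
    prompt_template: str,
    inputs: list[dict[str, str]],
    infer: InferFn,
) -> list[str]:
    """Format each input into a prompt, run batch inference, and check the count."""
    required = set(_PLACEHOLDER.findall(prompt_template))
    prompts = []
    for i, item in enumerate(inputs):
        # Every placeholder in the template must have a value in the input dict.
        missing = required - set(item)
        if missing:
            raise ValueError(f"Input {i} is missing placeholders: {sorted(missing)}")
        prompts.append(prompt_template.format(**item))

    outputs = infer(prompts)

    # Mirrors the documented ValueError: one output is expected per input.
    if len(outputs) != len(inputs):
        raise ValueError(
            f"Expected {len(inputs)} outputs from inference, got {len(outputs)}"
        )
    return outputs
```

For example, `sketch_judge("Q: {question}\nA: {answer}", [{"question": "2+2?", "answer": "4"}], my_infer)` sends one formatted prompt to `my_infer` and returns its single completion.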
- class oumi.judges_v2.JudgeOutput(*, raw_output: str, parsed_output: dict[str, str] = {}, output_fields: list[JudgeOutputField] | None = None, field_values: dict[str, float | int | str | bool | None] = {}, field_scores: dict[str, float | None] = {}, response_format: JudgeResponseFormat | None = None)
Bases: BaseModel
Represents the output from a judge evaluation.
- Variables:
raw_output (str) – The original unprocessed output from the judge
parsed_output (dict[str, str]) – Structured data (fields & their values) extracted from raw output
output_fields (list[oumi.judges_v2.base_judge.JudgeOutputField] | None) – List of expected output fields for this judge
field_values (dict[str, float | int | str | bool | None]) – Typed values for each expected output field
field_scores (dict[str, float | None]) – Numeric scores for each expected output field (if applicable)
response_format (oumi.core.configs.params.judge_params.JudgeResponseFormat | None) – Format used for generating output (XML, JSON, or RAW)
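The step from `raw_output` to `parsed_output` can be illustrated with a small parser. The conventions here are assumptions for illustration only: in XML each field is wrapped as `<field_key>value</field_key>`, and in JSON the response contains a top-level object keyed by `field_key`. The real formats are defined by Oumi's prompts and `JudgeOutput.from_raw_output`; `parse_raw_output` is a hypothetical helper, plain strings stand in for `JudgeResponseFormat` members, and RAW output is not handled.

```python
import json
import re


def parse_raw_output(
    raw_output: str, response_format: str, field_keys: list[str]
) -> dict[str, str]:
    """Extract {field_key: string value} from a raw judge response (sketch)."""
    if response_format == "JSON":
        # Tolerate surrounding prose by taking the outermost {...} span.
        start, end = raw_output.find("{"), raw_output.rfind("}")
        if start == -1 or end < start:
            return {}
        try:
            data = json.loads(raw_output[start : end + 1])
        except json.JSONDecodeError:
            return {}
        # Values are stringified, matching parsed_output's dict[str, str] type.
        return {k: str(data[k]) for k in field_keys if k in data}

    if response_format == "XML":
        parsed = {}
        for key in field_keys:
            tag = re.escape(key)
            match = re.search(rf"<{tag}>(.*?)</{tag}>", raw_output, re.DOTALL)
            if match:
                parsed[key] = match.group(1).strip()
        return parsed

    raise ValueError(f"Unsupported response format: {response_format}")
```

Fields that are absent or malformed are simply left out of the result, so a later typing step would see them as missing rather than failing the whole parse.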
- field_scores: dict[str, float | None]
- field_values: dict[str, float | int | str | bool | None]
- classmethod from_raw_output(raw_output: str, response_format: JudgeResponseFormat, output_fields: list[JudgeOutputField]) → Self
Generate a structured judge output from a raw model output.
- generate_raw_output(field_values: dict[str, str]) → str
Generate raw output string from field values in the specified format.
- Parameters:
field_values – Dictionary mapping field keys to their string values. Must contain values for all required output fields.
- Returns:
Formatted raw output string, ready for use as an assistant response.
- Raises:
ValueError – If required output fields are missing from field_values, if response_format/output_fields are not set, or if response_format is not supported.
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'field_scores': FieldInfo(annotation=dict[str, Union[float, NoneType]], required=False, default={}), 'field_values': FieldInfo(annotation=dict[str, Union[float, int, str, bool, NoneType]], required=False, default={}), 'output_fields': FieldInfo(annotation=Union[list[JudgeOutputField], NoneType], required=False, default=None), 'parsed_output': FieldInfo(annotation=dict[str, str], required=False, default={}), 'raw_output': FieldInfo(annotation=str, required=True), 'response_format': FieldInfo(annotation=Union[JudgeResponseFormat, NoneType], required=False, default=None)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
- output_fields: list[JudgeOutputField] | None
- parsed_output: dict[str, str]
- raw_output: str
- response_format: JudgeResponseFormat | None
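`generate_raw_output` works in the opposite direction from parsing: it renders field values into the judge's response format and raises `ValueError` when a required field is missing or the format is unsupported. The sketch below shows that contract under assumed conventions (one `<field_key>value</field_key>` element per line for XML, an indented JSON object for JSON). It is not Oumi's implementation; `render_raw_output` is a hypothetical helper and plain strings stand in for `JudgeResponseFormat` members.

```python
import json


def render_raw_output(
    field_values: dict[str, str], response_format: str, field_keys: list[str]
) -> str:
    """Render field values as a raw judge response (illustrative formats)."""
    # Every required output field must have a value.
    missing = [k for k in field_keys if k not in field_values]
    if missing:
        raise ValueError(f"Missing values for output fields: {missing}")

    if response_format == "XML":
        # Assumed layout: one element per field, in field order.
        return "\n".join(f"<{k}>{field_values[k]}</{k}>" for k in field_keys)

    if response_format == "JSON":
        return json.dumps({k: field_values[k] for k in field_keys}, indent=2)

    # Formats without a rendering rule are rejected, as the docstring describes.
    raise ValueError(f"Unsupported response format: {response_format}")
```

A helper like this is useful for building few-shot examples: the rendered string can serve as the assistant turn that demonstrates the expected output shape.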
- class oumi.judges_v2.JudgeOutputField(*, field_key: str, field_type: JudgeOutputType, field_scores: dict[str, float] | None)
Bases: BaseModel
Represents a single output field that a judge can produce.
- Variables:
field_key (str) – The key/name for this field in the judge’s output
field_type (oumi.core.configs.params.judge_params.JudgeOutputType) – The data type expected for this field’s value
field_scores (dict[str, float] | None) – Optional mapping from categorical values to numeric scores
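`field_scores` maps categorical values to numbers, which is one way a typed field value could end up as an entry in `JudgeOutput.field_scores`. The rule below is one plausible reading, not Oumi's documented behavior: look the value up in the mapping when one is provided, pass plain numbers through as floats, and return `None` otherwise. `compute_field_score` is a hypothetical helper.

```python
from typing import Optional, Union

FieldValue = Union[float, int, str, bool, None]


def compute_field_score(
    value: FieldValue, field_scores: Optional[dict[str, float]]
) -> Optional[float]:
    """Map a typed field value to a numeric score (illustrative rule)."""
    if value is None:
        return None
    if field_scores is not None:
        # Categorical lookup on the string form; unmapped categories get no score.
        return field_scores.get(str(value))
    if isinstance(value, (int, float)):
        # Numeric values (bools included, since bool subclasses int) score as themselves.
        return float(value)
    return None  # free text has no natural score
```

For example, with `field_scores={"Yes": 1.0, "No": 0.0}`, a judgment of `"Yes"` scores `1.0` and `"Maybe"` scores `None`. Because the lookup uses `str(value)`, boolean values would need keys spelled `"True"` and `"False"` under this rule.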
- field_key: str
- field_scores: dict[str, float] | None
- field_type: JudgeOutputType
- get_typed_value(raw_value: str) → float | int | str | bool | None
Convert the field’s raw string value to the appropriate type.
- Parameters:
raw_value – The raw string value from the judge’s output
- Returns:
The typed value, or None if conversion fails
- Raises:
ValueError – If the field_type is not supported
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[Dict[str, FieldInfo]] = {'field_key': FieldInfo(annotation=str, required=True), 'field_scores': FieldInfo(annotation=Union[dict[str, float], NoneType], required=True), 'field_type': FieldInfo(annotation=JudgeOutputType, required=True)}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.
This replaces Model.__fields__ from Pydantic V1.
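The conversion that `get_typed_value` describes, including its two documented failure modes (`None` when conversion fails, `ValueError` for an unsupported type), can be sketched as follows. The members of `JudgeOutputType` are not listed on this page, so `OutputTypeSketch` and its four members are hypothetical stand-ins, as are the accepted boolean spellings and the choice to strip surrounding whitespace.

```python
from enum import Enum
from typing import Union

TypedValue = Union[float, int, str, bool, None]


class OutputTypeSketch(Enum):
    """Hypothetical stand-in for JudgeOutputType."""

    BOOL = "bool"
    INT = "int"
    FLOAT = "float"
    TEXT = "text"


# Assumed boolean spellings; a real judge prompt may use different ones.
_TRUE = {"true", "yes", "1"}
_FALSE = {"false", "no", "0"}


def to_typed_value(raw_value: str, field_type: OutputTypeSketch) -> TypedValue:
    """Convert a raw string to the field's type; None if conversion fails."""
    value = raw_value.strip()
    try:
        if field_type is OutputTypeSketch.BOOL:
            lowered = value.lower()
            if lowered in _TRUE:
                return True
            if lowered in _FALSE:
                return False
            return None  # unrecognized boolean spelling
        if field_type is OutputTypeSketch.INT:
            return int(value)
        if field_type is OutputTypeSketch.FLOAT:
            return float(value)
        if field_type is OutputTypeSketch.TEXT:
            return value
    except ValueError:
        return None  # int()/float() could not parse the string
    # Reached only for types the converter does not know about.
    raise ValueError(f"Unsupported field type: {field_type}")
```

Returning `None` for unparseable values, rather than raising, lets one malformed field degrade gracefully while a misconfigured field type still fails loudly.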