oumi.analyze.utils


Utility functions for the analyze module.

oumi.analyze.utils.to_analysis_dataframe(conversations: list[Conversation], results: Mapping[str, Sequence[BaseModel] | BaseModel], message_to_conversation_idx: list[int] | None = None) → DataFrame

Convert typed analysis results to a pandas DataFrame.

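Creates a DataFrame with one row per conversation, containing columns for conversation metadata and for every analyzer metric. Each analyzer field name is prefixed with a name derived from its analyzer (for example, LengthAnalyzer fields become length__total_chars and length__total_words) so that fields from different analyzers cannot collide.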

Example

>>> results = {"LengthAnalyzer": [LengthMetrics(...), LengthMetrics(...)]}
>>> df = to_analysis_dataframe(conversations, results)
>>> print(df.columns.tolist())
['conversation_id', 'conversation_index', 'num_messages',
 'length__total_chars', 'length__total_words', ...]
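The prefixing in this example can be illustrated with a small sketch. It is not the oumi implementation: the LengthMetrics fields, the column_prefix rule (inferred only from LengthAnalyzer producing length__ columns above) and the helper names are hypothetical, and a dataclass stands in for the pydantic BaseModel so the snippet runs with the standard library alone.

```python
from dataclasses import asdict, dataclass
from typing import Any


# Hypothetical stand-in for an analyzer's typed result model.
# With a pydantic v2 BaseModel you would call metrics.model_dump() instead of asdict().
@dataclass
class LengthMetrics:
    total_chars: int
    total_words: int


def column_prefix(analyzer_name: str) -> str:
    """Hypothetical naming rule: drop an 'Analyzer' suffix and lowercase the rest."""
    return analyzer_name.removesuffix("Analyzer").lower()


def prefixed_columns(analyzer_name: str, metrics: Any) -> dict[str, Any]:
    """Flatten one result object into '<prefix>__<field>' columns."""
    prefix = column_prefix(analyzer_name)
    return {f"{prefix}__{name}": value for name, value in asdict(metrics).items()}


row = prefixed_columns("LengthAnalyzer", LengthMetrics(total_chars=42, total_words=8))
print(row)  # {'length__total_chars': 42, 'length__total_words': 8}
```

The double-underscore separator keeps the analyzer prefix visually distinct from field names that themselves contain single underscores, such as total_chars.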

Parameters:
  • conversations – List of conversations that were analyzed.

  • results – Dictionary mapping analyzer names to results:
      - Per-conversation results: a list of BaseModel with one entry per conversation.
      - Message-level results: a list of BaseModel with one entry per message.
      - Dataset-level results: a single BaseModel, repeated on every conversation row.

  • message_to_conversation_idx – Optional mapping from message index to conversation index, needed to aggregate message-level results correctly. When provided, message-level results are aggregated per conversation.
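Aggregating message-level results implies a group-by step: each message's value is assigned to the conversation at the same position in message_to_conversation_idx, then collapsed to one value per conversation. A minimal sketch of that step follows. It assumes mean aggregation (this page does not say which aggregate oumi uses), works on plain floats rather than BaseModel results, and the helper aggregate_per_conversation is hypothetical, not part of the library.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def aggregate_per_conversation(
    message_values: list[float],
    message_to_conversation_idx: list[int],
    num_conversations: int,
) -> list[float | None]:
    """Collapse one message-level metric into one value per conversation.

    Assumption: the aggregate is the mean; conversations with no messages get None.
    """
    if len(message_values) != len(message_to_conversation_idx):
        raise ValueError("need exactly one conversation index per message")
    grouped: dict[int, list[float]] = defaultdict(list)
    for value, conv_idx in zip(message_values, message_to_conversation_idx):
        grouped[conv_idx].append(value)
    return [mean(grouped[i]) if grouped[i] else None for i in range(num_conversations)]


# Five messages: the first two belong to conversation 0, the rest to conversation 1.
per_conv = aggregate_per_conversation([3.0, 5.0, 10.0, 2.0, 6.0], [0, 0, 1, 1, 1], 3)
print(per_conv)  # [4.0, 6.0, None]
```

Without the mapping there is no way to tell which messages belong to which conversation, which is why the parameter is described as required for proper aggregation even though it is optional in the signature.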

Returns:

DataFrame with conversation metadata and all metrics as columns.