oumi.core.synthesis
Submodules
oumi.core.synthesis.dataset_ingestion module
- class oumi.core.synthesis.dataset_ingestion.DatasetPath(path: str)
Bases: object
Path to a dataset in some storage location.
- get_storage_type() → DatasetStorageType
Get the storage type of this dataset path.
- class oumi.core.synthesis.dataset_ingestion.DatasetReader
Bases: object
Reads a dataset from some storage location.
Supports:
- HuggingFace
- Local files (JSONL, CSV, TSV, Parquet, JSON)
- Glob patterns
- read(data_source: DatasetSource) → list[dict]
Read the data from the given data source.
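As an illustration of what reading local files and glob patterns into a list[dict] can look like, here is a minimal standard-library sketch. It is not the DatasetReader implementation: the function name read_records is hypothetical, dispatching on the file extension is an assumption, and Parquet and HuggingFace sources are left out because they need third-party packages.

```python
import csv
import glob
import json
import tempfile
from pathlib import Path


def read_records(pattern: str) -> list[dict]:
    """Read every file matching `pattern` into one flat list of dict records.

    Hypothetical helper for illustration only; not part of oumi.
    """
    records: list[dict] = []
    # sorted() makes the order of records deterministic across platforms.
    for name in sorted(glob.glob(pattern)):
        suffix = Path(name).suffix.lower()
        with open(name, newline="", encoding="utf-8") as handle:
            if suffix == ".jsonl":
                # One JSON object per non-empty line.
                records.extend(json.loads(line) for line in handle if line.strip())
            elif suffix == ".json":
                # Accept either a top-level list of objects or a single object.
                data = json.load(handle)
                records.extend(data if isinstance(data, list) else [data])
            elif suffix in (".csv", ".tsv"):
                delimiter = "\t" if suffix == ".tsv" else ","
                reader = csv.DictReader(handle, delimiter=delimiter)
                records.extend(dict(row) for row in reader)
            else:
                raise ValueError(f"Unsupported file type: {name}")
    return records


# Usage: write two small files to a temporary directory and read them back
# through a single glob pattern.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.jsonl").write_text('{"q": "one"}\n{"q": "two"}\n', encoding="utf-8")
    Path(tmp, "b.csv").write_text("q\nthree\n", encoding="utf-8")
    rows = read_records(str(Path(tmp, "*")))
```

Each record comes back as a plain dictionary keyed by column or field name, which matches the list[dict] return type documented for read.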
oumi.core.synthesis.planner module
- class oumi.core.synthesis.planner.DatasetPlanner
Bases: object
- plan(synthesis_params: GeneralSynthesisParams, sample_count: int) → list[dict]
Set up the dataset’s attributes for inference.
This method builds a list of dictionaries. Each dictionary represents one sample of the dataset and holds one value for every attribute.
Dataset sources populate the plan with attribute values; the samples of each dataset source are consumed in round-robin order.
Permutable attributes have their values sampled from a distribution.
Combination sampling overrides the distribution for particular attribute value combinations.
The final list of dictionaries will be used to create a dataset.
- Parameters:
synthesis_params – The synthesis parameters.
sample_count – The number of samples to plan.
- Returns:
A list of dictionaries, each representing a sample of the dataset with the attribute values for each attribute.
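The planning steps described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the DatasetPlanner implementation: the function plan_samples and all of its parameters are hypothetical stand-ins for the fields of GeneralSynthesisParams, and treating combination sampling as "force this combination at a given rate" is one assumed reading of "overrides the distribution".

```python
import random
from typing import Optional


def plan_samples(
    source_rows: list[dict],
    permutable: dict[str, dict[str, float]],
    sample_count: int,
    combination_rates: Optional[list[tuple[dict[str, str], float]]] = None,
    seed: int = 0,
) -> list[dict]:
    """Build `sample_count` sample dicts (hypothetical helper, not oumi API).

    source_rows: rows from a dataset source, reused round-robin.
    permutable: attribute name -> {value: weight} distribution.
    combination_rates: (forced attribute values, rate) pairs; rates sum to <= 1.
    """
    rng = random.Random(seed)
    combination_rates = combination_rates or []
    plan: list[dict] = []
    for index in range(sample_count):
        sample: dict = {}
        # Round-robin: sample i takes source row i modulo the number of rows.
        if source_rows:
            sample.update(source_rows[index % len(source_rows)])
        # Assumed combination sampling: one roll decides whether this sample
        # is forced to a specific combination instead of being drawn freely.
        roll = rng.random()
        forced: Optional[dict[str, str]] = None
        cumulative = 0.0
        for combination, rate in combination_rates:
            cumulative += rate
            if roll < cumulative:
                forced = combination
                break
        # Permutable attributes: use the forced value if there is one,
        # otherwise draw from the attribute's weighted distribution.
        for name, weights in permutable.items():
            if forced is not None and name in forced:
                sample[name] = forced[name]
            else:
                values = list(weights)
                sample[name] = rng.choices(
                    values, weights=[weights[v] for v in values], k=1
                )[0]
        plan.append(sample)
    return plan


# Usage: two source rows, two permutable attributes, and one combination
# that is forced for roughly a quarter of the samples.
example_plan = plan_samples(
    source_rows=[{"doc": "A"}, {"doc": "B"}],
    permutable={
        "tone": {"formal": 0.5, "casual": 0.5},
        "length": {"short": 0.7, "long": 0.3},
    },
    sample_count=6,
    combination_rates=[({"tone": "formal", "length": "long"}, 0.25)],
)
```

In this sketch every planned sample carries both the fields of its source row and one value per permutable attribute, mirroring the "one dictionary per sample" shape that plan returns.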