oumi.datasets.preference_tuning#

Preference tuning datasets module.

class oumi.datasets.preference_tuning.KtoMix14kDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)[source]#

Bases: BaseExperimentalKtoDataset

Preprocess the KTO dataset.

A KTO-formatted version of argilla/dpo-mix-7k designed for Kahneman-Tversky Optimization training. This dataset provides binary preference data for aligning language models with human preferences.

Data Fields:
  • prompt – List of message dictionaries with a single user message. Example: [{"content": "Question text", "role": "user"}]

  • completion – List of message dictionaries with a single assistant message. Example: [{"content": "Answer text", "role": "assistant"}]

  • label – Boolean (True for a desirable completion, False for an undesirable one)
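
For illustration, a single record in this format might look like the sketch below; the field values are invented, and only the structure follows the field descriptions above.

```python
# Hypothetical KTO-formatted example; contents are illustrative, not taken from the dataset.
example = {
    "prompt": [{"content": "What is the capital of France?", "role": "user"}],
    "completion": [{"content": "The capital of France is Paris.", "role": "assistant"}],
    "label": True,  # True = desirable completion, False = undesirable
}
```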

See also

For more information on how to use this dataset, refer to:

  • Huggingface hub: https://huggingface.co/datasets/trl-lib/kto-mix-14k

  • KTO documentation: https://huggingface.co/docs/trl/main/en/kto_trainer
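
A minimal construction sketch, assuming the class falls back to its default dataset when no dataset_name is given; the "train" split name is an assumption, not confirmed by this page:

```python
from oumi.datasets.preference_tuning import KtoMix14kDataset

# With no dataset_name given, the class default_dataset ('trl-lib/kto-mix-14k')
# is assumed to be used. The "train" split name is also an assumption.
dataset = KtoMix14kDataset(split="train")
```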

dataset_name: str#
default_dataset: str | None = 'trl-lib/kto-mix-14k'#
trust_remote_code: bool#
class oumi.datasets.preference_tuning.OrpoDpoMix40kDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, tokenizer: PreTrainedTokenizerBase | None = None, return_tensors: bool = False, **kwargs)[source]#

Bases: BaseDpoDataset

Preprocess the ORPO dataset for DPO.

A dataset designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training.

This dataset is a combination of high-quality DPO datasets, including:

  • Capybara-Preferences

  • distilabel-intel-orca-dpo-pairs

  • ultrafeedback-binarized-preferences-cleaned

  • distilabel-math-preference-dpo

  • toxic-dpo-v0.2

  • prm_dpo_pairs_cleaned

  • truthy-dpo-v0.1

Rule-based filtering was applied to remove ‘gptisms’ in the chosen answers.

Data Fields:
  • source – string

  • chosen – List of dictionaries with 'content' and 'role' fields

  • rejected – List of dictionaries with 'content' and 'role' fields

  • prompt – string

  • question – string
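
As with the KTO class above, a hypothetical record in this format might look like the sketch below; the values are invented, and only the field layout follows the descriptions above.

```python
# Hypothetical ORPO/DPO-formatted example; contents are illustrative only.
example = {
    "source": "truthy-dpo-v0.1",
    "prompt": "What is 2 + 2?",
    "question": "What is 2 + 2?",
    "chosen": [
        {"content": "What is 2 + 2?", "role": "user"},
        {"content": "2 + 2 = 4.", "role": "assistant"},
    ],
    "rejected": [
        {"content": "What is 2 + 2?", "role": "user"},
        {"content": "2 + 2 equals 5.", "role": "assistant"},
    ],
}
```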

See also

For more information on how to use this dataset, refer to:

  • Blog post: https://huggingface.co/blog/mlabonne/orpo-llama-3

  • Huggingface hub: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k
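
A minimal construction sketch, assuming the class falls back to its default dataset; the tokenizer choice and the "train" split name are illustrative assumptions:

```python
from transformers import AutoTokenizer

from oumi.datasets.preference_tuning import OrpoDpoMix40kDataset

# Any PreTrainedTokenizerBase matches the documented signature; "gpt2" is just an example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# With no dataset_name given, the class default_dataset ('mlabonne/orpo-dpo-mix-40k')
# is assumed to be used.
dataset = OrpoDpoMix40kDataset(split="train", tokenizer=tokenizer)
```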

dataset_name: str#
default_dataset: str | None = 'mlabonne/orpo-dpo-mix-40k'#
trust_remote_code: bool#