oumi.datasets.preference_tuning#
Preference tuning datasets module.
- class oumi.datasets.preference_tuning.OrpoDpoMix40kDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, tokenizer: PreTrainedTokenizerBase | None = None, return_tensors: bool = False, **kwargs)[source]#
Bases:
BaseExperimentalDpoDataset
Preprocess the ORPO dataset for DPO.
A dataset designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training.
This dataset is a combination of high-quality DPO datasets, including:
- Capybara-Preferences
- distilabel-intel-orca-dpo-pairs
- ultrafeedback-binarized-preferences-cleaned
- distilabel-math-preference-dpo
- toxic-dpo-v0.2
- prm_dpo_pairs_cleaned
- truthy-dpo-v0.1
Rule-based filtering was applied to remove ‘gptisms’ in the chosen answers.
- Data Fields (a sample record is sketched after this list):
- source – string
- chosen – list of dictionaries with ‘content’ and ‘role’ fields
- rejected – list of dictionaries with ‘content’ and ‘role’ fields
- prompt – string
- question – string
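
To make the field layout concrete, a single record roughly follows the shape below. This is a minimal sketch based on the field descriptions above; the values are invented for illustration and are not taken from the dataset.

```python
# Illustrative record shape (values are made up, not real dataset content).
example_record = {
    "source": "truthy-dpo-v0.1",
    "prompt": "What is the capital of France?",
    "question": "What is the capital of France?",
    # Both "chosen" and "rejected" are full conversations:
    # lists of {"role", "content"} dictionaries.
    "chosen": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
    "rejected": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "I believe it might be Lyon."},
    ],
}
```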
See also
For more information on how to use this dataset, refer to:
- Blog post: https://huggingface.co/blog/mlabonne/orpo-llama-3
- Hugging Face Hub: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k
- dataset_name: str#
- default_dataset: str | None = 'mlabonne/orpo-dpo-mix-40k'#
- trust_remote_code: bool#
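
A minimal usage sketch, based on the constructor signature above. The tokenizer checkpoint and keyword arguments shown are assumptions chosen for illustration, not a prescribed configuration; since default_dataset is already 'mlabonne/orpo-dpo-mix-40k', no dataset_name is passed.

```python
from transformers import AutoTokenizer

from oumi.datasets.preference_tuning import OrpoDpoMix40kDataset

# Assumed tokenizer for illustration; substitute the tokenizer of the model
# you intend to fine-tune.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Loads the default 'mlabonne/orpo-dpo-mix-40k' dataset and prepares it
# for DPO-style preference training.
dataset = OrpoDpoMix40kDataset(
    split="train",
    tokenizer=tokenizer,
)
```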