oumi.datasets.preference_tuning#
Preference tuning datasets module.
- class oumi.datasets.preference_tuning.KtoMix14kDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)[source]#
Bases:
BaseExperimentalKtoDataset
Preprocess the KTO dataset.
A KTO-formatted version of argilla/dpo-mix-7k designed for Kahneman-Tversky Optimization training. This dataset provides binary preference data for training language models with human preferences.
- Data Fields:
- prompt – List of message dictionaries with a single user message. Example: [{“content”: “Question text”, “role”: “user”}]
- completion – List of message dictionaries with a single assistant message. Example: [{“content”: “Answer text”, “role”: “assistant”}]
- label – boolean (True for desirable, False for undesirable)
See also
For more information on how to use this dataset, refer to:
- Hugging Face Hub: https://huggingface.co/datasets/trl-lib/kto-mix-14k
- KTO documentation: https://huggingface.co/docs/trl/main/en/kto_trainer
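Example (a minimal sketch, assuming the oumi package is installed and the Hugging Face Hub is reachable; instantiation follows the signature above, and the record layout mirrors the data fields documented above):
```python
from oumi.datasets.preference_tuning import KtoMix14kDataset

# dataset_name defaults to "trl-lib/kto-mix-14k" (see default_dataset below).
dataset = KtoMix14kDataset(split="train")

# Each raw example follows the documented fields. The layout below is shown
# for illustration; the exact access pattern depends on the base class.
example = {
    "prompt": [{"content": "Question text", "role": "user"}],
    "completion": [{"content": "Answer text", "role": "assistant"}],
    "label": True,  # True = desirable completion, False = undesirable
}
```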
- dataset_name: str#
- default_dataset: str | None = 'trl-lib/kto-mix-14k'#
- trust_remote_code: bool#
- class oumi.datasets.preference_tuning.OrpoDpoMix40kDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, tokenizer: PreTrainedTokenizerBase | None = None, return_tensors: bool = False, **kwargs)[source]#
Bases:
BaseDpoDataset
Preprocess the ORPO dataset for DPO.
A dataset designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training.
This dataset is a combination of high-quality DPO datasets, including:
- Capybara-Preferences
- distilabel-intel-orca-dpo-pairs
- ultrafeedback-binarized-preferences-cleaned
- distilabel-math-preference-dpo
- toxic-dpo-v0.2
- prm_dpo_pairs_cleaned
- truthy-dpo-v0.1
Rule-based filtering was applied to remove ‘gptisms’ (characteristic ChatGPT-style phrasings) from the chosen answers.
- Data Fields:
- source – string
- chosen – list of dictionaries with ‘content’ and ‘role’ fields
- rejected – list of dictionaries with ‘content’ and ‘role’ fields
- prompt – string
- question – string
See also
For more information on how to use this dataset, refer to:
- Blog post: https://huggingface.co/blog/mlabonne/orpo-llama-3
- Hugging Face Hub: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k
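Example (a minimal sketch, again assuming the oumi package is installed and the Hub is reachable; the record layout mirrors the data fields documented above, with placeholder strings standing in for real content):
```python
from oumi.datasets.preference_tuning import OrpoDpoMix40kDataset

# dataset_name defaults to "mlabonne/orpo-dpo-mix-40k" (see default_dataset
# below); a tokenizer may also be passed, per the signature above.
dataset = OrpoDpoMix40kDataset(split="train")

# Each raw example follows the documented fields. The layout below is shown
# for illustration only.
example = {
    "source": "Capybara-Preferences",
    "prompt": "Question text",
    "question": "Question text",
    "chosen": [
        {"content": "Question text", "role": "user"},
        {"content": "Preferred answer", "role": "assistant"},
    ],
    "rejected": [
        {"content": "Question text", "role": "user"},
        {"content": "Dispreferred answer", "role": "assistant"},
    ],
}
```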
- dataset_name: str#
- default_dataset: str | None = 'mlabonne/orpo-dpo-mix-40k'#
- trust_remote_code: bool#