oumi.datasets.preference_tuning#

Preference tuning datasets module.

class oumi.datasets.preference_tuning.KtoMix14kDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)[source]#

Bases: BaseExperimentalKtoDataset

Preprocess the KTO dataset.

A KTO-formatted version of argilla/dpo-mix-7k designed for Kahneman-Tversky Optimization training. This dataset provides binary preference data for aligning language models with human preferences.

Data Fields:
  • prompt – List of message dictionaries with a single user message. Example: [{"content": "Question text", "role": "user"}]

  • completion – List of message dictionaries with a single assistant message. Example: [{"content": "Answer text", "role": "assistant"}]

  • label – Boolean (True for a desirable completion, False for an undesirable one)
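
For illustration, a single record in this format might look like the sketch below; the field values are invented, and only the structure follows the field descriptions above.

```python
# Hypothetical KTO-formatted example; contents are illustrative, not taken from the dataset.
example = {
    "prompt": [{"content": "What is the capital of France?", "role": "user"}],
    "completion": [{"content": "The capital of France is Paris.", "role": "assistant"}],
    "label": True,  # True = desirable completion, False = undesirable
}
```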

See also

For more information on how to use this dataset, refer to:

  • Huggingface hub: https://huggingface.co/datasets/trl-lib/kto-mix-14k

  • KTO documentation: https://huggingface.co/docs/trl/main/en/kto_trainer
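
A minimal construction sketch, assuming the class falls back to its default dataset when no dataset_name is given; the "train" split name is an assumption, not confirmed by this page:

```python
from oumi.datasets.preference_tuning import KtoMix14kDataset

# With no dataset_name given, the class default_dataset ('trl-lib/kto-mix-14k')
# is assumed to be used. The "train" split name is also an assumption.
dataset = KtoMix14kDataset(split="train")
```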

dataset_name: str#
default_dataset: str | None = 'trl-lib/kto-mix-14k'#
trust_remote_code: bool#
class oumi.datasets.preference_tuning.OrpoDpoMix40kDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, tokenizer: PreTrainedTokenizerBase | None = None, return_tensors: bool = False, **kwargs)[source]#

Bases: BaseDpoDataset

Preprocess the ORPO dataset for DPO.

A dataset designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training.

This dataset is a combination of high-quality DPO datasets, including:

  • Capybara-Preferences

  • distilabel-intel-orca-dpo-pairs

  • ultrafeedback-binarized-preferences-cleaned

  • distilabel-math-preference-dpo

  • toxic-dpo-v0.2

  • prm_dpo_pairs_cleaned

  • truthy-dpo-v0.1

Rule-based filtering was applied to remove ‘gptisms’ in the chosen answers.

Data Fields:
  • source – string

  • chosen – List of dictionaries with 'content' and 'role' fields

  • rejected – List of dictionaries with 'content' and 'role' fields

  • prompt – string

  • question – string
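
As with the KTO class above, a hypothetical record in this format might look like the sketch below; the values are invented, and only the field layout follows the descriptions above.

```python
# Hypothetical ORPO/DPO-formatted example; contents are illustrative only.
example = {
    "source": "truthy-dpo-v0.1",
    "prompt": "What is 2 + 2?",
    "question": "What is 2 + 2?",
    "chosen": [
        {"content": "What is 2 + 2?", "role": "user"},
        {"content": "2 + 2 = 4.", "role": "assistant"},
    ],
    "rejected": [
        {"content": "What is 2 + 2?", "role": "user"},
        {"content": "2 + 2 equals 5.", "role": "assistant"},
    ],
}
```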

See also

For more information on how to use this dataset, refer to:

  • Blog post: https://huggingface.co/blog/mlabonne/orpo-llama-3

  • Huggingface hub: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k
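
A minimal construction sketch, assuming the class falls back to its default dataset; the tokenizer choice and the "train" split name are illustrative assumptions:

```python
from transformers import AutoTokenizer

from oumi.datasets.preference_tuning import OrpoDpoMix40kDataset

# Any PreTrainedTokenizerBase matches the documented signature; "gpt2" is just an example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# With no dataset_name given, the class default_dataset ('mlabonne/orpo-dpo-mix-40k')
# is assumed to be used.
dataset = OrpoDpoMix40kDataset(split="train", tokenizer=tokenizer)
```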

dataset_name: str#
default_dataset: str | None = 'mlabonne/orpo-dpo-mix-40k'#
trust_remote_code: bool#