oumi.core.configs

oumi.core.configs#

Configuration module for the Oumi (Open Universal Machine Intelligence) library.

This module provides various configuration classes and parameters used throughout the Oumi framework for tasks such as training, evaluation, inference, and job management.

The configurations are organized into different categories:

Evaluation:
- AsyncEvaluationConfig
- EvaluationConfig
- EvaluationFramework
Generation and Inference:
Job Management:
Data:
Model:
Training:
Profiling:
- ProfilerParams
Telemetry:
- TelemetryParams
Judge:
- JudgeConfig
- JudgeAttribute
- JudgeAttributeValueType
- JudgeConfig
- JudgeOutputType
- JudgeResponseFormat

Example

>>> from oumi.core.configs import ModelParams, TrainingConfig, TrainingParams
>>> model_params = ModelParams(model_name="gpt2")
>>> training_params = TrainingParams(num_train_epochs=3)
>>> training_config = TrainingConfig(
...     model=model_params,
...     training=training_params,
... )
>>> # Use the training_config in your training pipeline

Note

All configuration classes inherit from: BaseConfig, which provides common functionality such as serialization and validation.

class oumi.core.configs.AlpacaEvalTaskParams(evaluation_backend: str = '', task_name: str | None = None, num_samples: int | None = None, log_samples: bool | None = False, eval_kwargs: dict[str, ~typing.Any] = <factory>, version: float | None = 2.0)[source]#

Bases: EvaluationTaskParams

Parameters for the AlpacaEval evaluation framework.

AlpacaEval is an LLM-based automatic evaluation suite that is fast, cheap, replicable, and validated against 20K human annotations. The latest version (AlpacaEval 2.0) contains 805 prompts (tatsu-lab/alpaca_eval), which are open-ended questions. A model annotator (judge) is used to evaluate the quality of model’s responses for these questions and calculates win rates vs. reference responses. The default judge is GPT4 Turbo.

__post_init__()[source]#: Verifies params.

version: float | None = 2.0#

1.0 or 2.0 (default).

Type:: The version of AlpacaEval to use. Options

class oumi.core.configs.AnalyzeConfig(dataset_name: str | None = None, split: str = 'train', subset: str | None = None, sample_count: int | None = None, output_path: str = '.', analyzers: list[~oumi.core.configs.analyze_config.SampleAnalyzerParams] = <factory>)[source]#

Bases: BaseConfig

Configuration for dataset analysis and aggregation.

__post_init__()[source]#: Validates the configuration parameters.

analyzers: list[SampleAnalyzerParams]#: List of analyzer configurations (plugin-style).

dataset_name: str | None = None#: Dataset name.

output_path: str = '.'#

Directory path where output files will be saved.

Defaults to current directory (‘.’).

sample_count: int | None = None#: The number of examples to sample from the dataset. If None, uses the full dataset. If specified, must be non-negative.

split: str = 'train'#: The split of the dataset to load. This is typically one of “train”, “test”, or “validation”. Defaults to “train”.

subset: str | None = None#: The subset of the dataset to load. If None, uses the base dataset.

class oumi.core.configs.AsyncEvaluationConfig(evaluation: oumi.core.configs.evaluation_config.EvaluationConfig = <factory>, checkpoints_dir: str = '???', polling_interval: float = '???', num_retries: int = 5)[source]#

Bases: BaseConfig

__post_init__()[source]#: Verifies/populates params.

checkpoints_dir: str = '???'#: The directory to poll for new checkpoints.

evaluation: EvaluationConfig#

The evaluation configuration to use for each checkpoint.

This field specifies the EvaluationConfig object that defines the parameters for evaluating each checkpoint. It includes settings for the dataset, model, generation, and evaluation framework to be used.

num_retries: int = 5#

The number of times to retry polling before exiting the current job.

A retry occurs when the job reads the target directory but cannot find a new model checkpoint to evaluate. Defaults to 5. Cannot be negative.

polling_interval: float = '???'#: The time in seconds between the end of the previous evaluation and the start of the next polling attempt. Cannot be negative.

class oumi.core.configs.AutoWrapPolicy(value)[source]#

Bases: str, Enum

The auto wrap policies for FullyShardedDataParallel (FSDP).

NO_WRAP = 'NO_WRAP'#: No automatic wrapping is performed.

SIZE_BASED_WRAP = 'SIZE_BASED_WRAP'#: Wraps layers based on parameter count.

TRANSFORMER_BASED_WRAP = 'TRANSFORMER_BASED_WRAP'#: Wraps layers based on the transformer block layer.

class oumi.core.configs.BackwardPrefetch(value)[source]#

Bases: str, Enum

The backward prefetch options for FullyShardedDataParallel (FSDP).

BACKWARD_POST = 'BACKWARD_POST'#: Enables less overlap but requires less memory usage.

BACKWARD_PRE = 'BACKWARD_PRE'#: Enables the most overlap but increases memory usage the most.

NO_PREFETCH = 'NO_PREFETCH'#: Disables backward prefetching altogether.

to_torch() → BackwardPrefetch | None[source]#: Convert the enum to the corresponding torch_fsdp.BackwardPrefetch.

class oumi.core.configs.BaseConfig[source]#

Bases: object

__finalize_and_validate__() → None[source]#

Finalizes and validates the parameters of this object.

This method can be overridden by subclasses to implement custom validation logic.

In case of validation errors, this method should raise a ValueError or other appropriate exception.

__iter__() → Iterator[tuple[str, Any]][source]#

Returns an iterator over field names and values.

Note: for an attribute to be a field, it must be declared in the dataclass definition and have a type annotation.

finalize_and_validate() → None[source]#: Finalizes and validates the top level params objects.

classmethod from_str(config_str: str) → T[source]#

Loads a configuration from a YAML string.

Parameters:: config_str – The YAML string.
Returns:: The configuration object.
Return type:: BaseConfig

classmethod from_yaml(config_path: str | Path, ignore_interpolation=True) → T[source]#

Loads a configuration from a YAML file.

Parameters:

config_path – The path to the YAML file.
ignore_interpolation – If True, then any interpolation variables in the configuration file will be escaped.

Returns:

The merged configuration object.

Return type:

BaseConfig

classmethod from_yaml_and_arg_list(config_path: str | None, arg_list: list[str], logger: Logger | None = None, ignore_interpolation=True) → T[source]#

Loads a configuration from various sources.

If both YAML and arguments list are provided, then parameters specified in arg_list have higher precedence.

Parameters:

config_path – The path to the YAML file.
arg_list – Command line arguments list.
logger – (optional) Logger.
ignore_interpolation – If True, then any interpolation variables in the configuration file will be escaped.

Returns:

The merged configuration object.

Return type:

BaseConfig

print_config(logger: Logger | None = None) → None[source]#

Prints the configuration in a human-readable format.

Parameters:: logger – Optional logger to use. If None, uses module logger.

to_yaml(config_path: str | Path | StringIO) → None[source]#: Saves the configuration to a YAML file.

class oumi.core.configs.DataParams(train: oumi.core.configs.params.data_params.DatasetSplitParams = <factory>, test: oumi.core.configs.params.data_params.DatasetSplitParams = <factory>, validation: oumi.core.configs.params.data_params.DatasetSplitParams = <factory>)[source]#

Bases: BaseParams

__finalize_and_validate__()[source]#: Verifies params.

get_split(split: DatasetSplit) → DatasetSplitParams[source]#: A public getting for individual dataset splits.

test: DatasetSplitParams#: The input datasets used for testing. This field is currently unused.

train: DatasetSplitParams#: The input datasets used for training.

validation: DatasetSplitParams#: The input datasets used for validation.

class oumi.core.configs.DatasetParams(dataset_name: str = '???', dataset_path: Optional[str] = None, subset: Optional[str] = None, split: str = 'train', dataset_kwargs: dict[str, typing.Any] = <factory>, sample_count: Optional[int] = None, mixture_proportion: Optional[float] = None, shuffle: bool = False, seed: Optional[int] = None, shuffle_buffer_size: int = 1000, trust_remote_code: bool = False, transform_num_workers: Union[int, str, NoneType] = None)[source]#

Bases: BaseParams

__post_init__()[source]#: Verifies params.

dataset_kwargs: dict[str, Any]#

Keyword arguments to pass to the dataset constructor.

These arguments will be passed directly to the dataset constructor.

dataset_name: str = '???'#

The name of the dataset to load. Required.

This field is used to retrieve the appropriate class from the dataset registry that can be used to instantiate and preprocess the data.

If dataset_path is not specified, then the raw data will be automatically downloaded from the huggingface hub or oumi registry. Otherwise, the dataset will be loaded from the specified dataset_path.

dataset_path: str | None = None#

The path to the dataset to load.

This can be used to load a dataset of type dataset_name from a custom path.

If dataset_path is not specified, then the raw data will be automatically downloaded from the huggingface hub or oumi registry.

mixture_proportion: float | None = None#

The proportion of examples from this dataset relative to other datasets: in the mixture.

If specified, all datasets must supply this value. Must be a float in the range [0, 1.0]. The mixture_proportion for all input datasets must sum to 1.

Examples are sampled after the dataset has been sampled using sample_count if specified.

sample_count: int | None = None#

The number of examples to sample from the dataset.

Must be non-negative. If sample_count is larger than the size of the dataset, then the required additional examples are sampled by looping over the original dataset.

seed: int | None = None#

The random seed used for shuffling the dataset before sampling.

If set to None, shuffling will be non-deterministic.

shuffle: bool = False#: Whether to shuffle the dataset before any sampling occurs.

shuffle_buffer_size: int = 1000#: The size of the shuffle buffer used for shuffling the dataset before sampling.

split: str = 'train'#

The split of the dataset to load.

This is typically one of “train”, “test”, or “validation”. Defaults to “train”.

subset: str | None = None#

The subset of the dataset to load.

This is usually a subfolder within the dataset root.

transform_num_workers: int | str | None = None#

Number of subprocesses to use for dataset post-processing (ds.transform()).

Multiprocessing is disabled by default (None).

You can also use the special value “auto” to let oumi automatically select the number of subprocesses.

Using multiple processes can speed-up processing e.g., for large or multi-modal datasets.

The parameter is only supported for Map (non-iterable) datasets.

trust_remote_code: bool = False#: Whether to trust remote code when loading the dataset.

class oumi.core.configs.DatasetSplit(value)[source]#

Bases: Enum

Enum representing the split for a dataset.

TEST = 'test'#

TRAIN = 'train'#

VALIDATION = 'validation'#

class oumi.core.configs.DatasetSplitParams(datasets: list[oumi.core.configs.params.data_params.DatasetParams] = <factory>, collator_name: Optional[str] = None, collator_kwargs: dict[str, typing.Any] = <factory>, pack: bool = False, stream: bool = False, target_col: Optional[str] = None, mixture_strategy: str = 'first_exhausted', seed: Optional[int] = None, use_async_dataset: bool = False, use_torchdata: Optional[bool] = None)[source]#

Bases: BaseParams

__post_init__()[source]#: Verifies params.

collator_kwargs: dict[str, Any]#

Additional keyword arguments to pass to the collator constructor.

These arguments will be passed directly to the collator constructor and can be used to customize collator behavior beyond the default parameters.

collator_name: str | None = None#

Name of Oumi data collator.

Data collator controls how to form a mini-batch from individual dataset elements.

Valid options are:

“text_with_padding”: Dynamically pads the inputs received to
the longest length.

“vision_language_with_padding”: Uses VisionLanguageCollator
for image+text multi-modal data.

If None, then a default collator will be assigned.

datasets: list[DatasetParams]#: The datasets in this split.

mixture_strategy: str = 'first_exhausted'#

The strategy for mixing multiple datasets.

When multiple datasets are provided, this parameter determines how they are combined. Two strategies are available:

FIRST_EXHAUSTED: Samples from all datasets until one is fully represented in the mixture. This is the default strategy.
ALL_EXHAUSTED: Samples from all datasets until each one is fully represented in the mixture. This may lead to significant oversampling.

pack: bool = False#

Whether to pack the text into constant-length chunks.

Each chunk will be the size of the model’s max input length. This will stream the dataset, and tokenize on the fly if the dataset isn’t already tokenized (i.e. has an input_ids column).

seed: int | None = None#

The random seed used for mixing this dataset split, if specified.

If set to None mixing will be non-deterministic.

stream: bool = False#: Whether to stream the dataset.

target_col: str | None = None#

The dataset column name containing the input for training/testing/validation.

Deprecated:: This parameter is deprecated and will be removed in the future.

use_async_dataset: bool = False#

Whether to use the PretrainingAsyncTextDataset instead of ConstantLengthDataset.

Deprecated:: This parameter is deprecated and will be removed in the future.

use_torchdata: bool | None = None#

Whether to use the torchdata library for dataset loading and processing.

If set to None, this setting may be auto-inferred.

class oumi.core.configs.EvaluationBackend(value)[source]#

Bases: Enum

Enum representing the evaluation backend to use.

ALPACA_EVAL = 'alpaca_eval'#

CUSTOM = 'custom'#

LM_HARNESS = 'lm_harness'#

class oumi.core.configs.EvaluationConfig(tasks: list[oumi.core.configs.params.evaluation_params.EvaluationTaskParams] = <factory>, model: oumi.core.configs.params.model_params.ModelParams = <factory>, generation: oumi.core.configs.params.generation_params.GenerationParams = <factory>, inference_engine: oumi.core.configs.inference_engine_type.InferenceEngineType = <InferenceEngineType.NATIVE: 'NATIVE'>, inference_remote_params: Optional[oumi.core.configs.params.remote_params.RemoteParams] = None, run_name: Optional[str] = None, enable_wandb: bool = False, output_dir: str = 'output')[source]#

Bases: BaseConfig

__post_init__()[source]#: Verifies params.

enable_wandb: bool = False#: Whether to enable Weights & Biases (wandb) logging. Currently, this is only supported for LM Harness evaluation. If True, wandb will be used for experiment tracking and visualization. After enabling, you must set the WANDB_API_KEY environment variable. Alternatively, you can use the wandb login command to authenticate.

generation: GenerationParams#

Parameters for text generation during evaluation.

This includes settings such as temperature, top-k, top-p, maximum length, and any other parameters that control the text generation process.

inference_engine: InferenceEngineType = 'NATIVE'#: For evaluation tasks that require an inference step, such as AlpacaEval tasks, an inference engine is required to generate model responses. This parameter specifies the inference engine to use for generation. If not defined, the default is the NATIVE inference engine.

inference_remote_params: RemoteParams | None = None#: For evaluation tasks that require an inference step, such as AlpacaEval tasks, an inference engine is required to generate model responses. If the model is accessed via a remote API, these parameters specify how to run inference against the remote API.

model: ModelParams#

Parameters for the model to be evaluated.

This includes model architecture, size, dtype, and any specific configurations required for the evaluation task.

output_dir: str = 'output'#: Where to write computed evaluations.

run_name: str | None = None#: A unique identifier for the current training run. This name is used to identify the run in Weights & Biases.

tasks: list[EvaluationTaskParams]#: List of all the evaluation tasks to run.

class oumi.core.configs.EvaluationTaskParams(evaluation_backend: str = '', task_name: str | None = None, num_samples: int | None = None, log_samples: bool | None = False, eval_kwargs: dict[str, ~typing.Any] = <factory>)[source]#

Bases: BaseParams

Configuration parameters for model evaluation tasks.

Supported backends:

LM Harness: Framework for evaluating language models on standard benchmarks. A list of all supported tasks can be found at: EleutherAI/lm-evaluation-harness.
Alpaca Eval: Framework for evaluating language models on instruction-following and quality of responses on open-ended questions.
Custom: Users can register their own evaluation functions using the decorator @register_evaluation_function. The task_name should be the registry key for the custom evaluation function to be used.

Examples

# LM Harness evaluation on MMLU
params = EvaluationTaskParams(
    evaluation_backend="lm_harness",
    task_name="mmlu",
    eval_kwargs={"num_fewshot": 5}
)

# Alpaca Eval 2.0 evaluation
params = EvaluationTaskParams(
    evaluation_backend="alpaca_eval"
)

# Custom evaluation
@register_evaluation_function("my_evaluation_function")
def my_evaluation(task_params, config):
    accuracy = ...
    return EvaluationResult(task_result={"accuracy": accuracy})

params = EvaluationTaskParams(
    task_name="my_evaluation_function",
    evaluation_backend="custom"
)

__post_init__()[source]#: Verifies params.

eval_kwargs: dict[str, Any]#

Additional keyword arguments to pass to the evaluation function.

This allows for passing any evaluation-specific parameters that are not covered by other fields in TaskParams classes.

evaluation_backend: str = ''#: The evaluation backend to use for the current task.

get_evaluation_backend() → EvaluationBackend[source]#: Returns the evaluation backend as an Enum.

static list_evaluation_backends() → str[source]#: Returns a string listing all available evaluation backends.

log_samples: bool | None = False#

Whether to log the samples used for evaluation.

If not set (False): the model samples used for evaluation will not be logged. If set to True: the model samples generated during inference and used for evaluation will be logged in backend_config.json. The backend may also log other intermediate results related to inference.

num_samples: int | None = None#

Number of samples/examples to evaluate from this dataset.

Mostly for debugging, in order to reduce the runtime. If not set (None): the entire dataset is evaluated. If set, this must be a positive integer.

task_name: str | None = None#

The task to evaluate or the custom evaluation function to use.

For LM Harness evaluations (when the evaluation_backend is set to EvaluationBackend.LM_HARNESS), the task_name corresponds to a predefined task to evaluate on (e.g. “mmlu”). A list of all supported tasks by the LM Harness backend can be found by running: lm-eval –tasks list.

For custom evaluations (when evaluation_backend is set to EvaluationBackend.CUSTOM), the task_name should be the registry key for the custom evaluation function to be used. Users can register new evaluation functions using the decorator @register_evaluation_function.

class oumi.core.configs.FSDPParams(enable_fsdp: bool = False, sharding_strategy: ShardingStrategy = ShardingStrategy.FULL_SHARD, cpu_offload: bool = False, mixed_precision: str | None = None, backward_prefetch: BackwardPrefetch = BackwardPrefetch.BACKWARD_PRE, forward_prefetch: bool = False, use_orig_params: bool | None = None, state_dict_type: StateDictType = StateDictType.FULL_STATE_DICT, auto_wrap_policy: AutoWrapPolicy = AutoWrapPolicy.NO_WRAP, min_num_params: int = 100000, transformer_layer_cls: str | None = None, sync_module_states: bool = True)[source]#

Bases: BaseParams

Configuration options for Pytorch’s FullyShardedDataParallel (FSDP) training.

auto_wrap_policy: AutoWrapPolicy = 'NO_WRAP'#: Policy for automatically wrapping layers in FSDP.

backward_prefetch: BackwardPrefetch = 'BACKWARD_PRE'#

Determines when to prefetch the next set of parameters.

Improves throughput by enabling communication and computation overlap in the backward pass at the cost of slightly increased memory usage.

Options:

BACKWARD_PRE: Enables the most overlap but increases memory: usage the most. This prefetches the next set of parameters before the current set of parameters’ gradient computation.
BACKWARD_POST: Enables less overlap but requires less memory: usage. This prefetches the next set of parameters after the current set of parameters’ gradient computation.
NO_PREFETCH: Disables backward prefetching altogether. This has no overlap and: does not increase memory usage. This may degrade throughput significantly.

cpu_offload: bool = False#: If True, offloads parameters and gradients to CPU when not in use.

enable_fsdp: bool = False#

If True, enables FullyShardedDataParallel training.

Allows training larger models by sharding models and gradients across multiple GPUs.

forward_prefetch: bool = False#: If True, prefetches the forward pass results.

min_num_params: int = 100000#: Minimum number of parameters for a layer to be wrapped when using size_based policy. This has no effect when using transformer_based policy.

mixed_precision: str | None = None#

Enables mixed precision training.

Options: None, “fp16”, “bf16”.

sharding_strategy: ShardingStrategy = 'FULL_SHARD'#

Determines how to shard model parameters across GPUs.

See torch.distributed.fsdp.api.ShardingStrategy for more details.

Options:

FULL_SHARD: Shards model parameters, gradients, and optimizer states.: Provides the most memory efficiency but may impact performance.
SHARD_GRAD_OP: Shards gradients and optimizer states, but not model: parameters. Balances memory savings and performance.
HYBRID_SHARD: Shards model parameters within a node and replicates them: across nodes.
NO_SHARD: No sharding is applied. Parameters, gradients, and optimizer states: are kept in full on each GPU.
HYBRID_SHARD_ZERO2: Apply SHARD_GRAD_OP within a node, and replicate: parameters across nodes.

Warning

NO_SHARD option is deprecated and will be removed in a future release.: Please use DistributedDataParallel (DDP) instead.

state_dict_type: StateDictType = 'FULL_STATE_DICT'#: Specifies the type of state dict to use for checkpointing.

sync_module_states: bool = True#

If True, synchronizes module states across processes.

When enabled, each FSDP module broadcasts parameters and buffers from rank 0 to ensure replication across ranks.

transformer_layer_cls: str | None = None#

Class name for transformer layers when using transformer_based policy.

This has no effect when using size_based policy.

use_orig_params: bool | None = None#

If True, uses the PyTorch Module’s original parameters for FSDP.

For more information, see: https://pytorch.org/docs/stable/fsdp.html. If not specified, it will be automatically inferred based on other config values.

class oumi.core.configs.GenerationParams(max_new_tokens: int = 1024, batch_size: Optional[int] = 1, exclude_prompt_from_response: bool = True, seed: Optional[int] = None, temperature: float = 0.0, top_p: float = 1.0, frequency_penalty: float = 0.0, presence_penalty: float = 0.0, stop_strings: Optional[list[str]] = None, stop_token_ids: Optional[list[int]] = None, logit_bias: dict[typing.Any, float] = <factory>, min_p: float = 0.0, use_cache: bool = False, num_beams: int = 1, use_sampling: bool = False, guided_decoding: Optional[oumi.core.configs.params.guided_decoding_params.GuidedDecodingParams] = None)[source]#

Bases: BaseParams

__post_init__()[source]#: Validates generation-specific parameters.

batch_size: int | None = 1#

The number of sequences to generate in parallel.

Larger batch sizes can improve throughput but require more memory. Default is 1.

The value must either be positive or None, in which case the behavior is dependent on the downstream application. For example, LM Harness will automatically determine the largest batch size that will fit in memory.

For inference, this parameter is only used in NativeTextInferenceEngine.

exclude_prompt_from_response: bool = True#: Whether to trim the model’s response and remove the prepended prompt.

frequency_penalty: float = 0.0#: Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

guided_decoding: GuidedDecodingParams | None = None#: Parameters for guided decoding.

logit_bias: dict[Any, float]#

Modify the likelihood of specified tokens appearing in the completion.

Keys are tokens (specified by their token ID in the tokenizer), and values are the bias (-100 to 100). Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

max_new_tokens: int = 1024#

The maximum number of new tokens to generate.

This limits the length of the generated text to prevent excessively long outputs. Default is 1024 tokens.

min_p: float = 0.0#

Sets a minimum probability threshold for token selection.

Tokens with probabilities below this threshold are filtered out before top-p or top-k sampling. This can help prevent the selection of highly improbable tokens. Default is 0.0 (no minimum threshold).

num_beams: int = 1#: Number of beams for beam search. 1 means no beam search. Larger number of beams will make for a more thorough search for probable output token sequences, at the cost of increased computation time. Default is 1.

presence_penalty: float = 0.0#: Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

seed: int | None = None#: Seed to use for random number determinism. If specified, APIs may use this parameter to make a best-effort at determinism.

stop_strings: list[str] | None = None#: List of sequences where the API will stop generating further tokens.

stop_token_ids: list[int] | None = None#: List of token ids for which the API will stop generating further tokens. This is only supported in VLLMInferenceEngine and NativeTextInferenceEngine.

temperature: float = 0.0#

Controls randomness in the output.

Higher values (e.g., 1.0) make output more random, while lower values (e.g., 0.2) make it more focused and deterministic.

top_p: float = 1.0#

An alternative to temperature, called nucleus sampling.

It sets the cumulative probability threshold for token selection. For example, 0.9 means only considering the tokens comprising the top 90% probability mass.

use_cache: bool = False#: Whether to use the model’s internal cache (key/value attentions) to speed up generation. Default is False.

use_sampling: bool = False#: Whether to use sampling for next-token generation. If False, uses greedy decoding. Default is False.

class oumi.core.configs.GrpoParams(model_init_kwargs: dict[str, typing.Any] = <factory>, max_prompt_length: Optional[int] = None, max_completion_length: Optional[int] = None, num_generations: Optional[int] = None, temperature: float = 0.9, remove_unused_columns: bool = False, repetition_penalty: Optional[float] = 1.0, use_vllm: bool = False, vllm_mode: Optional[str] = None, vllm_gpu_memory_utilization: float = 0.9, epsilon: float = 0.2, log_completions: bool = False)[source]#

Bases: BaseParams

__post_init__()[source]#: Verifies params.

epsilon: float = 0.2#

Epsilon value for clipping the relative probability in the loss.

For example, if epsilon is 0.2, then the new probability can only differ from the old probability by a factor of x0.8-1.2.

log_completions: bool = False#: Whether to log prompt and completion pairs every logging_steps steps.

max_completion_length: int | None = None#

Maximum length of the generated completion.

If unspecified (None), defaults to 256.

max_prompt_length: int | None = None#

Maximum length of the prompt.

If the prompt is longer than this value, it will be truncated left. If unspecified (None), defaults to 512.

model_init_kwargs: dict[str, Any]#: Keyword arguments for AutoModelForCausalLM.from_pretrained(…)

num_generations: int | None = None#

Number of generations per prompt to sample.

The global batch size (num_processes * per_device_batch_size) must be divisible by this value. If unspecified (None), defaults to 8.

remove_unused_columns: bool = False#

Whether to only keep the column “prompt” in the dataset.

If you use a custom reward function that requires any column other than “prompts” and “completions”, you should set it to False.

repetition_penalty: float | None = 1.0#

Float that penalizes new tokens if they appear in the prompt/response so far.

Values > 1.0 encourage the model to use new tokens, while values < 1.0 encourage the model to repeat tokens.

temperature: float = 0.9#

Temperature for sampling.

The higher the temperature, the more random the completions.

to_hf_trainer_kwargs() → dict[str, Any][source]#: Converts GrpoParams to TRL’s GRPOConfig kwargs.

use_vllm: bool = False#

Whether to use vLLM for generating completions.

If set to True, ensure that a GPU is kept unused for training, as vLLM will require one for generation.

vllm_gpu_memory_utilization: float = 0.9#

Ratio (between 0 and 1) of GPU memory to reserve.

Fraction of VRAM reserved for the model weights, activations, and KV cache on the device dedicated to generation powered by vLLM. Higher values will increase the KV cache size and thus improve the model’s throughput. However, if the value is too high, it may cause out-of-memory (OOM) errors during initialization.

vllm_mode: str | None = None#

The mode to use for vLLM generation (“colocate” or “server”).

If set to None, defaults to “server”.

Server mode means that vLLM is running on a separate server that the trainer will communicate with. It requires the server to be started with trl vllm-serve beforehand.

Colocate mode means that vLLM will run in the same process as the trainer and share GPUs. While this is simpler as it doesn’t require a separate server, vLLM will contend with the trainer for GPU resources.

class oumi.core.configs.GuidedDecodingParams(json: Any | None = None, regex: str | None = None, choice: list[str] | None = None)[source]#

Bases: BaseParams

Parameters for guided decoding.

The parameters are mutually exclusive. Only one of the parameters can be specified at a time.

__post_init__() → None[source]#: Validate parameters.

choice: list[str] | None = None#

List of allowed choices for the output.

Restricts model output to one of the provided choices. Useful for forcing the model to select from a predefined set of options.

json: Any | None = None#

JSON schema, Pydantic model, or string to guide the output format.

Can be a dict containing a JSON schema, a Pydantic model class, or a string containing JSON schema. Used to enforce structured output from the model.

regex: str | None = None#

Regular expression pattern to guide the output format.

Pattern that the model output must match. Can be used to enforce specific text formats or patterns.

class oumi.core.configs.InferenceConfig(model: oumi.core.configs.params.model_params.ModelParams = <factory>, generation: oumi.core.configs.params.generation_params.GenerationParams = <factory>, input_path: Optional[str] = None, output_path: Optional[str] = None, engine: Optional[oumi.core.configs.inference_engine_type.InferenceEngineType] = None, remote_params: Optional[oumi.core.configs.params.remote_params.RemoteParams] = None)[source]#

Bases: BaseConfig

engine: InferenceEngineType | None = None#

The inference engine to use for generation.

Options:

NATIVE: Use the native inference engine via a local forward pass.

VLLM: Use the vLLM inference engine started locally by oumi.

REMOTE_VLLM: Use the external vLLM inference engine.

SGLANG: Use the SGLang inference engine.

LLAMACPP: Use LlamaCPP inference engine.

REMOTE: Use the inference engine for APIs that implement the OpenAI Chat API interface.

ANTHROPIC: Use the inference engine for Anthropic’s API.

If not specified, the “NATIVE” engine will be used.

generation: GenerationParams#: Parameters for text generation during inference.

input_path: str | None = None#

Path to the input file containing prompts for text generation.

The input file should be in JSONL format, where each line is a JSON representation of an Oumi Conversation object.

model: ModelParams#: Parameters for the model used in inference.

output_path: str | None = None#: Path to the output file where the generated text will be saved.

remote_params: RemoteParams | None = None#: Parameters for running inference against a remote API.

class oumi.core.configs.InferenceEngineType(value)[source]#

Bases: str, Enum

The supported inference engines.

ANTHROPIC = 'ANTHROPIC'#: The inference engine for Anthropic’s API.

DEEPSEEK = 'DEEPSEEK'#: The inference engine for DeepSeek Platform API.

GOOGLE_GEMINI = 'GEMINI'#: The inference engine for Gemini.

GOOGLE_VERTEX = 'GOOGLE_VERTEX'#: The inference engine for Google Vertex AI.

LAMBDA = 'LAMBDA'#: The Lambda inference engine.

LLAMACPP = 'LLAMACPP'#: The LlamaCPP inference engine.

NATIVE = 'NATIVE'#: The native inference engine using a local forward pass.

OPENAI = 'OPENAI'#: The inference engine for OpenAI API.

PARASAIL = 'PARASAIL'#: The inference engine for Parasail API.

REMOTE = 'REMOTE'#: The inference engine for APIs that implement the OpenAI Chat API interface.

REMOTE_VLLM = 'REMOTE_VLLM'#: The external vLLM inference engine.

SAMBANOVA = 'SAMBANOVA'#: The inference engine for SambaNova API.

SGLANG = 'SGLANG'#: The SGLang inference engine.

TOGETHER = 'TOGETHER'#: The inference engine for Together API.

VLLM = 'VLLM'#: The vLLM inference engine started locally by oumi using vLLM library.

class oumi.core.configs.JobConfig(name: str | None = None, user: str | None = None, working_dir: str = '???', num_nodes: int = 1, resources: ~oumi.core.configs.job_config.JobResources = <factory>, envs: dict[str, str] = <factory>, file_mounts: dict[str, str] = <factory>, storage_mounts: dict[str, ~oumi.core.configs.job_config.StorageMount] = <factory>, setup: str | None = None, run: str = '???')[source]#

Bases: BaseConfig

Configuration for launching jobs on a cluster.

__finalize_and_validate__()[source]#: Finalizes and validates the configuration.

envs: dict[str, str]#: The environment variables to set on the node.

file_mounts: dict[str, str]#

File mounts to attach to the node.

For mounting (copying) local directories, the key is the file path on the remote and the value is the local path. The keys of file_mounts cannot be shared with storage_mounts.

name: str | None = None#: Job name (optional). Only used for display purposes.

num_nodes: int = 1#: The number of nodes to use for the job. Defaults to 1.

resources: JobResources#: The resources required for each node in the job.

run: str = '???'#: The script to run on every node. Required. Runs after setup.

setup: str | None = None#

The setup script to run on every node. Optional.

setup will always be executed before run. In sky-based clouds, setup is executed only once upon cluster creation, not once per job.

ex) pip install -r requirements.txt

storage_mounts: dict[str, StorageMount]#

Storage system mounts to attach to the node.

For mounting remote storage solutions, the key is the file path on the remote and the value is a StorageMount. The keys of storage_mounts cannot be shared with file_mounts.

user: str | None = None#: The user that the job will run as (optional). Required only for Polaris.

working_dir: str = '???'#

The local directory containing the scripts required to execute this job.

This directory will be copied to the remote node before the job is executed.

Bases: object

Resources required for a single node in a job.

accelerators: str | None = None#

Accelerator type (optional). Supported values vary by environment.

For GCP you may specify the accelerator name and count, e.g. “V100:4”.

cloud: str = '???'#

The cloud used to run the job (required).

Options:

aws: Amazon Web Services
azure: Microsoft Azure
gcp: Google Cloud Platform
lambda: Lambda Cloud
local: The local machine launching the job
polaris: The Polaris cluster at Argonne National Laboratory
runpod: RunPod

cpus: str | None = None#

Number of vCPUs to use per node (optional).

Sky-based clouds support strings with modifiers, e.g. “2+” to indicate at least 2 vCPUs.

disk_size: int | None = None#

Disk size in GiB to allocate for OS (mounted at /) (optional)

Ignored by Polaris.

disk_tier: str | None = None#

Disk tier to use for OS (optional).

For sky-based clouds this Could be one of ‘low’, ‘medium’, ‘high’, ‘ultra’, or ‘best’ (default: None). As of Feb ‘25, only AWS, Azure, GCP, and OCI support disk tiers.

image_id: str | None = None#

The image id used to boot the instances (optional).

You can specify a docker by using the format docker:<image_id>. This field is not applicable for all clouds.

instance_type: str | None = None#

Instance type to use (optional).

Supported values vary by environment. The instance type is automatically inferred if accelerators is specified.

memory: str | None = None#

Memory to allocate per node in GiB (optional).

Sky-based clouds support strings with modifiers, e.g. “256+” to indicate at least 256 GB.

region: str | None = None#: The region to use (optional). Supported values vary by environment.

use_spot: bool = False#

Whether the cluster should use spot instances (optional).

If unspecified, defaults to False (on-demand instances).

zone: str | None = None#: The zone to use (optional). Supported values vary by environment.

class oumi.core.configs.JudgeAttribute(*, name: str, system_prompt: str, examples: list[T] = None, value_type: JudgeAttributeValueType = JudgeAttributeValueType.BOOL, limit_examples: int | None = 5)[source]#

Bases: BaseModel, Generic[T]

Attributes for the judge.

Example

>>> attribute = JudgeAttribute(
...     name="helpful",
...     system_prompt="You are an impartial judge.",
...     examples=[
...         TemplatedMessage(
...             role=Role.USER,
...             request="What is the capital of France?",
...             response="The capital of France is Paris.",
...         ),
...         TemplatedMessage(
...             role=Role.ASSISTANT,
...             response="True",
...         ),
...     ],
...     value_type=JudgeAttributeValueType.BOOL,
...     limit_examples=5,
... )
>>> print(attribute.name)
helpful

property conversation: Conversation#

Returns the judgement conversation in oumi format.

This will include the judge system prompt, and any few-shot examples.

examples: list[T]#: A list of few-shot example inputs and judgements.

limit_examples: int | None#

The maximum number of examples to use.

This is an optional parameter that limits the number of examples to be used for judging the attribute. If not specified, the default is 5.

classmethod load(filename: str) → JudgeAttribute[source]#: Loads the judge attribute config from a file.

property messages: list[Message]#

Returns the messages in oumi format.

This will include the judge system prompt, and any few-shot examples.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}#: A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'examples': FieldInfo(annotation=list[~T], required=False, default_factory=list, init=True, init_var=False, kw_only=<dataclasses._MISSING_TYPE object>), 'limit_examples': FieldInfo(annotation=Union[int, NoneType], required=False, default=5), 'name': FieldInfo(annotation=str, required=True), 'system_prompt': FieldInfo(annotation=str, required=True), 'value_type': FieldInfo(annotation=JudgeAttributeValueType, required=False, default=<JudgeAttributeValueType.BOOL: 'bool'>)}#

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

name: str#: The name of the attribute being judged.

system_prompt: str#: The system prompt for the judge.

value_type: JudgeAttributeValueType#: The type of value for the attribute.

class oumi.core.configs.JudgeAttributeValueType(value)[source]#

Bases: str, Enum

Enumeration of possible value types for judge attributes.

BOOL = 'bool'#: Boolean value type.

CATEGORICAL = 'categorical'#: Categorical value type.

LIKERT_5 = 'likert-5'#: Likert scale with 5 points value type.

class oumi.core.configs.JudgeConfig(attributes: dict[str, ~oumi.core.configs.judge_config.JudgeAttribute] = <factory>, model: ~oumi.core.configs.params.model_params.ModelParams = <factory>, generation: ~oumi.core.configs.params.generation_params.GenerationParams = <factory>, engine: ~oumi.core.configs.inference_engine_type.InferenceEngineType = InferenceEngineType.NATIVE, remote_params: ~oumi.core.configs.params.remote_params.RemoteParams | None = None)[source]#

Bases: BaseConfig

Configuration for the Judge.

This class holds the configuration for the Judge,: including the attributes to judge, the model parameters, and the text generation parameters.

Examples

>>> attributes = {
...     "helpful": JudgeAttribute(
...         name="helpful",
...         system_prompt="Is this answer helpful?",
...         examples=[
...             TemplatedMessage(
...                 role=Role.USER,
...                 request="What is the capital of France?",
...                 response="The capital of France is Paris.",
...             ),
...             TemplatedMessage(
...                 role=Role.ASSISTANT,
...                 response="True",
...             ),
...         ],
...     ),
...     "honest": JudgeAttribute(
...         name="honest",
...         system_prompt="Is this answer honest?",
...         examples=[]
...     )
... }
>>> model_params = ModelParams(model_name="example-model")
>>> generation_params = GenerationParams(max_new_tokens=100)
>>> judge_config = JudgeConfig(
...     attributes=attributes,
...     model=model_params,
...     generation=generation_params
... )

attributes: dict[str, JudgeAttribute]#: The attributes to judge.

engine: InferenceEngineType = 'NATIVE'#: The inference engine to use for generation.

generation: GenerationParams#: Parameters for text generation during inference.

model: ModelParams#: Parameters for the model used in inference.

remote_params: RemoteParams | None = None#: Parameters for running inference against a remote API.

oumi.core.configs.JudgeConfigV2#: alias of JudgeConfig

class oumi.core.configs.JudgeOutputType(value)[source]#

Bases: str, Enum

Enumeration of possible output types for the judge’s output fields.

BOOL = 'bool'#: Boolean judgment (True/False, Yes/No).

ENUM = 'enum'#: Categorical judgment from predefined options.

FLOAT = 'float'#: Floating-point value judgment.

INT = 'int'#: Integer value judgment.

TEXT = 'text'#: Free-form text judgment.

class oumi.core.configs.JudgeResponseFormat(value)[source]#

Bases: str, Enum

Enumeration of possible response formats for the judge output.

JSON = 'json'#: JSON structured response format.

RAW = 'raw'#: Plain text response format.

XML = 'xml'#: XML-tagged response format.

class oumi.core.configs.LMHarnessTaskParams(evaluation_backend: str = '', task_name: str | None = None, num_samples: int | None = None, log_samples: bool | None = False, eval_kwargs: dict[str, ~typing.Any] = <factory>, num_fewshot: int | None = None)[source]#

Bases: EvaluationTaskParams

Parameters for the LM Harness evaluation framework.

LM Harness is a comprehensive benchmarking suite for evaluating language models across various tasks.

__post_init__()[source]#: Verifies params.

num_fewshot: int | None = None#

Number of few-shot examples (with responses) to add in the prompt, in order to teach the model how to respond to the specific dataset’s prompts.

If not set (None): LM Harness will decide the value. If set to 0: no few-shot examples will be added in the prompt.

class oumi.core.configs.LoraWeightInitialization(value)[source]#

Bases: str, Enum

Enum representing the supported weight initializations for LoRA adapters.

DEFAULT = 'default'#

EVA = 'eva'#

GAUSSIAN = 'gaussian'#

LOFTQ = 'loftq'#

OLORA = 'olora'#

PISA = 'pissa'#

PISSA_NITER = 'pissa_niter_[number of iters]'#

RANDOM = 'random'#

get_literal_value() → Literal['default', 'random', 'gaussian', 'eva', 'pissa', 'pissa_niter_[number of iters]', 'loftq', 'olora'][source]#: Returns a literal value of the enum.

class oumi.core.configs.MixedPrecisionDtype(value)[source]#

Bases: str, Enum

Enum representing the dtype used for mixed precision training.

For more details on mixed-precision training, see: https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html

BF16 = 'bf16'#

Similar to fp16 mixed precision, but with bf16 instead.

This requires Ampere or higher NVIDIA architecture, or using CPU or Ascend NPU.

FP16 = 'fp16'#

fp16 mixed precision.

Requires ModelParams.torch_dtype (the dtype of the model weights) to be fp32. The model weights and optimizer state are fp32, but some ops will run in fp16 to improve training speed.

NONE = 'none'#

No mixed precision.

Uses ModelParams.torch_dtype as the dtype for all tensors (model weights, optimizer state, activations, etc.).

class oumi.core.configs.MixtureStrategy(value)[source]#

Bases: str, Enum

Enum representing the supported mixture strategies for datasets.

ALL_EXHAUSTED = 'all_exhausted'#

FIRST_EXHAUSTED = 'first_exhausted'#

get_literal_value() → Literal['first_exhausted', 'all_exhausted'][source]#: Returns a literal value of the enum.

class oumi.core.configs.ModelParams(model_name: str = '???', adapter_model: Optional[str] = None, tokenizer_name: Optional[str] = None, tokenizer_pad_token: Optional[str] = None, tokenizer_kwargs: dict[str, typing.Any] = <factory>, processor_kwargs: dict[str, typing.Any] = <factory>, model_max_length: Optional[int] = None, load_pretrained_weights: bool = True, trust_remote_code: bool = False, torch_dtype_str: str = 'float32', compile: bool = False, chat_template: Optional[str] = None, attn_implementation: Optional[str] = None, device_map: Optional[str] = 'auto', model_kwargs: dict[str, typing.Any] = <factory>, enable_liger_kernel: bool = False, shard_for_eval: bool = False, freeze_layers: list[str] = <factory>, model_revision: Optional[str] = None)[source]#

Bases: BaseParams

__finalize_and_validate__()[source]#: Finalizes and validates final config params.

__post_init__()[source]#: Populate additional params.

adapter_model: str | None = None#

The path to an adapter model to be applied on top of the base model.

If provided, this adapter will be loaded and applied to the base model. The adapter path could alternatively be specified in model_name.

attn_implementation: str | None = None#

The attention implementation to use.

Valid options include:

None: Use the default attention implementation (spda for torch>=2.1.1, else eager)
“sdpa”: Use PyTorch’s scaled dot-product attention
“flash_attention_2”: Use Flash Attention 2 for potentially faster computation. Requires “flash-attn” package to be installed
“eager”: Manual implementation of attention

chat_template: str | None = None#

The chat template to use for formatting inputs.

If provided, this template will be used to format multi-turn conversations for models that support chat-like interactions.

Note

Different models may require specific chat templates. Consult the model’s documentation for the appropriate template to use.

compile: bool = False#

Whether to JIT compile the model.

For training, do not set this param, and instead set TrainingParams.compile.

device_map: str | None = 'auto'#

Specifies how to distribute the model’s layers across available devices.

“auto”: Automatically distribute the model across available devices
None: Load the entire model on the default device

Note

“auto” is generally recommended as it optimizes device usage, especially for large models that don’t fit on a single GPU.

enable_liger_kernel: bool = False#

Whether to enable the Liger kernel for potential performance improvements.

Liger is an optimized CUDA kernel that can accelerate certain operations.

Tip

Enabling this may improve performance, but ensure compatibility with your model and hardware before use in production.

freeze_layers: list[str]#

A list of layer names to freeze during training.

These layers will have their parameters set to not require gradients, effectively preventing them from being updated during the training process. This is useful for fine-tuning specific parts of a model while keeping other parts fixed.

load_pretrained_weights: bool = True#

Whether to load the pretrained model’s weights.

If True, the model will be initialized with pretrained weights. If False, the model will be initialized from the pretrained config without loading weights.

model_kwargs: dict[str, Any]#

Additional keyword arguments to pass to the model’s constructor.

This allows for passing any model-specific parameters that are not covered by other fields in ModelParams.

Note

Use this for model-specific parameters or to enable experimental features.

model_max_length: int | None = None#

The maximum sequence length the model can handle.

If specified, this will override the default max length of the model’s config.

Note

Setting this to a larger value may increase memory usage but allow for processing longer inputs. Ensure your hardware can support the chosen length.

model_name: str = '???'#

The name or path of the model or LoRA adapter to use.

This can be a model identifier from the Oumi registry, HuggingFace Hub, or a path to a local directory containing model files.

The LoRA adapter can be specified here instead of in adapter_model. If so, this value is copied to adapter_model, and the appropriate base model is set here instead. The base model could either be in the same directory as the adapter, or specified in the adapter’s config file.

model_revision: str | None = None#

The revision of the model to use.

This is used to specify the version of the model to use.

processor_kwargs: dict[str, Any]#

Additional keyword arguments to pass into the processor’s constructor.

Processors are used in Oumi for vision-language models to process image and text inputs. This field is optional and can be left empty for text-only models, or if not needed.

These params override model-specific default values for these kwargs, if present.

shard_for_eval: bool = False#

Whether to shard the model for evaluation.

This is needed for large models that do not fit on a single GPU. It is used as the value for the parallelize argument in LM Harness.

tokenizer_kwargs: dict[str, Any]#

Additional keyword arguments to pass into the tokenizer’s constructor.

This allows for passing any tokenizer-specific parameters that are not covered by other fields in ModelParams.

tokenizer_name: str | None = None#

The name or path of the tokenizer to use.

If None, the tokenizer associated with model_name will be used. Specify this if you want to use a different tokenizer than the default for the model.

tokenizer_pad_token: str | None = None#

The padding token used by the tokenizer.

If this is set, it will override the default padding token of the tokenizer and the padding token optionally defined in the tokenizer_kwargs.

torch_dtype_str: str = 'float32'#

The data type to use for the model’s parameters as a string.

Valid options are: - “float32” or “f32” or “float” for 32-bit floating point - “float16” or “f16” or “half” for 16-bit floating point - “bfloat16” or “bf16” for brain floating point - “float64” or “f64” or “double” for 64-bit floating point

This string will be converted to the corresponding torch.dtype. Defaults to “float32” for full precision.

trust_remote_code: bool = False#

Whether to allow loading remote code when loading the model.

If True, this allows loading and executing code from the model’s repository, which can be a security risk. Only set to True for models you trust.

Defaults to False for safety.

class oumi.core.configs.PeftParams(lora_r: int = 8, lora_alpha: int = 8, lora_dropout: float = 0.0, lora_target_modules: Optional[list[str]] = None, lora_modules_to_save: Optional[list[str]] = None, lora_bias: str = 'none', lora_init_weights: oumi.core.configs.params.peft_params.LoraWeightInitialization = <LoraWeightInitialization.DEFAULT: 'default'>, lora_task_type: peft.utils.peft_types.TaskType = <TaskType.CAUSAL_LM: 'CAUSAL_LM'>, q_lora: bool = False, q_lora_bits: int = 4, bnb_4bit_quant_type: str = 'fp4', llm_int8_skip_modules: Optional[list[str]] = None, use_bnb_nested_quant: bool = False, bnb_4bit_quant_storage: str = 'uint8', bnb_4bit_compute_dtype: str = 'float32', peft_save_mode: oumi.core.configs.params.peft_params.PeftSaveMode = <PeftSaveMode.ADAPTER_ONLY: 'adapter_only'>)[source]#

Bases: BaseParams

bnb_4bit_compute_dtype: str = 'float32'#

Compute type of the quantized parameters. It can be different than the input type, e.g., it can be set to a lower precision for improved speed.

The string will be converted to the corresponding torch.dtype.

Valid string options are: - “float32” for 32-bit floating point - “float16” for 16-bit floating point - “bfloat16” for brain floating point - “float64” for 64-bit floating point

Defaults to “float16” for half precision.

bnb_4bit_quant_storage: str = 'uint8'#

The storage type for packing quantized 4-bit parameters.

Defaults to ‘uint8’ for efficient storage.

bnb_4bit_quant_type: str = 'fp4'#

The type of 4-bit quantization to use.

Can be ‘fp4’ (float point 4) or ‘nf4’ (normal float 4).

llm_int8_skip_modules: list[str] | None = None#: An explicit list of the modules that we do not want to convert in 8-bit.

lora_alpha: int = 8#

The scaling factor for the LoRA update.

This value is typically set equal to lora_r or 2*lora_r for stable training.

lora_bias: str = 'none'#

Bias type for LoRA.

Can be ‘none’, ‘all’ or ‘lora_only’: - ‘none’: No biases are trained. - ‘all’: All biases in the model are trained. - ‘lora_only’: Only biases in LoRA layers are trained.

If ‘all’ or ‘lora_only’, the corresponding biases will be updated during training. Note that this means even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation.

For more details, see: huggingface/peft

lora_dropout: float = 0.0#

The dropout probability applied to LoRA layers.

This helps prevent overfitting in the adaptation layers.

lora_init_weights: LoraWeightInitialization = 'default'#

Passing LoraWeightInitialization.DEFAULT will use the underlying reference implementation of the corresponding model from Microsoft.

Other valid (LoraWeightInitialization) options include:

“random” which will use fully random initialization and is discouraged.
“gaussian” for Gaussian initialization.
“eva” for Explained Variance Adaptation (EVA) (https://arxiv.org/abs/2410.07170).
“loftq” for improved performance when LoRA is combined with with quantization (https://arxiv.org/abs/2310.08659).
“olora” for Orthonormal Low-Rank Adaptation of Large Language Models (OLoRA) (https://arxiv.org/html/2406.01775v1).
“pissa” for Principal Singular values and Singular vectors Adaptation (PiSSA) (https://arxiv.org/abs/2404.02948).

For more information, see HF:

huggingface/peft

lora_modules_to_save: list[str] | None = None#

List of module names to unfreeze and train alongside LoRA parameters.

These modules will be fully fine-tuned, not adapted using LoRA. Use this to selectively train certain parts of the model in full precision.

lora_r: int = 8#

The rank of the update matrices in LoRA.

A higher value allows for more expressive adaptations but increases the number of trainable parameters.

lora_target_modules: list[str] | None = None#

List of module names to apply LoRA to.

If None, LoRA will be applied to all linear layers in the model. Specify module names to selectively apply LoRA to certain parts of the model.

lora_task_type: TaskType = 'CAUSAL_LM'#

The task type for LoRA adaptation.

Defaults to CAUSAL_LM (Causal Language Modeling).

peft_save_mode: PeftSaveMode = 'adapter_only'#

How to save the final model during PEFT training.

This option is only used if TrainingParams.save_final_model is True. By default, only the model adapter is saved to reduce disk usage. Options are defined in the PeftSaveMode enum and include: - ADAPTER_ONLY: Only save the model adapter. - ADAPTER_AND_BASE_MODEL: Save the base model in addition to the adapter. - MERGED: Merge the adapter and base model’s weights and save as a single model.

q_lora: bool = False#

Whether to use quantization for LoRA (Q-LoRA).

If True, enables quantization for more memory-efficient fine-tuning.

q_lora_bits: int = 4#

The number of bits to use for quantization in Q-LoRA.

This is only used if q_lora is True.

Defaults to 4-bit quantization.

to_bits_and_bytes() → BitsAndBytesConfig[source]#

Creates a configuration for quantized models via BitsAndBytes.

The resulting configuration uses the instantiated peft parameters.

use_bnb_nested_quant: bool = False#

Whether to use nested quantization.

Nested quantization can provide additional memory savings.

class oumi.core.configs.PeftSaveMode(value)[source]#

Bases: Enum

Enum representing how to save the final model during PEFT training.

While models saved with any of these options can be loaded by Oumi, those saved with ADAPTER_ONLY are not self-contained; the base model will be loaded separately from the local HF cache or downloaded from HF Hub if not in the cache.

ADAPTER_AND_BASE_MODEL = 'adapter_and_base_model'#

Save the base model in addition to the adapter.

This is similar to ADAPTER_ONLY, but the base model’s weights are also saved in the same directory as the adapter weights, making the output dir self-contained.

ADAPTER_ONLY = 'adapter_only'#

Only save the model adapter.

Note that when loading this saved model, the base model will be loaded separately from the local HF cache or downloaded from HF Hub.

MERGED = 'merged'#

Merge the adapter and base model’s weights and save as a single model.

Note that the resulting model is a standard HF Transformers model, and is no longer a PEFT model. A copy of the adapter before merging is saved in the “adapter/” subdirectory.

class oumi.core.configs.ProfilerParams(save_dir: Optional[str] = None, enable_cpu_profiling: bool = False, enable_cuda_profiling: bool = False, record_shapes: bool = False, profile_memory: bool = False, with_stack: bool = False, with_flops: bool = False, with_modules: bool = False, row_limit: int = 50, schedule: oumi.core.configs.params.profiler_params.ProfilerScheduleParams = <factory>)[source]#

Bases: BaseParams

enable_cpu_profiling: bool = False#

Whether to profile CPU activity.

Corresponds to torch.profiler.ProfilerActivity.CPU.

enable_cuda_profiling: bool = False#

Whether to profile CUDA.

Corresponds to torch.profiler.ProfilerActivity.CUDA.

profile_memory: bool = False#: Track tensor memory allocation/deallocation.

record_shapes: bool = False#: Save information about operator’s input shapes.

row_limit: int = 50#

Max number of rows to include into profiling report tables.

Set to -1 to make it unlimited.

save_dir: str | None = None#

Directory where the profiling data will be saved to.

If not specified and profiling is enabled, then the profiler sub-dir will be used under output_dir.

schedule: ProfilerScheduleParams#: Parameters that define what subset of training steps to profile.

with_flops: bool = False#: Record module hierarchy (including function names) corresponding to the callstack of the op.

with_modules: bool = False#: Use formula to estimate the FLOPs (floating point operations) of specific operators (matrix multiplication and 2D convolution).

with_stack: bool = False#: Record source information (file and line number) for the ops.

class oumi.core.configs.QuantizationConfig(model: ~oumi.core.configs.params.model_params.ModelParams = <factory>, method: str = 'awq_q4_0', output_path: str = 'quantized_model.gguf', output_format: str = 'gguf', batch_size: int | None = None, verbose: bool = False, awq_group_size: int = 128, awq_zero_point: bool = True, awq_version: str = 'GEMM', cleanup_temp: bool = True, calibration_samples: int = 512)[source]#

Bases: BaseConfig

Configuration for model quantization.

Reduces model size by converting weights from higher precision (float32) to lower precision (int4, int8) formats while maintaining performance.

Tested on NVIDIA H100 GPU with models up to 14B parameters.

Example

>>> config = QuantizationConfig(
...     model=ModelParams(model_name="meta-llama/Llama-2-7b-hf"),
...     method="awq_q4_0",
...     output_path="llama2-7b-q4.gguf"
... )

__post_init__()[source]#: Post-initialization validation.

awq_group_size: int = 128#: AWQ weight grouping size. 128 (balanced), 64 (higher accuracy), 256 (faster).

awq_version: str = 'GEMM'#: AWQ kernel version. ‘GEMM’ (faster, default) or ‘GEMV’.

awq_zero_point: bool = True#: Enable zero-point quantization for AWQ. Generally recommended.

batch_size: int | None = None#

32, 8-32, 1-8.

Type:: Batch size for calibration. Auto-sized if None. Typical

calibration_samples: int = 512#: AWQ calibration samples. 512 (balanced), 128 (faster), 1024 (more accurate).

cleanup_temp: bool = True#: Remove temporary AWQ files after conversion.

method: str = 'awq_q4_0'#: Quantization method. AWQ methods (awq_q4_0, awq_q8_0) provide best quality. Direct GGUF methods (q4_0, q8_0) for llama.cpp. Precision methods (f16, f32).

model: ModelParams#: Model to quantize. Supports HuggingFace IDs, local paths, or Oumi models.

output_format: str = 'gguf'#

‘gguf’ (llama.cpp), ‘safetensors’ (HF), ‘pytorch’.

Type:: Output format

output_path: str = 'quantized_model.gguf'#: Output file path. Extension should match format (.gguf, .safetensors, .pt).

verbose: bool = False#: Enable detailed progress logging.

class oumi.core.configs.RemoteParams(api_url: str | None = None, api_key: str | None = None, api_key_env_varname: str | None = None, max_retries: int = 3, retry_backoff_base: float = 1.0, retry_backoff_max: float = 30.0, connection_timeout: float = 300.0, num_workers: int = 1, politeness_policy: float = 0.0, batch_completion_window: str | None = '24h', use_adaptive_concurrency: bool = True)[source]#

Bases: BaseParams

Parameters for running inference against a remote API.

__post_init__()[source]#: Validate the remote parameters.

api_key: str | None = None#: API key to use for authentication.

api_key_env_varname: str | None = None#: Name of the environment variable containing the API key for authentication.

api_url: str | None = None#: URL of the API endpoint to use for inference.

batch_completion_window: str | None = '24h'#

Time window for batch completion. Currently only ‘24h’ is supported.

Only used for batch inference.

connection_timeout: float = 300.0#: Timeout in seconds for a request to an API.

max_retries: int = 3#: Maximum number of retries to attempt when calling an API.

num_workers: int = 1#: Number of workers to use for parallel inference.

politeness_policy: float = 0.0#

Politeness policy to use when calling an API.

If greater than zero, this is the amount of time in seconds a worker will sleep before making a subsequent request.

retry_backoff_base: float = 1.0#: Base delay in seconds for exponential backoff between retries.

retry_backoff_max: float = 30.0#: Maximum delay in seconds between retries.

use_adaptive_concurrency: bool = True#

Whether to use adaptive concurrency control.

If True, the number of concurrent requests will be adjusted based on the error rate of the requests. As error rate increases above a threshold, the number of concurrent requests will decrease, and as error rate decreases below a threshold, the number of concurrent requests will increase.

When this is enabled, users should set num_workers to the requests per minute (RPM/QPM) of the model/API, and the politeness_policy to 60s (as most APIs query limits are dictated by the number of requests per minute).

The lowest concurrency can be is 1, and the highest concurrency is num_workers. Updates to concurrency will happen no sooner than politeness_policy seconds after the last update, and at least 10 requests must have been made since the last update.

In the event that even 1 concurrency causes the error rate to exceed the threshold, it is recommended to increase the politeness_policy to allow more time between requests.

class oumi.core.configs.SampleAnalyzerParams(id: str = '???', config: dict[str, ~typing.Any] = <factory>)[source]#

Bases: BaseParams

Params for a single sample analyzer plugin.

config: dict[str, Any]#: Analyzer-specific configuration parameters.

id: str = '???'#: Unique identifier for the analyzer.

class oumi.core.configs.SchedulerType(value)[source]#

Bases: str, Enum

Enum representing the supported learning rate schedulers.

For optional args for each scheduler, see src/oumi/builders/lr_schedules.py.

CONSTANT = 'constant'#

Constant scheduler.

Keeps the learning rate constant throughout training.

COSINE = 'cosine'#

Cosine scheduler.

Decays the learning rate following the decreasing part of a cosine curve.

COSINE_WITH_MIN_LR = 'cosine_with_min_lr'#

Cosine with a minimum learning rate scheduler.

Similar to cosine scheduler, but maintains a minimum learning rate at the end.

COSINE_WITH_RESTARTS = 'cosine_with_restarts'#

Cosine with restarts scheduler.

Decays the learning rate following a cosine curve with periodic restarts.

LINEAR = 'linear'#

Linear scheduler.

Decreases the learning rate linearly from the initial value to 0 over the course of training.

class oumi.core.configs.ShardingStrategy(value)[source]#

Bases: str, Enum

The sharding strategies for FullyShardedDataParallel (FSDP).

See torch.distributed.fsdp.ShardingStrategy for more details.

FULL_SHARD = 'FULL_SHARD'#: Shards model parameters, gradients, and optimizer states. Provides the most memory efficiency but may impact performance.

HYBRID_SHARD = 'HYBRID_SHARD'#: Shards model parameters within a node and replicates them across nodes.

HYBRID_SHARD_ZERO2 = 'HYBRID_SHARD_ZERO2'#: Apply SHARD_GRAD_OP within a node, and replicate parameters across nodes.

NO_SHARD = 'NO_SHARD'#: No sharding is applied. Parameters, gradients, and optimizer states are kept in full on each GPU.

SHARD_GRAD_OP = 'SHARD_GRAD_OP'#: Shards gradients and optimizer states, but not model parameters. Balances memory savings and performance.

to_torch() → ShardingStrategy[source]#: Convert the enum to the corresponding torch_fsdp.ShardingStrategy.

class oumi.core.configs.StateDictType(value)[source]#

Bases: str, Enum

The supported state dict types for FullyShardedDataParallel (FSDP).

This controls how the model’s state dict will be saved during checkpointing, and how it can be consumed afterwards.

FULL_STATE_DICT = 'FULL_STATE_DICT'#

The state dict will be saved in a non-sharded, unflattened format.

This is similar to checkpointing without FSDP.

LOCAL_STATE_DICT = 'LOCAL_STATE_DICT'#

The state dict will be saved in a sharded, flattened format.

Since it’s flattened, this can only be used by FSDP.

SHARDED_STATE_DICT = 'SHARDED_STATE_DICT'#

The state dict will be saved in a sharded, unflattened format.

This can be used by other parallel schemes.

to_torch() → StateDictType[source]#: Converts to the corresponding torch.distributed.fsdp.StateDictType.

class oumi.core.configs.StorageMount(source: str = '???', store: str = '???')[source]#

Bases: object

A storage system mount to attach to a node.

source: str = '???'#

The remote path to mount the local path to (Required).

e.g. ‘gs://bucket/path’ for GCS, ‘s3://bucket/path’ for S3, or ‘r2://path’ for R2.

store: str = '???'#

The remote storage solution (Required).

Must be one of ‘s3’, ‘gcs’ or ‘r2’.

class oumi.core.configs.TelemetryParams(telemetry_dir: str | None = 'telemetry', collect_telemetry_for_all_ranks: bool = False, track_gpu_temperature: bool = False)[source]#

Bases: BaseParams

collect_telemetry_for_all_ranks: bool = False#

Whether to collect telemetry for all ranks.

By default, only the main rank’s telemetry stats are collected and saved.

telemetry_dir: str | None = 'telemetry'#

Directory where the telemetry data will be saved to.

If not specified, then telemetry files will be written under output_dir. If a relative path is specified, then files will be written in a telemetry_dir sub-directory in output_dir.

track_gpu_temperature: bool = False#

Whether to record GPU temperature.

If save_telemetry_for_all_ranks is False, only the first GPU’s temperature is tracked. Otherwise, temperature is recorded for all GPUs.

class oumi.core.configs.TrainerType(value)[source]#

Bases: Enum

Enum representing the supported trainers.

HF = 'hf'#

Generic HuggingFace trainer from transformers library.

This is the standard trainer provided by the Hugging Face Transformers library, suitable for a wide range of training tasks.

OUMI = 'oumi'#

Custom generic trainer implementation.

This is a custom trainer implementation specific to the Oumi project, designed to provide additional flexibility and features.

TRL_DPO = 'trl_dpo'#

Direct Preference Optimization trainer from trl library.

This trainer implements the Direct Preference Optimization algorithm for fine-tuning language models based on human preferences.

TRL_GRPO = 'trl_grpo'#

Group Relative Policy Optimization trainer from trl library.

This trainer implements the Group Relative Policy Optimization algorithm introduced in the paper https://arxiv.org/pdf/2402.03300 for fine-tuning language models. Optionally, supports user-defined reward functions.

TRL_SFT = 'trl_sft'#

Supervised fine-tuning trainer from trl library.

This trainer is specifically designed for supervised fine-tuning tasks using the TRL (Transformer Reinforcement Learning) library.

VERL_GRPO = 'verl_grpo'#

Group Relative Policy Optimization trainer from verl library.

This trainer implements the Group Relative Policy Optimization algorithm introduced in the paper https://arxiv.org/pdf/2402.03300 for fine-tuning language models. Optionally, supports user-defined reward functions.

class oumi.core.configs.TrainingConfig(data: oumi.core.configs.params.data_params.DataParams = <factory>, model: oumi.core.configs.params.model_params.ModelParams = <factory>, training: oumi.core.configs.params.training_params.TrainingParams = <factory>, peft: oumi.core.configs.params.peft_params.PeftParams = <factory>, fsdp: oumi.core.configs.params.fsdp_params.FSDPParams = <factory>)[source]#

Bases: BaseConfig

__post_init__()[source]#: Verifies/populates params.

data: DataParams#

Parameters for the dataset.

This field contains all the necessary settings for data processing and loading. It includes options for train and evaluation datasets and preprocessing steps.

For more details, see the oumi.core.configs.params.data_params.DataParams class.

fsdp: FSDPParams#: Parameters for FSDP.

model: ModelParams#

Parameters for the model.

This field defines the model architecture, size, and other model-specific settings. It includes options for model type, pretrained weights, and tokenizer configuration.

For more details, see oumi.core.configs.params.model_params.ModelParams class.

peft: PeftParams#

Parameters for Parameter-Efficient Fine-Tuning (PEFT).

This field defines settings for various PEFT methods such as LoRA, or Prefix Tuning. It includes options for rank, alpha values, and other PEFT-specific parameters.

For more details, see oumi.core.configs.params.peft_params.PeftParams.

training: TrainingParams#

Parameters for the training process.

This field contains all settings related to the training loop, including learning rate, batch size, number of epochs, and optimization parameters.

For more details, see oumi.core.configs.params.training_params.TrainingParams.

class oumi.core.configs.TrainingParams(use_peft: bool = False, trainer_type: oumi.core.configs.params.training_params.TrainerType = <TrainerType.HF: 'hf'>, enable_gradient_checkpointing: bool = False, gradient_checkpointing_kwargs: dict[str, typing.Any] = <factory>, output_dir: str = 'output', per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, gradient_accumulation_steps: int = 1, max_steps: int = -1, num_train_epochs: int = 3, save_epoch: bool = False, save_steps: int = 500, save_final_model: bool = True, seed: int = 42, data_seed: int = 42, use_deterministic: bool = False, full_determinism: bool = False, run_name: Optional[str] = None, metrics_function: Optional[str] = None, reward_functions: Optional[list[str]] = None, grpo: oumi.core.configs.params.grpo_params.GrpoParams = <factory>, log_level: str = 'info', dep_log_level: str = 'warning', enable_wandb: bool = False, enable_mlflow: bool = False, enable_tensorboard: bool = True, logging_strategy: str = 'steps', logging_dir: Optional[str] = None, logging_steps: int = 50, logging_first_step: bool = False, eval_strategy: str = 'no', eval_steps: int = 500, learning_rate: float = 5e-05, lr_scheduler_type: str = 'linear', lr_scheduler_kwargs: dict[str, typing.Any] = <factory>, warmup_ratio: Optional[float] = None, warmup_steps: Optional[int] = None, optimizer: str = 'adamw_torch', weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, sgd_momentum: float = 0.0, mixed_precision_dtype: oumi.core.configs.params.training_params.MixedPrecisionDtype = <MixedPrecisionDtype.NONE: 'none'>, compile: bool = False, include_performance_metrics: bool = False, include_alternative_mfu_metrics: bool = False, log_model_summary: bool = False, resume_from_checkpoint: Optional[str] = None, try_resume_from_last_checkpoint: bool = False, dataloader_num_workers: Union[int, str] = 0, dataloader_persistent_workers: bool = False, dataloader_prefetch_factor: Optional[int] = None, dataloader_main_process_only: Optional[bool] = None, ddp_find_unused_parameters: Optional[bool] = None, max_grad_norm: Optional[float] = 1.0, trainer_kwargs: dict[str, typing.Any] = <factory>, verl_config_overrides: dict[str, typing.Any] = <factory>, profiler: oumi.core.configs.params.profiler_params.ProfilerParams = <factory>, telemetry: oumi.core.configs.params.telemetry_params.TelemetryParams = <factory>, empty_device_cache_steps: Optional[int] = None, nccl_default_timeout_minutes: Optional[float] = None, label_ignore_index: Optional[int] = None)[source]#

Bases: BaseParams

__post_init__()[source]#: Verifies params.

adam_beta1: float = 0.9#

The beta1 parameter for Adam-based optimizers.

Exponential decay rate for the first moment estimates. Default is 0.9.

adam_beta2: float = 0.999#

The beta2 parameter for Adam-based optimizers.

Exponential decay rate for the second moment estimates. Default is 0.999.

adam_epsilon: float = 1e-08#

Epsilon parameter for Adam-based optimizers.

Small constant for numerical stability. Default is 1e-08.

compile: bool = False#

Whether to JIT compile the model.

This parameter should be used instead of ModelParams.compile for training.

data_seed: int = 42#: Random data_seed used for initialization. The seed to use for the underlying generator when using use_seedable_sampler. If None, the generator will use the current default seed from torch. Used only by the HuggingFace trainers.

dataloader_main_process_only: bool | None = None#

Controls whether the dataloader is iterated through on the main process only.

If set to True, the dataloader is only iterated through on the main process (rank 0), then the batches are split and broadcast to each process. This can reduce the number of requests to the dataset, and helps ensure that each example is seen by max one GPU per epoch, but may become a performance bottleneck if a large number of GPUs is used.

If set to False, the dataloader is iterated through on each GPU process.

If set to None (default), then True or False is auto-selected based on heuristics (properties of dataset, the number of nodes and/or GPUs, etc).

NOTE: We recommend to benchmark your setup, and configure True or False.

dataloader_num_workers: int | str = 0#

Number of subprocesses to use for data loading (PyTorch only). 0 means that the data will be loaded in the main process.

You can also use the special value “auto” to select the number of dataloader workers using a simple heuristic based on the number of CPU-s and GPU-s per node. Note that the accurate estimation of workers is difficult and depends on many factors (the properties of a model, dataset, VM, network, etc) so you can start with “auto” then experimentally tune the exact number to make it more optimal for your specific case. If “auto” is requested, then at minimum 1 worker is guaranteed to be assigned.

dataloader_persistent_workers: bool = False#: Whether to use persistent workers for data loading (HF Trainers only). If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This allows to maintain the workers Dataset instances alive. Can potentially speed up training, but will increase RAM usage. Will default to False.

dataloader_prefetch_factor: int | None = None#

Number of batches loaded in advance by each worker.

2 means there will be a total of 2 * num_workers batches prefetched across all workers.

This is only used if dataloader_num_workers >= 1.

ddp_find_unused_parameters: bool | None = None#

When using PyTorch’s DistributedDataParallel training, the value of this flag is passed to find_unused_parameters.

Will default to False if gradient checkpointing is used, True otherwise.

dep_log_level: str = 'warning'#

The logging level for dependency loggers (e.g., HuggingFace, PyTorch).

Possible values are “debug”, “info”, “warning”, “error”, “critical”.

empty_device_cache_steps: int | None = None#

Number of steps to wait before calling torch.<device>.empty_cache().

This parameter determines how frequently the GPU cache should be cleared during training. If set, it will trigger cache clearing every empty_device_cache_steps. If left as None, the cache will not be emptied automatically.

Setting this can help manage GPU memory usage, especially for large models or long training runs, but may impact performance if set too low.

enable_gradient_checkpointing: bool = False#

Whether to enable gradient checkpointing to save memory at the expense of speed.

Gradient checkpointing works by trading compute for memory. Rather than storing all intermediate activations of the entire computation graph for computing backward pass, it recomputes these activations during the backward pass. This can make the training slower, but it can also significantly reduce memory usage.

enable_mlflow: bool = False#

Whether to enable MLflow logging.

If True, MLflow will be used for experiment tracking and visualization. If you want to use MLflow, you must set the MLFLOW_TRACKING_URI environment variable to specify the tracking server URI and the MLFLOW_EXPERIMENT_ID or MLFLOW_EXPERIMENT_NAME environment variable to specify the experiment to report the run to.

enable_tensorboard: bool = True#

Whether to enable TensorBoard logging.

If True, TensorBoard will be used for logging metrics and visualizations.

enable_wandb: bool = False#

Whether to enable Weights & Biases (wandb) logging.

If True, wandb will be used for experiment tracking and visualization. Wandb will also log a summary of the training run, including hyperparameters, metrics, and other relevant information at the end of training.

After enabling, you must set the WANDB_API_KEY environment variable. Alternatively, you can use the wandb login command to authenticate.

eval_steps: int = 500#

Number of update steps between two evaluations if eval_strategy=”steps”.

Ignored if eval_strategy is not “steps”.

eval_strategy: str = 'no'#

The strategy to use for evaluation during training.

Possible values: - “no”: No evaluation is done during training. - “steps”: Evaluation is done every eval_steps. - “epoch”: Evaluation is done at the end of each epoch.

full_determinism: bool = False#: If True, enable_full_determinism() is called instead of set_seed() to ensure reproducible results in distributed training. This will only affect HF trainers. Important: this will negatively impact performance, so only use it for debugging.

gradient_accumulation_steps: int = 1#

Number of update steps to accumulate before performing a backward/update pass.

This technique allows for effectively larger batch sizes and is especially useful when such batch sizes would not fit in memory. This is achieved by accumulating gradients from multiple forward passes before performing a single optimization step. Setting this to >1 can increase however memory usage for training setups without existing gradient accumulation buffers (ex. 1-GPU training).

gradient_checkpointing_kwargs: dict[str, Any]#

Keyword arguments for gradient checkpointing.

The use_reentrant parameter is required and is recommended to be set to False. For more details, see: https://pytorch.org/docs/stable/checkpoint.html

grpo: GrpoParams#: Parameters for GRPO training.

include_alternative_mfu_metrics: bool = False#

Whether to report alternative MFU (Model FLOPs Utilization) metrics.

These metrics are based on HuggingFace’s total_flos. This option is only used if include_performance_metrics is True.

include_performance_metrics: bool = False#: Whether to include performance metrics such as token statistics.

label_ignore_index: int | None = None#

Tokens with this label value don’t contribute to the loss computation. For example, this can be PAD, or image tokens. -100 is the PyTorch convention. Refer to the ignore_index parameter of torch.nn.CrossEntropyLoss() for more details.

If unspecified (None), then the default model-specific preferences configured in Oumi may be used.

Users should only set label_ignore_index if the default behavior is not satisfactory, or for new models not yet fully-integrated by Oumi.

learning_rate: float = 5e-05#

The initial learning rate for the optimizer.

This value can be adjusted by the learning rate scheduler during training.

log_level: str = 'info'#

The logging level for the main Oumi logger.

Possible values are “debug”, “info”, “warning”, “error”, “critical”.

log_model_summary: bool = False#: Whether to print a model summary, including layer names.

logging_dir: str | None = None#

The directory where training logs will be saved.

This includes TensorBoard logs and other training-related output.

logging_first_step: bool = False#

Whether to log and evaluate the first global step.

If True, metrics will be logged and evaluation will be performed at the very beginning of training. Skipping the first step can be useful to avoid logging and evaluation of the initial random model.

The first step is usually not representative of the model’s performance, as it includes model compilation, optimizer initialization, and other setup steps.

logging_steps: int = 50#

Number of update steps between two logs if logging_strategy=”steps”.

Ignored if logging_strategy is not “steps”.

logging_strategy: str = 'steps'#

The strategy to use for logging during training.

Possible values are: - “steps”: Log every logging_steps steps. - “epoch”: Log at the end of each epoch. - “no”: Disable logging.

lr_scheduler_kwargs: dict[str, Any]#

Additional keyword arguments to pass to the learning rate scheduler.

These arguments can be used to fine-tune the behavior of the chosen scheduler.

lr_scheduler_type: str = 'linear'#

The type of learning rate scheduler to use.

Possible values include “linear”, “cosine”, “cosine_with_restarts”,: “cosine_with_min_lr” and “constant”.

See src/oumi/builders/lr_schedules.py for more details on each scheduler.

max_grad_norm: float | None = 1.0#

Maximum gradient norm (for gradient clipping) to avoid exploding gradients which can destabilize training.

Defaults to 1.0. When set to 0.0 or None gradient clipping will not be applied.

max_steps: int = -1#

If set to a positive number, the total number of training steps to perform.

This parameter overrides num_train_epochs. If set to -1 (default), the number of training steps is determined by num_train_epochs.

metrics_function: str | None = None#

The name of the metrics function in the Oumi registry to use for evaluation during training.

The method must accept as input a HuggingFace EvalPrediction and return a dictionary of metrics, with string keys mapping to metric values. A single metrics_function may compute multiple metrics.

mixed_precision_dtype: MixedPrecisionDtype = 'none'#

The data type to use for mixed precision training.

Default is NONE, which means no mixed precision is used.

nccl_default_timeout_minutes: float | None = None#

Default timeout for NCCL operations in minutes.

See: https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group

If unset, will use the default value of torch.distributed.init_process_group which is 10min.

num_train_epochs: int = 3#

Total number of training epochs to perform (if max_steps is not specified).

An epoch is one complete pass through the entire training dataset. This parameter is ignored if max_steps is set to a positive number.

optimizer: str = 'adamw_torch'#

The optimizer to use for training.

See pytorch documentation for more information on available optimizers: https://pytorch.org/docs/stable/optim.html

Default is “adamw_torch” (AdamW implemented by PyTorch).

output_dir: str = 'output'#

Directory where the output files will be saved.

This includes checkpoints, evaluation results, and any other artifacts produced during the training process.

per_device_eval_batch_size: int = 8#

Number of samples per batch on each device during evaluation.

Similar to per_device_train_batch_size, but used during evaluation phases. Can often be set higher than the train batch size as no gradients are stored.

per_device_train_batch_size: int = 8#

Number of samples per batch on each device during training.

This parameter directly affects memory usage and training speed. Larger batch sizes generally lead to better utilization of GPU compute capabilities but require more memory.

profiler: ProfilerParams#

Parameters for performance profiling.

This field contains configuration options for the profiler, which can be used to analyze the performance of the training process. It uses the ProfilerParams class to define specific profiling settings.

resume_from_checkpoint: str | None = None#

Path to a checkpoint folder from which to resume training.

If specified, training will resume by first loading the model from this folder.

reward_functions: list[str] | None = None#

The names of the reward function in the Oumi registry to use for reinforcement learning.

Only supported with the TRL_GRPO and VERL_GRPO trainers. Currently, VERL_GRPO only supports specifying a single reward function.

For TRL_GRPO, refer to https://huggingface.co/docs/trl/main/en/grpo_trainer for documentation about the function signature.

For VERL_GRPO, refer to https://verl.readthedocs.io/en/latest/preparation/reward_function.html for documentation about the function signature.

run_name: str | None = None#

A unique identifier for the current training run.

This name is used to identify the run in logging outputs, saved model checkpoints, and experiment tracking tools like Weights & Biases or TensorBoard. It’s particularly useful when running multiple experiments or when you want to easily distinguish between different training sessions.

save_epoch: bool = False#

Save a checkpoint at the end of every epoch.

When set to True, this ensures that a model checkpoint is saved after each complete pass through the training data. This can be useful for tracking model progress over time and for resuming training from a specific epoch if needed.

If both save_steps and save_epoch are set, then save_steps takes precedence.

save_final_model: bool = True#

Whether to save the model at the end of training.

For different options for saving PEFT models, see PeftParams.peft_save_mode. This should normally be set to True to ensure the final trained model is saved. However, in some cases, you may want to disable it, for example: - If saving a large model which takes a long time - When quickly testing training speed or metrics - During debugging or experimentation phases

save_steps: int = 500#

Save a checkpoint every save_steps training steps.

This parameter determines the frequency of saving checkpoints during training based on the number of steps. If both save_steps and save_epoch are set, then save_steps takes precedence.

To disable saving checkpoints during training, set save_steps to 0 and save_epoch to False. If enabled, a checkpoint will be saved at the end of training if there’s any residual steps left.

seed: int = 42#

Random seed used for initialization.

This seed is passed to the trainer and to all downstream dependencies to ensure reproducibility of results. It affects random number generation in various parts of the training process, including data shuffling, weight initialization, and any stochastic operations.

sgd_momentum: float = 0.0#

Momentum factor for SGD optimizer.

Only used when optimizer is set to “sgd”, and when trainer_type is set to OUMI. Default is 0.0.

telemetry: TelemetryParams#

Parameters for telemetry.

This field contains telemetry configuration options.

property telemetry_dir: Path | None#: Returns the telemetry stats output directory.

to_hf()[source]#: Converts Oumi config to HuggingFace’s TrainingArguments.

trainer_kwargs: dict[str, Any]#

Additional keyword arguments to pass to the HF/TRL Trainer.

This allows for customization of the Trainer beyond the standard parameters defined in this class. Any key-value pairs added here will be passed directly to the Trainer’s constructor. Note that this field is only used for HuggingFace and TRL trainers (TRL_SFT, TRL_DPO, TRL_GRPO, HF).

trainer_type: TrainerType = 'hf'#

The type of trainer to use for the training process.

Options are defined in the TrainerType enum and include: - HF: HuggingFace’s Trainer - TRL_SFT: TRL’s SFT Trainer - TRL_DPO: TRL’s DPO Trainer - TRL_GRPO: TRL’s GRPO Trainer - OUMI: Custom generic trainer implementation - VERL_GRPO: verl’s GRPO Trainer

try_resume_from_last_checkpoint: bool = False#

If True, attempt to resume from the last checkpoint in “output_dir”.

If a checkpoint is found, training will resume from the model/optimizer/scheduler states loaded from this checkpoint. If no checkpoint is found, training will continue without loading any intermediate checkpoints.

Note: If resume_from_checkpoint is specified and contains a non-empty path, this parameter has no effect.

use_deterministic: bool = False#: Whether to use deterministic algorithms for reproducibility. If set to True, this will only allow those CuDNN algorithms that are (believed to be) deterministic. Please refer to https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html for more details. If using distributed training, this will override ddp_find_unused_parameters to False and will also use ddp_broadcast_buffers, and disable gradient checkpointing. Note that this will not guarantee full reproducibility, but will help to reduce the variance between runs.

use_peft: bool = False#

Whether to use Parameter-Efficient Fine-Tuning (PEFT) techniques.

PEFT methods allow for efficient adaptation of pre-trained language models to specific tasks by only updating a small number of (extra) model parameters. This can significantly reduce memory usage and training time.

verl_config_overrides: dict[str, Any]#

Values to override in the verl config.

This field is only used for the VERL_GRPO trainer. To see supported params in verl, see: https://verl.readthedocs.io/en/latest/examples/config.html

The verl config is a nested dict, so the kwargs should be structured accordingly. For example, to set actor_rollout_ref.actor.use_kl_loss to True, you can use: {“actor_rollout_ref”: {“actor”: {“use_kl_loss”: True}}}.

The priority of setting verl config params, from highest to lowest, is: 1. Values specified by this field. 2. Values automatically set by Oumi in

src/oumi/core/trainers/verl_grpo_trainer.py:_create_config() for verl params which have corresponding Oumi params. For example, Oumi’s training.output_dir -> verl’s trainer.default_local_dir

Default verl config values in src/oumi/core/trainers/verl_trainer_config.yaml.

warmup_ratio: float | None = None#

The ratio of total training steps used for a linear warmup from 0 to the learning rate.

If set along with warmup_steps, this value will be ignored.

warmup_steps: int | None = None#

The number of steps for the warmup phase of the learning rate scheduler.

If set, will override the value of warmup_ratio.

weight_decay: float = 0.0#

Weight decay (L2 penalty) to apply to the model’s parameters.

In the HF trainers and the OUMI trainer, this is automatically applied to only weight tensors, and skips biases/layernorms.

Default is 0.0 (no weight decay).

oumi.core.configs

Contents

oumi.core.configs#

Subpackages#