oumi.inference#
Inference module for the Oumi (Open Universal Machine Intelligence) library.
This module provides various implementations for running model inference.
- class oumi.inference.AnthropicInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Anthropic API.
This class extends RemoteInferenceEngine to provide specific functionality for interacting with Anthropic’s language models via their API. It handles the conversion of Oumi’s Conversation objects to Anthropic’s expected input format, as well as parsing the API responses back into Conversation objects.
- anthropic_version = '2023-06-01'#
The version of the Anthropic API to use.
For more information on Anthropic API versioning, see: https://docs.anthropic.com/claude/reference/versioning
- property api_key_env_varname: str | None#
Return the default environment variable name for the Anthropic API key.
- property base_url: str | None#
Return the default base URL for the Anthropic API.
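Example
A minimal usage sketch. The Conversation/Message/Role import path and the model name are assumptions for illustration, not guaranteed API; the API key is read from the environment variable named by api_key_env_varname.
>>> from oumi.core.configs import ModelParams
>>> from oumi.core.types import Conversation, Message, Role  # assumed import path
>>> from oumi.inference import AnthropicInferenceEngine
>>> engine = AnthropicInferenceEngine(
...     ModelParams(model_name="claude-3-5-sonnet-20240620")  # illustrative model name
... )
>>> conversation = Conversation(messages=[Message(role=Role.USER, content="Hello!")])
>>> results = engine.infer_online([conversation])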
- class oumi.inference.DeepSeekInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the DeepSeek API.
Documentation: https://api-docs.deepseek.com
- api_key_env_varname: str | None = 'DEEPSEEK_API_KEY'#
The environment variable name for the DeepSeek API key.
- base_url: str | None = 'https://api.deepseek.com/v1/chat/completions'#
The base URL for the DeepSeek API.
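Example
A hedged sketch of constructing this engine against the documented endpoint; the model name is illustrative and the key value is a placeholder.
>>> import os
>>> from oumi.core.configs import ModelParams
>>> from oumi.inference import DeepSeekInferenceEngine
>>> os.environ["DEEPSEEK_API_KEY"] = "<your key>"  # env var documented above
>>> engine = DeepSeekInferenceEngine(ModelParams(model_name="deepseek-chat"))  # illustrative model name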
- class oumi.inference.GoogleGeminiInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Gemini API.
- api_key_env_varname: str | None = 'GEMINI_API_KEY'#
The environment variable name for the Gemini API key.
- base_url: str | None = 'https://generativelanguage.googleapis.com/v1beta/openai/chat/completions'#
The base URL for the Gemini API.
- class oumi.inference.GoogleVertexInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against Google Vertex AI.
- class oumi.inference.LlamaCppInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running llama.cpp inference locally.
This class provides an interface for running inference using the llama.cpp library on local hardware. It allows for efficient execution of large language models with quantization, kv-caching, prefix filling, …
Note
This engine requires the llama-cpp-python package to be installed. If not installed, it will raise a RuntimeError.
Example
>>> from oumi.core.configs import ModelParams
>>> from oumi.inference import LlamaCppInferenceEngine
>>> model_params = ModelParams(
...     model_name="path/to/model.gguf",
...     model_kwargs={
...         "n_gpu_layers": -1,
...         "n_threads": 8,
...         "flash_attn": True
...     }
... )
>>> engine = LlamaCppInferenceEngine(model_params)
>>> # Use the engine for inference
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
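Example
Continuing the construction example above, a hedged sketch of calling infer_online; the Conversation/Message/Role import path is an assumption.
>>> from oumi.core.types import Conversation, Message, Role  # assumed import path
>>> conversation = Conversation(
...     messages=[Message(role=Role.USER, content="Summarize llama.cpp in one sentence.")]
... )
>>> outputs = engine.infer_online([conversation])
>>> print(outputs[0].messages[-1].content)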
- class oumi.inference.NativeTextInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running text-to-text model inference.
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
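Example
A hedged construction sketch; the Hugging Face model id and the GenerationParams field names (max_new_tokens, temperature) are assumptions for illustration, with GenerationParams assumed importable from oumi.core.configs alongside ModelParams.
>>> from oumi.core.configs import GenerationParams, ModelParams
>>> from oumi.inference import NativeTextInferenceEngine
>>> engine = NativeTextInferenceEngine(
...     ModelParams(model_name="microsoft/Phi-3-mini-4k-instruct"),  # illustrative model id
...     generation_params=GenerationParams(max_new_tokens=128, temperature=0.0),  # assumed field names
... )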
- class oumi.inference.OpenAIInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the OpenAI API.
- property api_key_env_varname: str | None#
Return the default environment variable name for the OpenAI API key.
- property base_url: str | None#
Return the default base URL for the OpenAI API.
- class oumi.inference.ParasailInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Parasail API.
- property api_key_env_varname: str | None#
Return the default environment variable name for the Parasail API key.
- property base_url: str | None#
Return the default base URL for the Parasail API.
- class oumi.inference.RemoteInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running inference against a server implementing the OpenAI API.
- api_key_env_varname: str | None = None#
The environment variable name for the API key.
- base_url: str | None = None#
The base URL for the remote API.
- get_batch_results(batch_id: str, conversations: list[Conversation]) list[Conversation] [source]#
Gets the results of a completed batch job.
- Parameters:
batch_id – The batch job ID
conversations – Original conversations used to create the batch
- Returns:
The processed conversations with responses
- Return type:
List[Conversation]
- Raises:
RuntimeError – If the batch failed or has not completed
- get_batch_status(batch_id: str) BatchInfo [source]#
Gets the status of a batch inference job.
- Parameters:
batch_id – The batch job ID
- Returns:
Current status of the batch job
- Return type:
BatchInfo
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_batch(conversations: list[Conversation], inference_config: InferenceConfig | None = None) str [source]#
Creates a new batch inference job.
- Parameters:
conversations – List of conversations to process in batch
inference_config – Parameters for inference
- Returns:
The batch job ID
- Return type:
str
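Example
A hedged sketch of the batch workflow combining infer_batch, get_batch_status, and get_batch_results above, assuming engine is a configured RemoteInferenceEngine (or subclass) and conversations is a list of Conversation objects; polling and error handling are omitted.
>>> batch_id = engine.infer_batch(conversations)
>>> info = engine.get_batch_status(batch_id)  # BatchInfo describing the job's current status
>>> # ... once the batch has completed (get_batch_results raises RuntimeError otherwise):
>>> results = engine.get_batch_results(batch_id, conversations)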
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- class oumi.inference.RemoteVLLMInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against a remote vLLM server.
- property api_key_env_varname: str | None#
Return the default environment variable name for the Remote vLLM API key.
- property base_url: str | None#
Return the default base URL for the Remote vLLM API.
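Example
A hedged sketch of pointing this engine at a self-hosted vLLM server; the model id, the server URL, and the RemoteParams api_url field name are assumptions for illustration.
>>> from oumi.core.configs import ModelParams, RemoteParams
>>> from oumi.inference import RemoteVLLMInferenceEngine
>>> engine = RemoteVLLMInferenceEngine(
...     ModelParams(model_name="meta-llama/Llama-3.1-8B-Instruct"),  # illustrative model id
...     remote_params=RemoteParams(api_url="http://localhost:8000/v1/chat/completions"),  # assumed field name
... )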
- class oumi.inference.SGLangInferenceEngine(model_params: ModelParams, *, remote_params: RemoteParams | None = None, generation_params: GenerationParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running SGLang inference.
- class oumi.inference.TogetherInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Together AI API.
- property api_key_env_varname: str | None#
Return the default environment variable name for the Together API key.
- property base_url: str | None#
Return the default base URL for the Together API.
- class oumi.inference.VLLMInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, tensor_parallel_size: int = -1, quantization: str | None = None, enable_prefix_caching: bool = True, gpu_memory_utilization: float = 1.0, enforce_eager: bool = True, max_num_seqs: int | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running vLLM inference locally.
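Example
A construction sketch using keyword arguments from the signature above; the model id is illustrative.
>>> from oumi.core.configs import ModelParams
>>> from oumi.inference import VLLMInferenceEngine
>>> engine = VLLMInferenceEngine(
...     ModelParams(model_name="meta-llama/Llama-3.1-8B-Instruct"),  # illustrative model id
...     tensor_parallel_size=2,          # shard the model across 2 GPUs
...     gpu_memory_utilization=0.9,
...     enable_prefix_caching=True,
... )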
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]