oumi.inference#
Inference module for the Oumi (Open Universal Machine Intelligence) library.
This module provides various implementations for running model inference.
- class oumi.inference.AnthropicInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Anthropic API.
This class extends RemoteInferenceEngine to provide specific functionality for interacting with Anthropic’s language models via their API. It handles the conversion of Oumi’s Conversation objects to Anthropic’s expected input format, as well as parsing the API responses back into Conversation objects.
- anthropic_version = '2023-06-01'#
The version of the Anthropic API to use.
For more information on Anthropic API versioning, see: https://docs.anthropic.com/claude/reference/versioning
- property api_key_env_varname: str | None#
Return the default environment variable name for the Anthropic API key.
- property base_url: str | None#
Return the default base URL for the Anthropic API.
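Example
A minimal usage sketch. The Conversation/Message/Role import path and the model name are assumptions for illustration, not guaranteed API; the API key is read from the environment variable named by api_key_env_varname.
>>> from oumi.core.configs import ModelParams
>>> from oumi.core.types import Conversation, Message, Role  # assumed import path
>>> from oumi.inference import AnthropicInferenceEngine
>>> engine = AnthropicInferenceEngine(
...     ModelParams(model_name="claude-3-5-sonnet-20240620")  # illustrative model name
... )
>>> conversation = Conversation(messages=[Message(role=Role.USER, content="Hello!")])
>>> results = engine.infer_online([conversation])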
- class oumi.inference.DeepSeekInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the DeepSeek API.
Documentation: https://api-docs.deepseek.com
- api_key_env_varname: str | None = 'DEEPSEEK_API_KEY'#
The environment variable name for the DeepSeek API key.
- base_url: str | None = 'https://api.deepseek.com/v1/chat/completions'#
The base URL for the DeepSeek API.
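Example
A hedged sketch of constructing this engine against the documented endpoint; the model name is illustrative and the key value is a placeholder.
>>> import os
>>> from oumi.core.configs import ModelParams
>>> from oumi.inference import DeepSeekInferenceEngine
>>> os.environ["DEEPSEEK_API_KEY"] = "<your key>"  # env var documented above
>>> engine = DeepSeekInferenceEngine(ModelParams(model_name="deepseek-chat"))  # illustrative model name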
- class oumi.inference.GoogleGeminiInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Gemini API.
- api_key_env_varname: str | None = 'GEMINI_API_KEY'#
The environment variable name for the Gemini API key.
- base_url: str | None = 'https://generativelanguage.googleapis.com/v1beta/openai/chat/completions'#
The base URL for the Gemini API.
- class oumi.inference.GoogleVertexInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against Google Vertex AI.
- class oumi.inference.LlamaCppInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running llama.cpp inference locally.
This class provides an interface for running inference using the llama.cpp library on local hardware. It allows for efficient execution of large language models with quantization, kv-caching, prefix filling, …
Note
This engine requires the llama-cpp-python package to be installed. If not installed, it will raise a RuntimeError.
Example
>>> from oumi.core.configs import ModelParams
>>> from oumi.inference import LlamaCppInferenceEngine
>>> model_params = ModelParams(
...     model_name="path/to/model.gguf",
...     model_kwargs={
...         "n_gpu_layers": -1,
...         "n_threads": 8,
...         "flash_attn": True
...     }
... )
>>> engine = LlamaCppInferenceEngine(model_params)
>>> # Use the engine for inference
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
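Example
Continuing the construction example above, a hedged sketch of calling infer_online; the Conversation/Message/Role import path is an assumption.
>>> from oumi.core.types import Conversation, Message, Role  # assumed import path
>>> conversation = Conversation(
...     messages=[Message(role=Role.USER, content="Summarize llama.cpp in one sentence.")]
... )
>>> outputs = engine.infer_online([conversation])
>>> print(outputs[0].messages[-1].content)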
- class oumi.inference.NativeTextInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running text-to-text model inference.
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
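Example
A hedged construction sketch; the Hugging Face model id and the GenerationParams field names (max_new_tokens, temperature) are assumptions for illustration, with GenerationParams assumed importable from oumi.core.configs alongside ModelParams.
>>> from oumi.core.configs import GenerationParams, ModelParams
>>> from oumi.inference import NativeTextInferenceEngine
>>> engine = NativeTextInferenceEngine(
...     ModelParams(model_name="microsoft/Phi-3-mini-4k-instruct"),  # illustrative model id
...     generation_params=GenerationParams(max_new_tokens=128, temperature=0.0),  # assumed field names
... )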
- class oumi.inference.OpenAIInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the OpenAI API.
- property api_key_env_varname: str | None#
Return the default environment variable name for the OpenAI API key.
- property base_url: str | None#
Return the default base URL for the OpenAI API.
- class oumi.inference.ParasailInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Parasail API.
- property api_key_env_varname: str | None#
Return the default environment variable name for the Parasail API key.
- property base_url: str | None#
Return the default base URL for the Parasail API.
- class oumi.inference.RemoteInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running inference against a server implementing the OpenAI API.
- api_key_env_varname: str | None = None#
The environment variable name for the API key.
- base_url: str | None = None#
The base URL for the remote API.
- get_batch_results(batch_id: str, conversations: list[Conversation]) list[Conversation] [source]#
Gets the results of a completed batch job.
- Parameters:
batch_id – The batch job ID
conversations – Original conversations used to create the batch
- Returns:
The processed conversations with responses
- Return type:
List[Conversation]
- Raises:
RuntimeError – If the batch failed or has not completed
- get_batch_status(batch_id: str) BatchInfo [source]#
Gets the status of a batch inference job.
- Parameters:
batch_id – The batch job ID
- Returns:
Current status of the batch job
- Return type:
BatchInfo
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_batch(conversations: list[Conversation], inference_config: InferenceConfig | None = None) str [source]#
Creates a new batch inference job.
- Parameters:
conversations – List of conversations to process in batch
inference_config – Parameters for inference
- Returns:
The batch job ID
- Return type:
str
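Example
A hedged sketch of the batch workflow combining infer_batch, get_batch_status, and get_batch_results above, assuming engine is a configured RemoteInferenceEngine (or subclass) and conversations is a list of Conversation objects; polling and error handling are omitted.
>>> batch_id = engine.infer_batch(conversations)
>>> info = engine.get_batch_status(batch_id)  # BatchInfo describing the job's current status
>>> # ... once the batch has completed (get_batch_results raises RuntimeError otherwise):
>>> results = engine.get_batch_results(batch_id, conversations)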
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- class oumi.inference.RemoteVLLMInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against a remote vLLM server.
- property api_key_env_varname: str | None#
Return the default environment variable name for the Remote vLLM API key.
- property base_url: str | None#
Return the default base URL for the Remote vLLM API.
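Example
A hedged sketch of pointing this engine at a self-hosted vLLM server; the model id, the server URL, and the RemoteParams api_url field name are assumptions for illustration.
>>> from oumi.core.configs import ModelParams, RemoteParams
>>> from oumi.inference import RemoteVLLMInferenceEngine
>>> engine = RemoteVLLMInferenceEngine(
...     ModelParams(model_name="meta-llama/Llama-3.1-8B-Instruct"),  # illustrative model id
...     remote_params=RemoteParams(api_url="http://localhost:8000/v1/chat/completions"),  # assumed field name
... )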
- class oumi.inference.SGLangInferenceEngine(model_params: ModelParams, *, remote_params: RemoteParams | None = None, generation_params: GenerationParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running SGLang inference.
- class oumi.inference.TogetherInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, remote_params: RemoteParams | None = None)[source]#
Bases:
RemoteInferenceEngine
Engine for running inference against the Together AI API.
- property api_key_env_varname: str | None#
Return the default environment variable name for the Together API key.
- property base_url: str | None#
Return the default base URL for the Together API.
- class oumi.inference.VLLMInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None, tensor_parallel_size: int = -1, quantization: str | None = None, enable_prefix_caching: bool = True, gpu_memory_utilization: float = 1.0, enforce_eager: bool = True, max_num_seqs: int | None = None)[source]#
Bases:
BaseInferenceEngine
Engine for running vLLM inference locally.
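Example
A construction sketch using keyword arguments from the signature above; the model id is illustrative.
>>> from oumi.core.configs import ModelParams
>>> from oumi.inference import VLLMInferenceEngine
>>> engine = VLLMInferenceEngine(
...     ModelParams(model_name="meta-llama/Llama-3.1-8B-Instruct"),  # illustrative model id
...     tensor_parallel_size=2,          # shard the model across 2 GPUs
...     gpu_memory_utilization=0.9,
...     enable_prefix_caching=True,
... )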
- get_supported_params() set[str] [source]#
Returns a set of supported generation parameters for this engine.
- infer_from_file(input_filepath: str, inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference on inputs in the provided file.
This is a convenience method that avoids the boilerplate of asserting the existence of input_filepath in the generation_params.
- Parameters:
input_filepath – Path to the input file containing prompts for generation.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]
- infer_online(input: list[Conversation], inference_config: InferenceConfig | None = None) list[Conversation] [source]#
Runs model inference online.
- Parameters:
input – A list of conversations to run inference on.
inference_config – Parameters for inference.
- Returns:
Inference output.
- Return type:
List[Conversation]