oumi.core.launcher
Launcher module for the Oumi (Open Universal Machine Intelligence) library.
This module provides base classes for cloud and cluster management in the Oumi framework.
These classes serve as foundations for implementing cloud-specific and cluster-specific launchers for running machine learning jobs.
- class oumi.core.launcher.BaseCloud
Bases: ABC
Base class for a resource pool capable of creating clusters.
- abstractmethod get_cluster(name: str) → BaseCluster | None
Gets the cluster with the specified name, or None if not found.
- abstractmethod list_clusters() → list[BaseCluster]
Lists the active clusters on this cloud.
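As a rough illustration of the BaseCloud contract, the sketch below implements the two abstract methods listed above with an in-memory dictionary. To stay runnable without Oumi installed, it defines local stand-ins for BaseCloud and BaseCluster. The names NamedCluster, InMemoryCloud, and add_cluster are hypothetical, and the real base classes may declare additional abstract methods not shown on this page.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseCluster(ABC):
    """Local stand-in for oumi.core.launcher.BaseCluster (methods omitted)."""


class BaseCloud(ABC):
    """Local stand-in mirroring the two abstract methods documented above."""

    @abstractmethod
    def get_cluster(self, name: str) -> Optional[BaseCluster]: ...

    @abstractmethod
    def list_clusters(self) -> list[BaseCluster]: ...


class NamedCluster(BaseCluster):
    """Hypothetical cluster that only records its name."""

    def __init__(self, name: str) -> None:
        self.name = name


class InMemoryCloud(BaseCloud):
    """Hypothetical cloud that keeps its clusters in a dict."""

    def __init__(self) -> None:
        self._clusters: dict[str, NamedCluster] = {}

    def add_cluster(self, name: str) -> NamedCluster:
        # Hypothetical helper; not part of the documented interface.
        cluster = NamedCluster(name)
        self._clusters[name] = cluster
        return cluster

    def get_cluster(self, name: str) -> Optional[BaseCluster]:
        # dict.get returns None for unknown names, matching the documented contract.
        return self._clusters.get(name)

    def list_clusters(self) -> list[BaseCluster]:
        return list(self._clusters.values())


cloud = InMemoryCloud()
cloud.add_cluster("dev")
```

Returning None from get_cluster for an unknown name, rather than raising, follows the description of that method above.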
- class oumi.core.launcher.BaseCluster
Bases: ABC
Base class for a compute cluster (job queue).
- abstractmethod cancel_job(job_id: str) → JobStatus
Cancels the specified job on this cluster.
- abstractmethod get_job(job_id: str) → JobStatus
Gets the job on this cluster if it exists, else returns None.
- abstractmethod get_logs_stream(cluster_name: str, job_id: str | None = None) → TextIOBase
Gets a stream that tails the logs of the target job.
- Parameters:
cluster_name – The name of the cluster the job was run in.
job_id – The ID of the job to tail the logs of. If unspecified, the most recent job will be used.
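Because get_logs_stream is documented as returning a TextIOBase, a caller can read it like any other text stream. The sketch below shows one way to consume such a stream line by line. The helper collect_log_lines is hypothetical, and io.StringIO stands in for a real log stream, since how a concrete cluster produces its stream is not described here.

```python
import io


def collect_log_lines(stream: io.TextIOBase, limit: int = 100) -> list[str]:
    """Reads up to `limit` lines from a text stream, stripping trailing newlines."""
    lines: list[str] = []
    for line in stream:
        lines.append(line.rstrip("\n"))
        if len(lines) >= limit:
            break
    return lines


# io.StringIO is a TextIOBase subclass, used here in place of a real log stream.
fake_stream = io.StringIO("step 1 loss=0.9\nstep 2 loss=0.7\n")
lines = collect_log_lines(fake_stream)
```

The limit parameter is a precaution: a stream that tails a running job may not end until the job does, so an unbounded loop could block for a long time.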
- class oumi.core.launcher.JobState(value)
Bases: Enum
Enum to hold the state of a job.
- CANCELLED = 'cancelled'
- FAILED = 'failed'
- PENDING = 'pending'
- RUNNING = 'running'
- SUCCEEDED = 'succeeded'
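The following sketch shows how the enum values listed above behave under standard Enum lookup. It uses a local stand-in for JobState with the same members so that it runs without Oumi installed. The TERMINAL_STATES set and the is_terminal helper are assumptions made for illustration (based on the description of the done field), not part of the library.

```python
from enum import Enum


class JobState(Enum):
    """Local stand-in with the same members as the documented enum."""

    CANCELLED = "cancelled"
    FAILED = "failed"
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"


# Assumption: these three states are the terminal ones.
TERMINAL_STATES = frozenset(
    {JobState.CANCELLED, JobState.FAILED, JobState.SUCCEEDED}
)


def is_terminal(state: JobState) -> bool:
    """Hypothetical helper: True if the job can no longer change state."""
    return state in TERMINAL_STATES


# Enum members can be looked up by value, e.g. when parsing a serialized state.
state = JobState("running")
```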
- class oumi.core.launcher.JobStatus(name: str, id: str, status: str, cluster: str, metadata: str, done: bool, state: JobState, cost_per_hour: float | None = None, start_at: float | None = None, end_at: float | None = None)
Bases: object
Dataclass to hold the status of a job.
- cluster: str
The cluster to which the job belongs.
- cost_per_hour: float | None = None
The cost per hour of the cluster in USD. Includes all nodes. None if cost information is unavailable.
- done: bool
A flag indicating whether the job is done. True only if the job is in a terminal state (e.g. completed, failed, or canceled).
- end_at: float | None = None
Unix timestamp when the job completed. None if the job hasn’t completed yet or timing data is unavailable.
- id: str
The unique identifier of the job on the cluster.
- metadata: str
Miscellaneous metadata about the job.
- name: str
The display name of the job.
- start_at: float | None = None
Unix timestamp when the job started running. None if the job hasn’t started yet or timing data is unavailable.
- state: JobState
The state of the job. For more fine-grained information about the job, see the status field.
- status: str
The status of the job, as a string. It carries more fine-grained information than the state field.
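The timing and cost fields can be combined to estimate what a finished job cost. The sketch below is one way to do that, using a local stand-in dataclass with the same fields as the documented JobStatus so that it runs without Oumi installed. The helper estimated_cost_usd and all sample values are hypothetical, and the calculation assumes cost_per_hour applies uniformly between start_at and end_at.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    """Local stand-in with the same members as the documented enum."""

    CANCELLED = "cancelled"
    FAILED = "failed"
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"


@dataclass
class JobStatus:
    """Local stand-in with the same fields as the documented dataclass."""

    name: str
    id: str
    status: str
    cluster: str
    metadata: str
    done: bool
    state: JobState
    cost_per_hour: Optional[float] = None
    start_at: Optional[float] = None
    end_at: Optional[float] = None


def estimated_cost_usd(job: JobStatus) -> Optional[float]:
    """Hypothetical helper: cost in USD, or None if any input is unavailable."""
    if job.cost_per_hour is None or job.start_at is None or job.end_at is None:
        return None
    hours = (job.end_at - job.start_at) / 3600.0  # Unix timestamps are in seconds
    return job.cost_per_hour * hours


# Sample values are made up: a 90-minute job on a cluster costing 2 USD per hour.
job = JobStatus(
    name="train",
    id="42",
    status="SUCCEEDED",
    cluster="dev",
    metadata="",
    done=True,
    state=JobState.SUCCEEDED,
    cost_per_hour=2.0,
    start_at=1_700_000_000.0,
    end_at=1_700_005_400.0,
)
cost = estimated_cost_usd(job)
```

Returning None when any of the three optional fields is missing mirrors how the fields themselves signal unavailable data.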