oumi.core.launcher

Launcher module for the Oumi (Open Universal Machine Intelligence) library.

This module provides base classes for cloud and cluster management in the Oumi framework.

These classes serve as foundations for implementing cloud-specific and cluster-specific launchers for running machine learning jobs.

class oumi.core.launcher.BaseCloud

Bases: ABC

Base class for a resource pool capable of creating clusters.

abstract get_cluster(name: str) -> BaseCluster | None

Gets the cluster with the specified name, or None if not found.

abstract list_clusters() -> list[BaseCluster]

Lists the active clusters on this cloud.

abstract up_cluster(job: JobConfig, name: str | None, **kwargs) -> JobStatus

Creates a cluster and starts the provided Job.
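As an illustration of the three abstract methods above, here is a minimal in-memory sketch of a BaseCloud subclass. It is not an implementation shipped with Oumi. To stay runnable without oumi installed, it declares local stand-ins for JobStatus and BaseCloud that mirror the documented signatures. FakeJobConfig, FakeCluster, InMemoryCloud, the "RUNNING" status string, and the generated "cluster-N" names are all hypothetical, and the handling of name=None is an assumption.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


# Local stand-ins mirroring the documented interface (assumption: with oumi
# installed, JobStatus and BaseCloud would come from oumi.core.launcher).
@dataclass
class JobStatus:
    name: str
    id: str
    status: str
    cluster: str
    metadata: str
    done: bool


class BaseCloud(ABC):
    @abstractmethod
    def get_cluster(self, name: str) -> Any | None: ...

    @abstractmethod
    def list_clusters(self) -> list[Any]: ...

    @abstractmethod
    def up_cluster(self, job: Any, name: str | None, **kwargs) -> JobStatus: ...


@dataclass
class FakeJobConfig:
    """Hypothetical stand-in for JobConfig; only a name is modeled."""

    name: str


class FakeCluster:
    """Hypothetical cluster; a real one would subclass BaseCluster."""

    def __init__(self, name: str) -> None:
        self._name = name

    def name(self) -> str:
        return self._name


class InMemoryCloud(BaseCloud):
    """Keeps clusters in a dict instead of provisioning real resources."""

    def __init__(self) -> None:
        self._clusters: dict[str, FakeCluster] = {}

    def get_cluster(self, name: str) -> FakeCluster | None:
        # dict.get returns None for unknown names, matching the description.
        return self._clusters.get(name)

    def list_clusters(self) -> list[FakeCluster]:
        return list(self._clusters.values())

    def up_cluster(self, job: FakeJobConfig, name: str | None, **kwargs) -> JobStatus:
        # Assumption: when no name is given, generate one.
        cluster_name = name or f"cluster-{len(self._clusters)}"
        self._clusters.setdefault(cluster_name, FakeCluster(cluster_name))
        # "RUNNING" is an illustrative status string, not a documented value.
        return JobStatus(
            name=job.name,
            id="0",
            status="RUNNING",
            cluster=cluster_name,
            metadata="",
            done=False,
        )


cloud = InMemoryCloud()
status = cloud.up_cluster(FakeJobConfig(name="train"), None)
```

The returned JobStatus carries the cluster name, so a caller can later pass status.cluster to get_cluster to retrieve the cluster that runs the job.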

class oumi.core.launcher.BaseCluster

Bases: ABC

Base class for a compute cluster (job queue).

abstract cancel_job(job_id: str) -> JobStatus

Cancels the specified job on this cluster.

abstract down() -> None

Tears down the current cluster.

abstract get_job(job_id: str) -> JobStatus

Gets the job on this cluster if it exists, else returns None.

abstract get_jobs() -> list[JobStatus]

Lists the jobs on this cluster.

abstract name() -> str

Gets the name of the cluster.

abstract run_job(job: JobConfig) -> JobStatus

Runs the specified job on this cluster.

abstract stop() -> None

Stops the current cluster.
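The BaseCluster methods above could be implemented roughly as follows. This is a sketch under stated assumptions, not Oumi's own code: JobStatus and BaseCluster are redeclared locally so the example runs without oumi installed. FakeJobConfig, InMemoryCluster, and the "RUNNING" and "CANCELLED" status strings are hypothetical. For an unknown job id, get_job here returns None, following the method description rather than the declared JobStatus return type.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from typing import Any


# Local stand-ins mirroring the documented interface (assumption: with oumi
# installed, these would be imported from oumi.core.launcher).
@dataclass
class JobStatus:
    name: str
    id: str
    status: str
    cluster: str
    metadata: str
    done: bool


class BaseCluster(ABC):
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def run_job(self, job: Any) -> JobStatus: ...

    @abstractmethod
    def get_job(self, job_id: str) -> JobStatus | None: ...

    @abstractmethod
    def get_jobs(self) -> list[JobStatus]: ...

    @abstractmethod
    def cancel_job(self, job_id: str) -> JobStatus: ...

    @abstractmethod
    def stop(self) -> None: ...

    @abstractmethod
    def down(self) -> None: ...


@dataclass
class FakeJobConfig:
    """Hypothetical stand-in for JobConfig; only a name is modeled."""

    name: str


class InMemoryCluster(BaseCluster):
    """Tracks job statuses in a dict; nothing is actually executed."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._jobs: dict[str, JobStatus] = {}
        self._next_id = 0

    def name(self) -> str:
        return self._name

    def run_job(self, job: FakeJobConfig) -> JobStatus:
        job_id = str(self._next_id)
        self._next_id += 1
        # Status strings are illustrative, not documented values.
        status = JobStatus(job.name, job_id, "RUNNING", self._name, "", False)
        self._jobs[job_id] = status
        return status

    def get_job(self, job_id: str) -> JobStatus | None:
        return self._jobs.get(job_id)

    def get_jobs(self) -> list[JobStatus]:
        return list(self._jobs.values())

    def cancel_job(self, job_id: str) -> JobStatus:
        # Cancellation is a terminal state, so done becomes True.
        cancelled = replace(self._jobs[job_id], status="CANCELLED", done=True)
        self._jobs[job_id] = cancelled
        return cancelled

    def stop(self) -> None:
        # Nothing to pause in memory; a real cluster would halt its machines.
        pass

    def down(self) -> None:
        # Tearing down discards every job record in this sketch.
        self._jobs.clear()


cluster = InMemoryCluster("demo")
first = cluster.run_job(FakeJobConfig(name="train"))
cancelled = cluster.cancel_job(first.id)
```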

class oumi.core.launcher.JobStatus(name: str, id: str, status: str, cluster: str, metadata: str, done: bool)

Bases: object

Dataclass to hold the status of a job.

cluster: str

The cluster to which the job belongs.

done: bool

A flag indicating whether the job is done. True only if the job is in a terminal state (e.g. completed, failed, or canceled).

id: str

The unique identifier of the job on the cluster.

metadata: str

Miscellaneous metadata about the job.

name: str

The display name of the job.

status: str

The status of the job.
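Because done is True only in a terminal state, it is a natural loop condition when waiting for a job. The sketch below shows one way to poll with it. wait_until_done is a hypothetical helper, not part of oumi.core.launcher. The JobStatus dataclass is redeclared locally with the documented fields so the example runs on its own, and the "PENDING", "RUNNING", and "COMPLETED" status strings are illustrative only.

```python
import time
from dataclasses import dataclass
from typing import Callable


# Local stand-in with the documented fields (assumption: with oumi installed,
# JobStatus would be imported from oumi.core.launcher).
@dataclass
class JobStatus:
    name: str
    id: str
    status: str
    cluster: str
    metadata: str
    done: bool


def wait_until_done(
    poll: Callable[[], JobStatus],
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> JobStatus:
    """Hypothetical helper: call poll() until done is True or polls run out.

    Returns the last status seen, which may still have done=False if
    max_polls was reached first.
    """
    status = poll()
    for _ in range(max_polls - 1):
        if status.done:
            break
        time.sleep(interval_s)
        status = poll()
    return status


# Simulated status sequence; the status strings are illustrative only.
_history = iter(
    [
        JobStatus("train", "7", "PENDING", "demo", "", False),
        JobStatus("train", "7", "RUNNING", "demo", "", False),
        JobStatus("train", "7", "COMPLETED", "demo", "", True),
    ]
)
final = wait_until_done(lambda: next(_history), interval_s=0.0)
```

Against a real cluster, the poll callable would typically wrap a call such as get_job with the job's id, so the loop itself never needs to interpret the free-form status string.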