oumi.launcher.clusters

Submodules

oumi.launcher.clusters.local_cluster module

class oumi.launcher.clusters.local_cluster.LocalCluster(name: str, client: LocalClient)

Bases: BaseCluster

A cluster implementation for running jobs locally.

__eq__(other: Any) -> bool

Checks if two LocalClusters are equal.

cancel_job(job_id: str) -> JobStatus

Cancels the specified job on this cluster.

down() -> None

Cancels all jobs, running or queued.

get_job(job_id: str) -> JobStatus | None

Gets the job with the specified ID on this cluster if it exists, else returns None.
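The lookup-or-None contract can be illustrated with a minimal, self-contained sketch. `FakeJobStatus` and `FakeCluster` are hypothetical stand-ins written for this example, not the oumi classes, and implementing `get_job` as a scan over `get_jobs()` is an assumption about one possible implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeJobStatus:
    """Hypothetical stand-in for JobStatus; only the fields used below."""

    id: str
    status: str


class FakeCluster:
    """Hypothetical stand-in for a cluster that holds a fixed list of jobs."""

    def __init__(self, jobs: list[FakeJobStatus]):
        self._jobs = jobs

    def get_jobs(self) -> list[FakeJobStatus]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._jobs)

    def get_job(self, job_id: str) -> Optional[FakeJobStatus]:
        # Return the first job whose id matches, or None if nothing matches.
        for job in self.get_jobs():
            if job.id == job_id:
                return job
        return None


cluster = FakeCluster([FakeJobStatus("1", "RUNNING"), FakeJobStatus("2", "QUEUED")])
found = cluster.get_job("2")      # a FakeJobStatus
missing = cluster.get_job("99")   # None, because no such job exists
```

Callers should check the result against None before reading any fields, since an unknown job ID is reported by the return value rather than by an exception.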

get_jobs() -> list[JobStatus]

Lists the jobs on this cluster.

name() -> str

Gets the name of the cluster.

run_job(job: JobConfig) -> JobStatus

Runs the specified job on this cluster.

Parameters:

job – The job to run.

Returns:

The job status.

stop() -> None

Cancels all jobs, running or queued.
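For a local cluster, both down() and stop() are documented as cancelling every running or queued job. The sketch below shows one way such a cancel-all could be expressed in terms of job listing and per-job cancellation. `InMemoryCluster`, its string states, and `cancel_all` are hypothetical names invented for this example; the actual LocalCluster implementation may differ.

```python
class InMemoryCluster:
    """Hypothetical stand-in that tracks job states in a dict."""

    # States that can still be cancelled (names are assumptions).
    _ACTIVE = ("RUNNING", "QUEUED")

    def __init__(self, job_states: dict[str, str]):
        self._states = dict(job_states)

    def get_jobs(self) -> list[str]:
        # List the ids of all known jobs.
        return list(self._states)

    def state(self, job_id: str) -> str:
        return self._states[job_id]

    def cancel_job(self, job_id: str) -> str:
        # Only running or queued jobs change state; finished jobs are untouched.
        if self._states[job_id] in self._ACTIVE:
            self._states[job_id] = "CANCELED"
        return self._states[job_id]

    def cancel_all(self) -> None:
        # Cancel every job, running or queued.
        for job_id in self.get_jobs():
            self.cancel_job(job_id)


cluster = InMemoryCluster({"a": "RUNNING", "b": "QUEUED", "c": "COMPLETED"})
cluster.cancel_all()
states = {job_id: cluster.state(job_id) for job_id in cluster.get_jobs()}
```

After `cancel_all()`, jobs "a" and "b" are cancelled while the already completed job "c" keeps its state.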

oumi.launcher.clusters.polaris_cluster module

class oumi.launcher.clusters.polaris_cluster.PolarisCluster(name: str, client: PolarisClient)

Bases: BaseCluster

A cluster implementation backed by Polaris.

__eq__(other: Any) -> bool

Checks if two PolarisClusters are equal.

cancel_job(job_id: str) -> JobStatus

Cancels the specified job on this cluster.

down() -> None

This is a no-op for Polaris clusters.

get_job(job_id: str) -> JobStatus | None

Gets the job with the specified ID on this cluster if it exists, else returns None.

get_jobs() -> list[JobStatus]

Lists the jobs on this cluster.

name() -> str

Gets the name of the cluster.

run_job(job: JobConfig) -> JobStatus

Runs the specified job on this cluster.

For Polaris, this method consists of five steps:

  1. Copy the working directory to /home/$USER/oumi_launcher/$JOB_NAME.

  2. Check for a conda environment at /home/$USER/miniconda3/envs/oumi. If none exists, install it.

  3. Copy all file mounts.

  4. Create a job script containing all environment variables, setup commands, and run commands.

  5. Change into the working directory and submit the job.

Parameters:

job – The job to run.

Returns:

The job status.

Return type:

JobStatus
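The five steps can be sketched as a dry-run planner that only describes what would happen and executes nothing. `plan_polaris_run` and all of its parameters are hypothetical names introduced for this example. Treating file mounts as a mapping from remote path to local path is an assumption. Only the two /home/... paths come from the description above.

```python
def plan_polaris_run(
    user: str,
    job_name: str,
    file_mounts: dict[str, str],
    env_exists: bool,
) -> list[str]:
    """Return human-readable descriptions of the submission steps, in order."""
    work_dir = f"/home/{user}/oumi_launcher/{job_name}"
    conda_env = f"/home/{user}/miniconda3/envs/oumi"

    # Step 1: copy the working directory to the remote job directory.
    steps = [f"copy working directory to {work_dir}"]

    # Step 2: install the conda environment only when it is missing.
    if not env_exists:
        steps.append(f"install conda environment at {conda_env}")

    # Step 3: copy every file mount (assumed mapping: remote path -> local path).
    for remote_path, local_path in file_mounts.items():
        steps.append(f"copy file mount {local_path} to {remote_path}")

    # Step 4: generate the job script.
    steps.append("create job script with env vars, setup, and run commands")

    # Step 5: submit from inside the working directory.
    steps.append(f"change into {work_dir} and submit the job")
    return steps


plan = plan_polaris_run(
    user="alice",
    job_name="demo",
    file_mounts={"~/data/train.jsonl": "./train.jsonl"},
    env_exists=False,
)
```

With `env_exists=True` the installation step is skipped, so repeated submissions by the same user would only pay the environment setup cost once.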

stop() -> None

This is a no-op for Polaris clusters.

oumi.launcher.clusters.sky_cluster module

class oumi.launcher.clusters.sky_cluster.SkyCluster(name: str, client: SkyClient)

Bases: BaseCluster

A cluster implementation backed by SkyPilot.

__eq__(other: Any) -> bool

Checks if two SkyClusters are equal.

cancel_job(job_id: str) -> JobStatus

Cancels the specified job on this cluster.

down() -> None

Tears down the current cluster.

get_job(job_id: str) -> JobStatus | None

Gets the job with the specified ID on this cluster if it exists, else returns None.

get_jobs() -> list[JobStatus]

Lists the jobs on this cluster.

name() -> str

Gets the name of the cluster.

run_job(job: JobConfig) -> JobStatus

Runs the specified job on this cluster.

stop() -> None

Stops the current cluster.
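Because all three cluster classes expose the same methods, calling code can be written against that shared interface. The sketch below polls get_job until a job finishes. `wait_for_job`, `StubStatus`, and `StubCluster` are hypothetical names for this example, and the `done` flag on the status object is an assumption rather than something stated on this page.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubStatus:
    """Hypothetical stand-in for JobStatus with an assumed `done` flag."""

    id: str
    done: bool


class StubCluster:
    """Hypothetical cluster whose single job finishes after a few polls."""

    def __init__(self, polls_until_done: int):
        self._remaining = polls_until_done

    def get_job(self, job_id: str) -> Optional[StubStatus]:
        if job_id != "job-1":
            return None  # unknown job, mirroring the get_job contract
        self._remaining -= 1
        return StubStatus(id=job_id, done=self._remaining <= 0)


def wait_for_job(cluster, job_id: str, poll_seconds: float = 1.0,
                 max_polls: int = 10) -> Optional[StubStatus]:
    """Poll until the job is done, is unknown, or max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = cluster.get_job(job_id)
        if status is None or status.done:
            return status
        time.sleep(poll_seconds)
    # Polling budget exhausted; return the last status seen.
    return status


final = wait_for_job(StubCluster(polls_until_done=3), "job-1", poll_seconds=0.0)
unknown = wait_for_job(StubCluster(polls_until_done=3), "nope", poll_seconds=0.0)
```

Once a job is finished, whether to call stop() or down() depends on the cluster type: per the descriptions above, both are no-ops on Polaris, both cancel all jobs on a local cluster, and on a SkyPilot-backed cluster stop() stops the cluster while down() tears it down.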