oumi.launcher.clusters
Submodules
oumi.launcher.clusters.local_cluster module
- class oumi.launcher.clusters.local_cluster.LocalCluster(name: str, client: LocalClient)
Bases: BaseCluster
A cluster implementation for running jobs locally.
- get_job(job_id: str) → JobStatus | None
Gets the job with the given ID on this cluster if it exists, else returns None.
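The lookup contract of get_job can be illustrated with a minimal sketch. This is not the oumi implementation: the JobStatus dataclass below is a hypothetical stand-in with only id and status fields, and the hypothetical find_job helper searches a plain list of jobs instead of querying a LocalClient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobStatus:
    """Hypothetical stand-in for oumi's JobStatus; only the fields used here."""

    id: str
    status: str


def find_job(jobs: list[JobStatus], job_id: str) -> Optional[JobStatus]:
    """Return the job whose ID matches job_id, or None if no job matches."""
    for job in jobs:
        if job.id == job_id:
            return job
    return None


jobs = [JobStatus(id="1", status="RUNNING"), JobStatus(id="2", status="COMPLETED")]
print(find_job(jobs, "2"))   # JobStatus(id='2', status='COMPLETED')
print(find_job(jobs, "99"))  # None
```

Returning None rather than raising lets callers treat "job not found" as an ordinary outcome and check for it with a simple `is None` test.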
oumi.launcher.clusters.polaris_cluster module
- class oumi.launcher.clusters.polaris_cluster.PolarisCluster(name: str, client: PolarisClient)
Bases: BaseCluster
A cluster implementation backed by Polaris.
- get_job(job_id: str) → JobStatus | None
Gets the job with the given ID on this cluster if it exists, else returns None.
- run_job(job: JobConfig) → JobStatus
Runs the specified job on this cluster.
For Polaris, this method consists of five steps:
1. Copy the working directory to /home/$USER/oumi_launcher/$JOB_NAME.
2. Check whether there is a conda installation at /home/$USER/miniconda3/envs/oumi. If not, install it.
3. Copy all file mounts.
4. Create a job script with all environment variables, setup commands, and run commands.
5. cd into the working directory and submit the job.
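Step 4 can be sketched as follows, assuming a plain bash script in which environment variables are exported before the setup and run commands. The build_job_script helper, its parameters, and the example values are hypothetical illustrations, not oumi's actual script generator, and any scheduler-specific directives are omitted.

```python
import shlex


def build_job_script(envs: dict[str, str], setup: str, run: str) -> str:
    """Hypothetical helper: env exports first, then setup, then run commands."""
    lines = ["#!/bin/bash", ""]
    # Export every environment variable; shlex.quote guards against spaces and quotes.
    for key, value in envs.items():
        lines.append(f"export {key}={shlex.quote(value)}")
    lines.append("")
    # Setup commands (e.g. activating the conda environment) run before the job.
    if setup:
        lines.append(setup.strip())
        lines.append("")
    lines.append(run.strip())
    return "\n".join(lines) + "\n"


script = build_job_script(
    envs={"WANDB_PROJECT": "my project"},
    setup="source ~/miniconda3/bin/activate oumi",
    run="python train.py",
)
print(script)
```

Keeping the exports at the top means both the setup and the run commands see the same environment.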
- Parameters:
job – The job to run.
- Returns:
The job status.
- Return type:
JobStatus
oumi.launcher.clusters.sky_cluster module
oumi.launcher.clusters.slurm_cluster module
- class oumi.launcher.clusters.slurm_cluster.SlurmCluster(name: str, client: SlurmClient)
Bases: BaseCluster
A cluster implementation backed by a Slurm scheduler.
- class ConnectionInfo(hostname: str, user: str)
Bases: object
Dataclass to hold information about a connection.
- hostname: str
- property name
Gets the name of the connection in the form user@hostname.
- user: str
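ConnectionInfo's documented fields and name property can be mirrored by a small dataclass. This sketch assumes name simply joins the two fields as user@hostname, as the docstring states; the example host and user are made up.

```python
from dataclasses import dataclass


@dataclass
class ConnectionInfo:
    """Sketch mirroring the documented fields: hostname and user."""

    hostname: str
    user: str

    @property
    def name(self) -> str:
        """Connection name in the form user@hostname."""
        return f"{self.user}@{self.hostname}"


conn = ConnectionInfo(hostname="login.example.edu", user="alice")
print(conn.name)  # alice@login.example.edu
```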
- get_job(job_id: str) → JobStatus | None
Gets the job with the given ID on this cluster if it exists, else returns None.
- static get_slurm_connections() → list[ConnectionInfo]
Gets Slurm connections from the OUMI_SLURM_CONNECTIONS environment variable.
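A sketch of reading connections from the environment follows. It assumes, without confirmation from this page, that OUMI_SLURM_CONNECTIONS holds a comma-separated list of user@hostname entries; the ConnectionInfo class here is a local stand-in, and raising on malformed entries is an illustrative choice.

```python
import os
from dataclasses import dataclass


@dataclass
class ConnectionInfo:
    """Local stand-in for SlurmCluster.ConnectionInfo."""

    hostname: str
    user: str


def get_slurm_connections() -> list[ConnectionInfo]:
    """Read OUMI_SLURM_CONNECTIONS, ASSUMED to be comma-separated user@hostname."""
    raw = os.environ.get("OUMI_SLURM_CONNECTIONS", "")
    connections = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an unset variable and trailing commas
        user, sep, hostname = entry.partition("@")
        if not sep or not user or not hostname:
            raise ValueError(f"Invalid connection: {entry!r}")
        connections.append(ConnectionInfo(hostname=hostname, user=user))
    return connections


os.environ["OUMI_SLURM_CONNECTIONS"] = "alice@cluster-a.example.edu, bob@cluster-b.example.edu"
for conn in get_slurm_connections():
    print(conn.user, conn.hostname)
```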
- static parse_cluster_name(name: str) → ConnectionInfo
Parses the cluster name into user and hostname components.
- Parameters:
name – The name of the cluster.
- Returns:
The parsed cluster information.
- Return type:
ConnectionInfo
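Parsing can be sketched as the inverse of the name property, assuming cluster names take the form user@hostname and that both parts contain only word characters, dots, and hyphens. The regular expression is an assumption, not necessarily the pattern oumi uses.

```python
import re
from dataclasses import dataclass


@dataclass
class ConnectionInfo:
    """Local stand-in for SlurmCluster.ConnectionInfo."""

    hostname: str
    user: str


# ASSUMED format: user@hostname, the inverse of ConnectionInfo.name.
_CLUSTER_NAME = re.compile(r"(?P<user>[\w.\-]+)@(?P<hostname>[\w.\-]+)")


def parse_cluster_name(name: str) -> ConnectionInfo:
    """Split a user@hostname cluster name into a ConnectionInfo."""
    match = _CLUSTER_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"Invalid cluster name {name!r}; expected user@hostname")
    return ConnectionInfo(hostname=match["hostname"], user=match["user"])


print(parse_cluster_name("alice@login.example.edu"))
```

Using fullmatch ensures the whole string must match, so inputs like "alice@host extra" are rejected.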
- run_job(job: JobConfig) → JobStatus
Runs the specified job on this cluster.
For Slurm, this method consists of five steps:
1. Copy the working directory to ~/oumi_launcher/$JOB_NAME.
2. Check whether there is a conda installation at /home/$USER/miniconda3/envs/oumi. If not, install it.
3. Copy all file mounts.
4. Create a job script with all environment variables, setup commands, and run commands.
5. cd into the working directory and submit the job.
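The five steps can be approximated as a list of shell commands. In this hypothetical sketch, plan_slurm_submission, the use of rsync and ssh for transport, and the script name oumi_job.sh are all assumptions; steps 3 and 4 are only noted in comments, and the function returns the commands rather than executing them.

```python
import shlex


def plan_slurm_submission(
    job_name: str,
    local_dir: str,
    user: str,
    hostname: str,
    script_name: str = "oumi_job.sh",
) -> list[str]:
    """Hypothetical: shell commands approximating the documented run_job steps."""
    remote = f"{user}@{hostname}"
    remote_dir = f"~/oumi_launcher/{job_name}"
    conda_env = f"/home/{user}/miniconda3/envs/oumi"
    return [
        # 1. Copy the working directory to the remote job directory.
        f"rsync -a {shlex.quote(local_dir)}/ {remote}:{remote_dir}/",
        # 2. Check for the conda environment (the install branch is omitted).
        f"ssh {remote} 'test -d {conda_env} || echo missing-conda-env'",
        # 3. File mounts would be copied like step 1 (omitted here).
        # 4. The job script is assumed to already exist as script_name.
        # 5. cd into the working directory and submit with sbatch.
        f"ssh {remote} 'cd {remote_dir} && sbatch {script_name}'",
    ]


for command in plan_slurm_submission("demo", "./project", "alice", "slurm.example.edu"):
    print(command)
```

Returning commands instead of running them keeps the sketch side-effect free and makes the ordering of the steps easy to inspect.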
- Parameters:
job – The job to run.
- Returns:
The job status.
- Return type:
JobStatus