oumi.launcher.clouds
Clouds module for the Oumi (Open Universal Machine Intelligence) library.
This module provides implementations for various cloud platforms that can be used with the Oumi launcher for running and managing jobs.
Example
>>> from oumi.launcher import JobConfig, JobResources
>>> from oumi.launcher.clouds import LocalCloud
>>> local_cloud = LocalCloud()
>>> job_resources = JobResources(cloud="local")
>>> job_config = JobConfig(
... name="my_job", resources=job_resources, run="python train.py"
... )
>>> job_status = local_cloud.up_cluster(job_config, name="my_cluster")
Note
Ensure that you have the necessary credentials and configurations set up for the cloud platform you intend to use.
- class oumi.launcher.clouds.LocalCloud
Bases: BaseCloud
A resource pool for managing local jobs.
A single LocalCluster can run only one job at a time. Running multiple GPU jobs simultaneously on separate LocalClusters is discouraged.
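The one-job-at-a-time rule can be pictured with a toy model. This is a minimal sketch, not Oumi's LocalCluster: the `ToyLocalCluster` class and its `submit` and `finish_current` methods are hypothetical, and the sketch assumes that a job submitted while another is running waits in FIFO order rather than being rejected.

```python
import collections
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyLocalCluster:
    """Hypothetical stand-in for a LocalCluster: runs at most one job at a time."""

    name: str
    running: Optional[str] = None
    pending: collections.deque = field(default_factory=collections.deque)

    def submit(self, job_name: str) -> str:
        # Assumption: a job submitted while another is running waits in FIFO order.
        if self.running is None:
            self.running = job_name
            return "RUNNING"
        self.pending.append(job_name)
        return "QUEUED"

    def finish_current(self) -> Optional[str]:
        # Mark the running job done and promote the next queued job, if any.
        self.running = self.pending.popleft() if self.pending else None
        return self.running
```

For example, submitting `"train"` and then `"eval"` to the same `ToyLocalCluster` leaves `"eval"` queued until `finish_current()` is called.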
- get_cluster(name) → BaseCluster | None
Gets the cluster with the specified name, or None if not found.
- list_clusters() → list[BaseCluster]
Lists the active clusters on this cloud.
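`get_cluster` and `list_clusters` form a small contract: listing reports the active clusters, and lookup by name returns `None` instead of raising when nothing matches. The sketch below models that contract with an in-memory registry. `ToyCloud`, `ToyCluster`, and `add` are hypothetical, and implementing `get_cluster` as a scan over `list_clusters()` is one reasonable design, assumed here rather than taken from LocalCloud.

```python
from typing import Optional


class ToyCluster:
    """Hypothetical cluster with a name and an active flag."""

    def __init__(self, name: str, active: bool = True) -> None:
        self.name = name
        self.active = active


class ToyCloud:
    """Hypothetical in-memory cloud mirroring the get_cluster/list_clusters contract."""

    def __init__(self) -> None:
        self._clusters: dict[str, ToyCluster] = {}

    def add(self, cluster: ToyCluster) -> None:
        # Register (or replace) a cluster under its name.
        self._clusters[cluster.name] = cluster

    def list_clusters(self) -> list[ToyCluster]:
        # Only clusters still marked active are reported.
        return [c for c in self._clusters.values() if c.active]

    def get_cluster(self, name: str) -> Optional[ToyCluster]:
        # Search the active clusters; return None rather than raising on a miss.
        for cluster in self.list_clusters():
            if cluster.name == name:
                return cluster
        return None
```

Because a miss yields `None`, callers should check the result before using it, for example `if cloud.get_cluster(name) is None: ...`.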
- class oumi.launcher.clouds.PolarisCloud
Bases: BaseCloud
A resource pool for managing the Polaris ALCF job queues.
- get_cluster(name) → BaseCluster | None
Gets the cluster with the specified name, or None if not found.
- initialize_clusters(user) → list[BaseCluster]
Initializes clusters for the specified user for all Polaris queues.
- Parameters:
user – The user to initialize clusters for.
- Returns:
The list of initialized clusters.
- Return type:
List[PolarisCluster]
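Reading the description above as "one cluster per Polaris queue for the given user", the shape of `initialize_clusters(user)` can be sketched as follows. The queue tuple, the `ToyQueueCluster` class, the `<queue>.<user>` naming scheme, and the `initialize_queue_clusters` function are all assumptions for illustration, not Oumi's implementation.

```python
from dataclasses import dataclass

# Assumption: queue names published for ALCF Polaris; the set Oumi actually uses may differ.
POLARIS_QUEUES = ("debug", "debug-scaling", "demand", "preemptable", "prod")


@dataclass(frozen=True)
class ToyQueueCluster:
    """Hypothetical cluster bound to one scheduler queue and one user."""

    queue: str
    user: str

    @property
    def name(self) -> str:
        # Hypothetical naming scheme: "<queue>.<user>".
        return f"{self.queue}.{self.user}"


def initialize_queue_clusters(
    user: str, queues: tuple[str, ...] = POLARIS_QUEUES
) -> list[ToyQueueCluster]:
    """Create one cluster per queue for `user` (hypothetical helper)."""
    if not user:
        raise ValueError("user must be a non-empty string")
    return [ToyQueueCluster(queue=q, user=user) for q in queues]
```

Keeping one cluster object per queue lets a caller pick a queue simply by choosing a cluster name.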
- list_clusters() → list[BaseCluster]
Lists the active clusters on this cloud.
- class oumi.launcher.clouds.SkyCloud(cloud_name: str, client: SkyClient)
Bases: BaseCloud
A resource pool capable of creating clusters using SkyPilot.
- get_cluster(name) → BaseCluster | None
Gets the cluster with the specified name, or None if not found.
- list_clusters() → list[BaseCluster]
Lists the active clusters on this cloud.
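SkyCloud receives its `client` as a constructor argument, so the component that talks to SkyPilot can be swapped out. The sketch below shows that dependency-injection pattern with a fake client. `StatusClient`, its `status()` method, `ClusterRecord`, `ToySkyCloud`, and `FakeClient` are hypothetical names, and filtering the client's clusters by `cloud_name` is an assumption about how a per-cloud view could be built.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class ClusterRecord:
    """Hypothetical status row returned by a SkyPilot-like client."""

    name: str
    cloud: str


class StatusClient(Protocol):
    """Hypothetical client interface standing in for SkyClient in this sketch."""

    def status(self) -> list[ClusterRecord]: ...


class ToySkyCloud:
    """Sketch of a cloud that delegates to an injected client and filters by cloud name."""

    def __init__(self, cloud_name: str, client: StatusClient) -> None:
        self._cloud_name = cloud_name
        self._client = client

    def list_clusters(self) -> list[ClusterRecord]:
        # Keep only clusters the client reports for this cloud (case-insensitive match).
        return [
            r for r in self._client.status()
            if r.cloud.lower() == self._cloud_name.lower()
        ]

    def get_cluster(self, name: str) -> Optional[ClusterRecord]:
        # First matching cluster on this cloud, or None.
        return next((r for r in self.list_clusters() if r.name == name), None)


class FakeClient:
    """In-memory client used to exercise the sketch without SkyPilot installed."""

    def __init__(self, records: list[ClusterRecord]) -> None:
        self._records = records

    def status(self) -> list[ClusterRecord]:
        return list(self._records)
```

Injecting the client keeps `list_clusters` and `get_cluster` testable without cloud credentials; in real use the client would be a `SkyClient` instance.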
- class oumi.launcher.clouds.SlurmCloud
Bases: BaseCloud
A resource pool for managing jobs in Slurm clusters.
- get_cluster(name) → BaseCluster | None
Gets the cluster with the specified name, or None if not found.
- initialize_clusters() → list[BaseCluster]
Initializes clusters for all Slurm queues.
- Returns:
The list of initialized clusters.
- Return type:
List[SlurmCluster]
- list_clusters() → list[BaseCluster]
Lists the active clusters on this cloud.