Quickstart#

📋 Prerequisites#

Let’s start by installing Oumi. You can install the latest stable version with the following commands:

pip install oumi

# Optional: If you have an Nvidia or AMD GPU, you can install the GPU dependencies
pip install oumi[gpu]

If you need help setting up your environment (Python, pip, Git, etc.), you can find detailed instructions in the Dev Environment Setup guide. The installation guide offers more details on how to install Oumi for your specific environment and use case.
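
To double-check that the installation succeeded, you can ask pip for the installed package details:

pip show oumi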

👋 Introduction#

Now that we have Oumi installed, let’s get started with the basics! We’re going to use the oumi command-line interface (CLI) to train, evaluate, and run inference with a model.

We’ll use a small model (SmolLM-135M) so that the examples run fast on both CPU and GPU. SmolLM is a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. You can learn more about them in this blog post.

For a full list of recipes, including larger models like Llama 3.2, you can explore the recipes page.

💻 Oumi CLI#

The general structure of Oumi CLI commands is:

oumi <command> [options]

For detailed help on any command, you can use the --help option:

oumi --help            # for general help
oumi <command> --help  # for command-specific help

The available commands are:

  • train

  • evaluate

  • infer

  • launch

  • judge

Let’s go through some examples of each command.

📚 Training#

You can quickly start training a model using any of the existing recipes or your own custom configs. The following command will start training using the recipe in configs/recipes/smollm/sft/135m/quickstart_train.yaml:

configs/recipes/smollm/sft/135m/quickstart_train.yaml
# Class: oumi.core.configs.TrainingConfig
# https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/training_config.py

# SFT config for SmolLM 135M Instruct.

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"
    target_col: "prompt"

training:
  trainer_type: TRL_SFT
  save_final_model: True
  save_steps: 100
  max_steps: 10
  per_device_train_batch_size: 4
  gradient_accumulation_steps: 4

  ddp_find_unused_parameters: False
  optimizer: "adamw_torch"
  learning_rate: 2.0e-05
  compile: False

  dataloader_num_workers: "auto"
  dataloader_prefetch_factor: 32

  logging_steps: 5
  log_model_summary: False
  empty_device_cache_steps: 50
  output_dir: "output/smollm135m.fft"
  include_performance_metrics: True

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml

You can easily override any parameters directly in the command line, for example:

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.max_steps 20 \
  --training.learning_rate 1e-4 \
  --training.output_dir output/smollm-135m-sft

To run the same recipe on your own dataset (e.g., in our supported JSON or JSONL formats), you can override the dataset name and path. You can try this functionality out by downloading the alpaca-cleaned dataset manually via the Hugging Face CLI, then including that local path in your run.

huggingface-cli download yahma/alpaca-cleaned --repo-type dataset --local-dir /path/to/local/dataset

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --data.train.datasets "[{dataset_name: text_sft, dataset_path: /path/to/local/dataset}]" \
  --training.output_dir output/smollm-135m-sft-custom
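
After the download completes, you can sanity-check the local copy before pointing your run at it (the exact file names will vary by dataset):

ls -lh /path/to/local/dataset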

You can also train on multiple GPUs (make sure you have installed the GPU dependencies first).

For example, if you have a machine with 4 GPUs, you can run this command to launch a local distributed training run:

oumi distributed torchrun \
  -m oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.output_dir output/smollm-135m-sft-dist

You can also use torchrun directly in standalone mode:

torchrun --standalone --nproc-per-node 4 --log-dir ./logs \
  -m oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.output_dir output/smollm-135m-sft-dist
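
While a distributed run is in progress, it can be useful to confirm that all GPUs are actually busy. On Nvidia machines, you can watch utilization from a second terminal (nvidia-smi comes with the Nvidia drivers, not with Oumi):

watch -n 1 nvidia-smi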

📊 Evaluation#

To evaluate a trained model:

configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml
# Class: oumi.core.configs.EvaluationConfig
# https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/evaluation_config.py

# Eval config for SmolLM 135M Instruct.

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

generation:
  batch_size: 4

tasks:
  # For all available tasks, see https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html
  - evaluation_platform: lm_harness
    task_name: mmlu_college_computer_science
    eval_kwargs:
      num_fewshot: 5

Using a model downloaded from HuggingFace:

oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml \
  --model.model_name HuggingFaceTB/SmolLM2-135M-Instruct

Or, with our newly trained model saved on disk:

oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml \
  --model.model_name output/smollm135m.fft

If you saved your model to a different directory such as output/smollm-135m-sft-dist, you need only change --model.model_name.

To explore the benchmarks that our evaluations support, including HuggingFace leaderboards and AlpacaEval, visit our evaluation guide.

🧠 Inference#

To run inference with a trained model:

configs/recipes/smollm/inference/135m_infer.yaml
# Class: oumi.core.configs.InferenceConfig
# https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/inference_config.py

# Inference config for SmolLM 135M Instruct.

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  adapter_model: null # Update for LoRA-tuned models.
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

generation:
  max_new_tokens: 100
  batch_size: 4

engine: NATIVE

Using a model downloaded from HuggingFace:

oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --generation.max_new_tokens 40 \
  --generation.temperature 0.7 \
  --interactive

Or, with our newly trained model saved on disk:

oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --model.model_name output/smollm135m.fft \
  --generation.max_new_tokens 40 \
  --generation.temperature 0.7 \
  --interactive

To learn more about running inference locally or remotely (including OpenAI, Google, Anthropic APIs) and leveraging inference engines to parallelize and speed up your jobs, visit our inference guide.

☁️ Launching Jobs in the Cloud#

So far, we have been using Oumi locally. But one of Oumi’s most exciting features, compared to similar frameworks, is its integrated ability to launch jobs directly to the cloud (GCP, AWS, Azure, etc.).

This section of the quickstart is a little different from the others, so please read the next bit carefully before you proceed.

  • This tutorial uses GCP; you’ll need a GCP account. You can also use other cloud providers, such as AWS, Azure, etc. See running jobs remotely for more details.

Configuring your GCP account:

  • Oumi uses SkyPilot under the hood, and the recommended way to use SkyPilot with GCP is via a GCP service account.

  • You will need to install Oumi with GCP support: pip install oumi[gcp]. Please note that we recommend setting up a different environment for each cloud provider you wish to use.

  • Depending on your precise use case, you may also need to install a few other packages from Google:

conda install -c conda-forge google-cloud-sdk -y
conda install -c conda-forge google-api-python-client -y
conda install -c conda-forge google-cloud-storage -y

  • There are multiple ways to handle credentials with GCP service accounts. We recommend creating a service account key in JSON format and downloading it to the machine from which you plan to launch the cloud job. After that, you’ll need to run a few more setup commands:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
gcloud config set project <YOUR_PROJECT>

You can now run sky check to confirm that GCP is enabled:
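
sky check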

If you get stuck, please refer to our running jobs remotely section, as well as the documentation for GCP and SkyPilot linked above, for more information.

Launching your first cloud job with Oumi#

Once the one-time setup is out of the way, launching a new cloud job with Oumi is very simple.

configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml
# Class: oumi.core.configs.JobConfig
# https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/job_config.py

# Config to tune smollm 135M on 1 GCP node.
# Example command:
# oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --cluster smollm-135m-fft
name: smollm-135m-sft

resources:
  cloud: gcp
  accelerators: "A100:1"
  use_spot: false
  disk_size: 100 # Disk size in GBs

working_dir: .

envs:
  OUMI_RUN_NAME: smollm135m.train
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false

setup: |
  set -e
  pip install uv && uv pip install oumi[gpu]

run: |
  set -e  # Exit if any command failed.
  source ./configs/examples/misc/sky_init.sh

  set -x
  oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml

  echo "Training complete!"

oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml

To launch an evaluation job:

configs/recipes/smollm/evaluation/135m/quickstart_gcp_job.yaml
# Class: oumi.core.configs.JobConfig
# https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/job_config.py

# Config to evaluate smollm 135M on 1 GCP node.
# Example command:
# oumi launch up -c configs/recipes/smollm/evaluation/135m/quickstart_gcp_job.yaml --cluster smollm-135m-eval
name: smollm-135m-eval

resources:
  cloud: gcp
  accelerators: "A100:1"
  use_spot: false
  disk_size: 100 # Disk size in GBs

working_dir: .

envs:
  OUMI_RUN_NAME: smollm135m.eval
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false

setup: |
  set -e
  pip install uv && uv pip install oumi[gpu,evaluation]

run: |
  set -e  # Exit if any command failed.
  source ./configs/examples/misc/sky_init.sh

  set -x
  oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml

  echo "Evaluation complete!"

oumi launch up -c configs/recipes/smollm/evaluation/135m/quickstart_gcp_job.yaml

After you run one of the above commands, you should see some console output from Oumi which describes how your job is being provisioned and how the cloud installation is proceeding. In particular, your cluster will be assigned a semi-random name such as sky-7fdd-ab183, which you should take note of.

After 15 minutes or so, Oumi should tell you that the run is complete.
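
You can also check on your cluster at any time using SkyPilot directly (the sky CLI comes from SkyPilot, which Oumi uses under the hood):

sky status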

If you want to see the logs from your cloud run, you can pull them down to your local machine:

sky logs --sync-down sky-7fdd-ab183

Cloud services can be expensive! Please keep an eye on your costs, and don’t forget to tear down your cluster when you’re done with this tutorial.

sky down sky-7fdd-ab183

This command will destroy your cluster, including all data on those remote machines, so save your logs and artifacts first!

🧭 What’s next?#

Although this example used GCP, Oumi natively supports a wide range of cloud providers. To explore the cloud providers that we support, visit running jobs remotely.

🔗 Community#

⭐ If you like Oumi and you would like to support it, please give it a star on GitHub.

👋 If you are interested in contributing, please read the Contributor’s Guide.