Dev Environment Setup#
This guide will help you set up a development environment for contributing to Oumi. If you already have a working environment, you can skip to Set up Oumi.
1. Install Miniconda#
The simplest way to install Miniconda is to first clone the Oumi repository (step 2.2 below), then run:
make install-miniconda
Alternatively, install Miniconda from the Anaconda website.
2. Set up Oumi#
2.1 Fork the Oumi repository#
You can create a fork of Oumi by clicking the Fork button in the upper right of the repository. This will create a fork of Oumi associated with your GitHub account.
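If you prefer the command line and have the GitHub CLI (gh) installed, forking can also be done in one step; this is only an optional alternative to the web UI:
gh repo fork oumi-ai/oumi --clone
This clones your fork locally, and gh typically configures the original repository as the upstream remote for you.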
2.2 Clone your fork of the Oumi repository#
Now you’re ready to clone your fork to your local disk and set up the original repository as a remote:
git clone git@github.com:<your Github handle>/oumi.git
cd oumi
git remote add upstream https://github.com/oumi-ai/oumi.git
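You can verify that both remotes are configured correctly with:
git remote -v
This should list origin (your fork) and upstream (the main Oumi repository).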
2.3 Create a development branch#
Warning
Do not make changes directly to the main branch, even on your fork! Your changes should live on a development branch so you can later create a Pull Request to merge your changes into main.
git checkout -b the-name-of-your-branch
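Before starting new work, it's also a good idea to keep your local main in sync with the upstream repository, for example:
git fetch upstream
git checkout main
git merge upstream/main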
2.4 Install Oumi package and its dependencies#
This command creates a new Conda env, installs relevant packages, and installs pre-commit.
make setup
If you’d like to run the pre-commit hooks only before a push, instead of on every commit, you can run:
pre-commit uninstall
pre-commit install --install-hooks --hook-type pre-push
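To confirm the environment was created correctly, you can activate it and check that the oumi CLI is available:
conda activate oumi
oumi --help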
2.4.1 Optional dependencies#
Follow these instructions to install optional dependencies you may want depending on your use case.
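For example, if you plan to train on NVIDIA GPUs, GPU-related extras can be installed with pip. The exact extras group names are listed in the linked instructions; gpu is shown here as a common example:
pip install -e ".[gpu]"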
2.5 [optional] Add an Oumi alias to your shell#
Add the following alias to your .zshrc or .bashrc:
alias oumi-conda="cd ~/<YOUR_PATH>/oumi && conda activate oumi"
This will change your directory into the Oumi repo and activate the Oumi Conda environment. Test that this works with:
source ~/.zshrc  # or: source ~/.bashrc
oumi-conda
3. [optional] Set up SkyPilot#
The Oumi launcher can be used to launch jobs on remote clusters. Our launcher integrates with SkyPilot to launch jobs on popular cloud providers (GCP, Lambda, etc.). To enable the Oumi launcher to run on your preferred cloud, make sure to follow the setup instructions in our launch guide.
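After installing SkyPilot and configuring your cloud credentials, you can verify which clouds are enabled with:
sky check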
4. [optional] Set up HuggingFace#
Oumi integrates with HuggingFace (HF) Hub for access to models and datasets. While most models and datasets are publicly accessible, some, like Llama, are gated, requiring you to be logged in and approved for access.
Sign up for HuggingFace if you haven’t done so already.
Create a user access token. If you only need to read content from the Hub, create a read token. If you also plan to push datasets or models to the Hub, create a write token.
Run the following to log in on your machine, using the token created in the previous step:
huggingface-cli login
This will save your token in the HF cache directory at ~/.cache/huggingface/token. Oumi jobs mount this file to remote clusters to access gated content there. See this config for an example.
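You can confirm that the login succeeded with:
huggingface-cli whoami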
4.1 Getting access to Llama#
Llama models are gated on HF Hub. To gain access, sign the agreement on your desired Llama model’s Hub page. It usually takes a few hours to get access to the model after signing the agreement. Note that there is a separate agreement for each version of Llama.
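Once your request is approved, you can verify access by downloading a small file from the gated repository. The model shown here is only an example; use whichever Llama model you requested access to:
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct config.json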
5. [optional] Set up Weights and Biases#
Oumi integrates with Weights and Biases (WandB) to track the results of training and evaluation runs. Run the following to log in on your machine:
wandb login
This will save your login info at ~/.netrc. Oumi jobs mount this file to remote clusters to allow them to log to WandB. See this config for an example.
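On headless machines or in CI, you can instead log in non-interactively by exporting your API key via the standard WandB environment variable:
export WANDB_API_KEY=<your API key>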
6. [optional] Set up VSCode#
We recommend using VSCode as the IDE. See our VSCode Integration guide for recommended setup instructions.
You can also use VSCode to run Jupyter notebooks in the Oumi repository. See our Notebook Integration guide for more information.
7. [optional] Test your setup#
To test that your setup is complete, you can run oumi launch up -c configs/recipes/llama3_1/sft/8b_lora/gcp_job.yaml --cluster llama8b-lora. This requires step 3 (SkyPilot GCP), step 4 (HF), step 4.1 (Llama 3.1 access), and step 5 (WandB).
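Since this launches a real GCP cluster, remember to tear it down when you're done, for example with SkyPilot directly (using the cluster name from the command above):
sky down llama8b-lora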