<div class="align-center">
<a href="https://oumi.ai/"><img src="https://oumi.ai/docs/en/latest/_static/logo/header_logo.png" height="200"></a>

[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)
[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)
[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)
</div>

👋 Welcome to Open Universal Machine Intelligence (Oumi)!

🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large-scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.

⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi).

# Adapting nanoGPT

## Intro
The goal of this notebook is to show how to use a custom model with Oumi.

In this case, we adapt [nanoGPT](https://github.com/karpathy/nanoGPT) and train it with the Hugging Face training loop.

## Setup

This notebook assumes that you have already installed the `oumi` package. If you haven't, you can install it by running `!pip install oumi`.

We start by cloning the nanoGPT repository and adding it to our Python path.


In [None]:
import sys
from pathlib import Path

module_folder = Path("/tmp/oumi/nanoGPT")

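# Clone the nanoGPT repo, unless a previous run already fetched model.py.
# Checking for the file (not just the folder) avoids skipping the clone
# after an earlier attempt failed partway.
if not (module_folder / "model.py").is_file():
    module_folder.parent.mkdir(parents=True, exist_ok=True)
    !git clone https://github.com/karpathy/nanoGPT {module_folder}
else:
    print("nanoGPT already cloned!")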

sys.path.append(str(module_folder))
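
Appending to `sys.path` only helps if the name `model` actually resolves to the cloned file, and a generic name like `model` is easy to shadow. The following is a minimal sketch of a check for that, using only the standard library; `find_module_file` is a hypothetical helper name introduced here for illustration, not part of Oumi or nanoGPT.

```python
import importlib
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def find_module_file(module_name: str, folder: Path) -> Optional[Path]:
    """Return the file `module_name` resolves to if it lives under `folder`.

    Returns None when the module cannot be found, or when the name resolves
    to a file outside `folder` (for example, a different `model.py` that
    appears earlier on the import path).
    """
    if str(folder) not in sys.path:
        sys.path.append(str(folder))
    # Make sure files created after interpreter start-up are visible.
    importlib.invalidate_caches()

    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None

    origin = Path(spec.origin).resolve()
    return origin if folder.resolve() in origin.parents else None
```

In this notebook, `find_module_file("model", module_folder)` would be expected to return the path of the cloned `model.py`; a `None` result suggests the clone failed or another module named `model` is taking precedence.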

Next, we install the required dependencies.

In [None]:
%pip install -U -q tiktoken

## Adapting the nanoGPT model

In [3]:
import torch.nn.functional as F
from model import GPT, GPTConfig  # imported from /tmp/oumi/nanoGPT/model.py

from oumi.core import registry


@registry.register("oumi-nanoGPT", registry_type=registry.RegistryType.MODEL)
class OumiNanoGPT(GPT):
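    def __init__(self, **kwargs):
        """Initializes nanoGPT from its default `GPTConfig`, with bias disabled."""
        # `kwargs` are accepted but ignored: the default config is always used.
        gpt_config = GPTConfig()
        gpt_config.bias = False

        super().__init__(gpt_config)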

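    def forward(self, input_ids, labels=None, attention_mask=None):
        """Performs the forward pass of the model."""
        # `attention_mask` is accepted for interface compatibility but is not
        # passed on to nanoGPT.
        # Update the return format to be compatible with our Trainer.
        logits, loss = super().forward(idx=input_ids, targets=labels)
        outputs = {"logits": logits}
        # Compare against None explicitly: a loss tensor equal to zero is falsy.
        if loss is not None:
            outputs["loss"] = loss
        return outputs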

    def criterion(self):
        """Returns the criterion used for calculating the loss."""
        return F.cross_entropy
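
The adapter's job in `forward` is small: nanoGPT returns a `(logits, loss)` tuple, while the training loop used here reads named fields. A minimal, framework-free sketch of that conversion follows. `to_trainer_outputs` is a hypothetical helper name, not part of Oumi or nanoGPT, and plain Python values stand in for tensors.

```python
from typing import Any, Dict, Optional


def to_trainer_outputs(logits: Any, loss: Optional[Any] = None) -> Dict[str, Any]:
    """Convert a `(logits, loss)` pair into a dict keyed by field name."""
    outputs = {"logits": logits}
    # Check for None rather than truthiness, so that a loss of exactly 0.0
    # is still reported instead of being silently dropped.
    if loss is not None:
        outputs["loss"] = loss
    return outputs


# With labels: both fields are present, even when the loss is zero.
with_loss = to_trainer_outputs(logits=[0.1, 0.9], loss=0.0)

# Without labels (inference): only the logits are returned.
without_loss = to_trainer_outputs(logits=[0.1, 0.9])
```

Omitting the `"loss"` key when no labels are given mirrors the adapter above, where nanoGPT returns `None` for the loss if no targets are passed.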

## Training

OK, now we are ready to train our model! We can start from the default GPT-2 config and edit it as needed.

In [4]:
import oumi
from oumi.core.configs import TrainingConfig

In [5]:
# Starting from the default GPT-2 config
config_path = "../configs/recipes/gpt2/pretraining/mac_train.yaml"
config = TrainingConfig.from_yaml(config_path)

config.training.output_dir = "nanogpt_tutorial"
# Update to use our newly registered nanoGPT model
config.model.model_name = "oumi-nanoGPT"  # needs to match the registered model name
# We do not have a custom tokenizer, but we can use the GPT-2 tokenizer from HuggingFace
config.model.tokenizer_name = "gpt2"
# These are needed specifically to get nanoGPT to work, and likely aren't needed for
# other custom models.
config.training.enable_tensorboard = False
config.training.save_steps = 0
config.training.save_final_model = False
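
The requirement that `model_name` match the registered name follows from how name-based registries generally work: the decorator stores the class under a string key, and the config string is later used to look it up. The sketch below illustrates the pattern only. It is not Oumi's implementation, and `_MODEL_REGISTRY`, `register_model`, `build_model`, and `ToyModel` are all hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a model registry: maps a name to a class.
_MODEL_REGISTRY: Dict[str, type] = {}


def register_model(name: str) -> Callable[[type], type]:
    """Class decorator that stores the decorated class under `name`."""

    def decorator(cls: type) -> type:
        if name in _MODEL_REGISTRY:
            raise ValueError(f"Model name already registered: {name!r}")
        _MODEL_REGISTRY[name] = cls
        return cls

    return decorator


def build_model(name: str, **kwargs: Any) -> Any:
    """Look up `name` (exact, case-sensitive match) and instantiate the class."""
    try:
        model_cls = _MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_MODEL_REGISTRY)) or "<none>"
        raise KeyError(f"Unknown model {name!r}; registered: {known}") from None
    return model_cls(**kwargs)


@register_model("toy-model")
class ToyModel:
    def __init__(self, **kwargs: Any):
        self.kwargs = kwargs


model = build_model("toy-model", hidden=4)
```

In a registry of this shape the lookup is an exact string match, so `build_model("Toy-Model")` raises a `KeyError`. Note also that registration happens when the decorated class is defined, which is why the cell defining `OumiNanoGPT` has to run before training starts.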

In [None]:
oumi.train(config)