Supported Models

This page lists all the language models that can be used with Oumi. Thanks to the integration with the 🤗 Transformers library, you can easily use any of these models for training, evaluation, or inference.

Models prefixed with a checkmark (✅) have been thoroughly tested and validated by the Oumi community, with ready-to-use recipes available in the configs directory.
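
The checkmark convention can be handled mechanically when processing the tables on this page. The snippet below is a minimal sketch, assuming each table row is available as plain strings. `ModelEntry` and `parse_entry` are hypothetical helpers written for illustration and are not part of the Oumi API; the sample rows are copied from the tables on this page.

```python
from dataclasses import dataclass

VALIDATED_MARK = "✅"  # prefix used on this page for validated models


@dataclass(frozen=True)
class ModelEntry:
    """One table row (hypothetical helper, not an Oumi class)."""

    name: str
    sizes: str
    license: str
    validated: bool


def parse_entry(label: str, sizes: str, license: str) -> ModelEntry:
    """Split the optional checkmark prefix off a model label."""
    label = label.strip()
    validated = label.startswith(VALIDATED_MARK)
    name = label[len(VALIDATED_MARK):].strip() if validated else label
    return ModelEntry(name=name, sizes=sizes, license=license, validated=validated)


# Sample rows taken from the tables on this page.
rows = [
    ("✅ SmolLM-Instruct", "135M/360M/1.7B", "Apache 2.0"),
    ("MPT-Instruct", "7B", "Apache 2.0"),
    ("✅ GPT-2", "124M-1.5B", "MIT"),
]

entries = [parse_entry(*row) for row in rows]
validated_names = [entry.name for entry in entries if entry.validated]
print(validated_names)
```

Filtering on `validated` separates the models that ship with ready-to-use recipes from those that are listed as compatible only.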

📚 Model Categories

Instruct Models

| Model | Size | Paper | HF Hub | License | Open [1] | Recommended Parameters |
|---|---|---|---|---|---|---|
| ✅ SmolLM-Instruct | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | | |
| ✅ Llama 3.1 Instruct | 8B/70B/405B | Paper | Hub | License | | |
| ✅ Llama 3.2 Instruct | 1B/3B | Paper | Hub | License | | |
| ✅ Llama 3.3 Instruct | 70B | Paper | Hub | License | | |
| ✅ Phi-3.5-Instruct | 4B/14B | Paper | Hub | License | | |
| Qwen2.5-Instruct | 0.5B-70B | Paper | Hub | License | | |
| OLMo 2 Instruct | 7B | Paper | Hub | Apache 2.0 | | |
| MPT-Instruct | 7B | Blog | Hub | Apache 2.0 | | |
| Command R | 35B/104B | Blog | Hub | License | | |
| Granite-3.1-Instruct | 2B/8B | Paper | Hub | Apache 2.0 | | |
| Gemma 2 Instruct | 2B/9B | Blog | Hub | License | | |
| DBRX-Instruct | 130B MoE | Blog | Hub | Apache 2.0 | | |
| Falcon-Instruct | 7B/40B | Paper | Hub | Apache 2.0 | | |

Vision-Language Models

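| Model | Size | Paper | HF Hub | License | Open [1] | Recommended Parameters |
|---|---|---|---|---|---|---|
| ✅ Llama 3.2 Vision | 11B | Paper | Hub | License | | |
| ✅ LLaVA-1.5 | 7B | Paper | Hub | License | | |
| ✅ Phi-3 Vision | 4.2B | Paper | Hub | License | | |
| ✅ BLIP-2 | 3.6B | Paper | Hub | MIT | | |
| ✅ Qwen2-VL | 2B | Blog | Hub | License | | |
| ✅ SmolVLM-Instruct | 2B | Blog | Hub | Apache 2.0 | | |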

Base Models

| Model | Size | Paper | HF Hub | License | Open [1] | Recommended Parameters |
|---|---|---|---|---|---|---|
| ✅ SmolLM2 | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | | |
| ✅ Llama 3.2 | 1B/3B | Paper | Hub | License | | |
| ✅ Llama 3.1 | 8B/70B/405B | Paper | Hub | License | | |
| ✅ GPT-2 | 124M-1.5B | Paper | Hub | MIT | | |
| DeepSeek V2 | 7B/13B | Blog | Hub | License | | |
| Gemma2 | 2B/9B | Blog | Hub | License | | |
| GPT-J | 6B | Blog | Hub | Apache 2.0 | | |
| GPT-NeoX | 20B | Paper | Hub | Apache 2.0 | | |
| Mistral | 7B | Paper | Hub | Apache 2.0 | | |
| Mixtral | 8x7B/8x22B | Blog | Hub | Apache 2.0 | | |
| MPT | 7B | Blog | Hub | Apache 2.0 | | |
| OLMo | 1B/7B | Paper | Hub | Apache 2.0 | | |

Reasoning Models

| Model | Size | Paper | HF Hub | License | Open [1] | Recommended Parameters |
|---|---|---|---|---|---|---|
| Qwen QwQ | 32B | Blog | Hub | License | | |

Code Models

| Model | Size | Paper | HF Hub | License | Open [1] | Recommended Parameters |
|---|---|---|---|---|---|---|
| ✅ Qwen2.5 Coder | 0.5B-32B | Blog | Hub | License | | |
| DeepSeek Coder | 1.3B-33B | Paper | Hub | License | | |
| StarCoder 2 | 3B/7B/15B | Paper | Hub | License | | |

Math Models

| Model | Size | Paper | HF Hub | License | Open [1] | Recommended Parameters |
|---|---|---|---|---|---|---|
| DeepSeek Math | 7B | Paper | Hub | License | | |

Additional Resources