MCP Server

MCP Server#

Experimental

The Oumi MCP server is under active development. Tools and resources may change.

Oumi ships an MCP (Model Context Protocol) server that lets an MCP-capable assistant — Claude Desktop, Claude Code, Cursor, and others — discover Oumi’s ~500 ready-to-use YAML configs, launch and monitor training / eval / inference jobs (local or cloud), and read built-in workflow guidance, all without leaving the chat.

It’s installed as a separate extra and launched as a standalone stdio process. The client spawns it; you never run it by hand.

When to Use It#

The MCP server is a productivity layer on top of the normal oumi CLI. Reach for it when you want to:

Find a config: “Find me a config for LoRA fine-tuning Llama 3.1 8B on an A100.”
Understand a config: “Explain what this config does and flag anything risky.”
Launch a job from chat: “Run oumi train on this config locally / on GCP.”
Babysit a running job: “Poll the job every minute and tell me when it finishes.”
Look up API docs: “How do I configure FSDP2 in Oumi?”

If you’d rather run commands yourself, the CLI is always available and documented in CLI Reference.

Installation#

pip install "oumi[mcp]"

This pulls in fastmcp, the mcp package, and httpx. Two entry points are installed:

oumi-mcp           # console script — use this one in client configs
python -m oumi.mcp # equivalent, if you prefer module invocation

Both start the server on stdio.

Note

Path-sensitive tools (job launch, pre-flight, validation) rely on you having a normal Oumi environment wherever the server runs. To launch cloud jobs, also install the per-provider extras (oumi[aws], oumi[azure], oumi[gcp], oumi[kubernetes], oumi[lambda], oumi[nebius], oumi[runpod]) and complete sky check per Running Jobs on Clusters.

Connecting from an MCP Client#

Every MCP client lets you register servers via JSON. Point them at the oumi-mcp script. Examples:

Claude Desktop#

On macOS, edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "oumi": {
      "command": "oumi-mcp"
    }
  }
}

Then fully restart Claude Desktop. On Linux the path is ~/.config/Claude/claude_desktop_config.json; on Windows it’s %APPDATA%\Claude\claude_desktop_config.json.

Claude Code#

From the project directory:

claude mcp add oumi oumi-mcp

Or add it manually to ~/.claude.json under mcpServers.

Cursor#

Settings → MCP Servers → add:

{
  "oumi": {
    "command": "oumi-mcp"
  }
}

Using a Specific Python Environment#

If oumi-mcp isn’t on your client’s PATH, use an absolute path or wrap it in the environment where Oumi is installed:

{
  "mcpServers": {
    "oumi": {
      "command": "/Users/you/.venvs/oumi/bin/oumi-mcp"
    }
  }
}

Tip

oumi-mcp needs credentials (HF_TOKEN, WANDB_API_KEY, GOOGLE_APPLICATION_CREDENTIALS, etc.) in its process environment. The most reliable way is the MCP config’s own env block, which every stdio-launching client supports:

{
  "mcpServers": {
    "oumi": {
      "command": "oumi-mcp",
      "env": {
        "HF_TOKEN": "hf_...",
        "WANDB_API_KEY": "..."
      }
    }
  }
}

Shell rc files (~/.zshrc, ~/.bashrc) only cover terminal-launched clients like Claude Code — macOS GUI apps (Claude Desktop, Cursor from the Dock) are started by launchd and never read them. Don’t commit secrets inside the client config; reference them from a local secret store or keep the config out of version control.

What’s Exposed#

The server surfaces two kinds of MCP primitives:

Tools#

Assistant-callable functions. The assistant should call get_started first — it returns a detailed catalog, path rules, and the recommended order of operations.

Tool	Purpose
`get_started`	Call first. Returns the full tool catalog, path rules, and workflow guidance.
`search_configs`	Fuzzy-search the ~500 bundled YAML configs by path, filename, and content.
`get_config`	Fetch one config’s path, model, dataset, and raw YAML — for use as a reference.
`list_categories`	Browse available config categories, model families, and API providers.
`validate_config`	Validate a local YAML config against its schema before launch.
`pre_flight_check`	Check HF auth, hardware, local paths, and provider setup before launching.
`run_oumi_job`	Execute an Oumi command locally or on cloud. Dry-run by default.
`get_job_status`	Snapshot status for a tracked job.
`get_job_logs`	Tail logs for a job.
`list_jobs`	Inventory of running and completed jobs (filter with `status=`).
`cancel_job`	Cancel a running job.
`stop_cluster`	Stop a cluster (preserves infra, reduces compute cost).
`down_cluster`	Fully delete a cluster (halts all billing). Destructive.
`get_docs`	Search indexed Oumi Python API docs.
`list_modules`	List modules available for `get_docs` searches.

Resources#

Read-only content the assistant can fetch by URI.

Workflow guidance (MLE playbooks):

URI	Content
`guidance://mle-workflow`	Overall ML engineering workflow with Oumi.
`guidance://mle-train`	Guidance for `oumi train`.
`guidance://mle-synth`	Guidance for `oumi synth`.
`guidance://mle-analyze`	Guidance for `oumi analyze`.
`guidance://mle-eval`	Guidance for `oumi evaluate`.
`guidance://mle-infer`	Guidance for `oumi infer`.
`guidance://cloud-launch`	Anatomy of a cloud job config and setup patterns.
`guidance://post-training`	Post-training steps (download weights, eval, teardown).

Job state (live):

URI	Content
`jobs://running`	JSON list of currently running jobs.
`jobs://completed`	JSON list of recent completed / failed / cancelled jobs.
`jobs://{job_id}/logs`	Plain-text log output for a specific job.

Path Handling#

There are two path surfaces, and they do not overlap:

Bundled config library. search_configs, get_config, and list_categories browse the ~500 YAML files shipped with the oumi package. get_config accepts substring queries against those config paths (e.g. "llama3_1/sft/8b_lora"); it is read-only and its results should be treated as references — copy/adapt them into your own project, don’t pass library paths to path-sensitive tools.
Your project files. run_oumi_job, validate_config, and pre_flight_check take local filesystem paths (absolute, or relative to client_cwd). They do not resolve oumi:// URIs or library config names — you must point them at a YAML file that exists on disk.

Assistants handle client_cwd automatically — get_started returns the full rules the model needs, and the assistant passes your project root on every path-sensitive call. You don’t configure anything.

Safety#

Because run_oumi_job executes real commands, the server is deliberately cautious:

Dry-run by default. run_oumi_job previews the command and exits unless the caller explicitly passes dry_run=False.
Pre-flight on cloud launches. Cloud runs invoke pre_flight_check automatically (HF auth, gated repo access, hardware, local paths, SkyPilot compatibility). Blocking issues prevent launch.
Strict YAML parsing. run_oumi_job rejects configs that fail schema validation.
Destructive actions require confirmation. down_cluster requires a confirm=True flag and a typed confirmation string; cancel_job has a force flag that’s opt-in.

An assistant that has been told to “just run it” still goes through these gates — use that to your advantage when delegating long-running work.

A Typical Session#

Here’s what a full cloud-training flow looks like through a well-behaved assistant. You don’t type any of this; the assistant does.

get_started() — fetch the current tool catalog and workflow.
search_configs(query=["llama", "lora", "8b"]) — find candidate configs.
get_config(path=...) — inspect the most promising one as a reference.
Build a job config (resources + setup + run block) for the target cloud — see guidance://cloud-launch.
validate_config(config="configs/my_job.yaml", task_type="job", client_cwd=...) — catch schema errors early.
pre_flight_check(config="configs/my_job.yaml", client_cwd=..., cloud="gcp") — catch auth / hardware / path issues.
run_oumi_job(config_path="configs/my_job.yaml", command="train", client_cwd=..., cloud="gcp", cluster_name="exp-1") — dry-run first, then re-call with dry_run=False after you confirm.
get_job_status(job_id=..., cloud="gcp", cluster_name="exp-1") — poll.
stop_cluster(cloud="gcp", cluster_name="exp-1") — pause to save cost while you review results, or down_cluster(...) to shut everything down.

For local runs, drop the cloud/cluster_name arguments and point at a training config directly — there’s no job-config requirement.

Troubleshooting#

The assistant says it can’t see the Oumi tools#

Run oumi-mcp in a terminal to confirm the process starts without errors. It has no CLI flags — it blocks on stdio and logs to stderr; kill it with Ctrl-C.
Check the client’s MCP logs. Claude Desktop logs to ~/Library/Logs/Claude/mcp*.log on macOS.
Verify the client is spawning oumi-mcp with the Python environment where Oumi is installed. If which oumi-mcp only works inside a virtualenv, pass an absolute path in the client config.
Fully restart the client after editing its config.

Logs go where?#

The server uses stdio: MCP protocol traffic is on stdout, and Oumi’s own logs are written to stderr at INFO level. Most clients surface stderr in their MCP log file. To see logs interactively, launch the server in a terminal (oumi-mcp) and watch stderr — the client won’t start its own instance if one is already attached, so use this only for ad-hoc inspection, not as a permanent setup.

Cloud launches fail pre-flight#

Run sky check and follow the provider-specific setup in Running Jobs on Clusters. pre_flight_check surfaces the same issues but sky check gives the authoritative diagnostic.

A job tool returns “job_id not found”#

Call list_jobs to see every job the server currently knows about. For cloud jobs it queries the launcher live, so anything still alive on the provider shows up with its real job ID — pass that ID (plus cloud and cluster_name) to get_job_status / get_job_logs. Local-job logs are always on disk at ~/.oumi/mcp/job-logs/<job_id>/.

Under the Hood#

Source lives under oumi-ai/oumi.
Tool implementations are registered in oumi-ai/oumi.
Guidance resources are simple strings under oumi-ai/oumi.
Config discovery uses the same YAML library under oumi-ai/oumi that the CLI does — no duplicate source of truth.