
[Figure: DCVLR competition flow]

Compete to Curate Smarter Vision-Language Data—And Win Big at NeurIPS 2025

June 18th, 2025

Introducing the DCVLR Competition

AI’s next leap hinges on reasoning, and vision-language models are the new frontier. While models like DeepSeek-R1 and OpenAI’s o1 have brought reasoning to the forefront, progress in visual reasoning has lagged behind.

That’s exactly why the Data Curation for Vision-Language Reasoning (DCVLR) competition at NeurIPS 2025, sponsored and organized by Oumi and Lambda, matters.

We’re challenging researchers, engineers, and enthusiasts to do something novel: curate a compact, high-impact dataset that helps a small vision-language model (under 10B parameters) reason better. You’ll bring your own data, sourced from HuggingFace or elsewhere, and shape it into a dataset that lifts performance through fine-tuning.
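
To make the task concrete, here is a minimal curation sketch in Python. The source dataset, column names, filtering heuristic, and output format are all illustrative assumptions, not part of the official rules; the starter kit defines the actual submission format.

```python
# Minimal curation sketch: pull candidate samples from the HuggingFace Hub
# and keep a small, high-quality subset for fine-tuning.
# NOTE: the source dataset, column names, filtering heuristic, and output
# format below are illustrative assumptions, not the official spec.
from datasets import load_dataset

# Any multimodal QA dataset on the Hub could serve as a source pool.
source = load_dataset("HuggingFaceM4/ChartQA", split="train")

def looks_like_reasoning(example):
    # Toy heuristic: favor questions that call for multi-step reasoning.
    query = example["query"].lower()
    return any(k in query for k in ("how many", "difference", "compare", "total"))

# Assumes at least 1,000 rows survive the filter.
curated = source.filter(looks_like_reasoning).shuffle(seed=0).select(range(1_000))
curated.to_parquet("track1_submission.parquet")  # Track 1 size: 1,000 examples
```

In practice, stronger pipelines might score candidates with a judge model or deduplicate near-identical examples; the heuristic above only illustrates the shape of the loop.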

The Twist?

You won’t just select existing samples. You can synthesize, augment, or generate entirely new data. Your mission is to assemble a set of examples that enables a hidden model to achieve the best accuracy on a held-out evaluation set. While the evaluation datasets won’t be revealed (yet), successful entries will demonstrate improved reasoning on tables, diagrams, and natural images.

Choose between two tracks:

  • Track 1: Curate 1,000 examples
  • Track 2: Curate 10,000 examples
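
Whichever track you pick, it’s worth checking the example count before uploading. Here is a small sketch, assuming the parquet output from the earlier example and that each track expects exactly its stated number of examples (consult the official rules for the real constraints):

```python
# Pre-submission sanity check: confirm the curated file matches the chosen
# track's size. Assumes the parquet file from the sketch above and an
# exact-size requirement per track, which is an assumption on our part.
import pyarrow.parquet as pq

TRACK_SIZES = {1: 1_000, 2: 10_000}
track = 1

table = pq.read_table("track1_submission.parquet")
assert table.num_rows == TRACK_SIZES[track], (
    f"Track {track} expects {TRACK_SIZES[track]} examples, got {table.num_rows}"
)
print(f"OK: {table.num_rows} examples")
```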

This competition builds on the success of techniques seen in LIMO and S1—where small, high-quality instruction-tuning datasets led to outsized gains in reasoning performance.

The next breakthrough could be yours.

What’s in It for You?

  • 🏆 $1,500 — First place on the leaderboard
  • 💡 $1,000 — Most innovative approach (among top entries)
  • 🥈 $200 — Honorable mention

Plus, to help you get started:

  • Oumi provides a complete starter kit with scripts, tutorials, and baseline models.
  • Lambda is offering GPU cloud credits for select student teams.

Whether you’re in it for the prizes, the science, or just the fun—you’ll be helping shape the next generation of multimodal reasoning models.

Why This, Why Now?

DCVLR isn’t just another competition. It’s a new kind of challenge with three key innovations:

  1. Creation over selection: You’re encouraged to build data from scratch via retrieval, augmentation, synthesis, or anything in between (see the sketch after this list).
  2. Curation over scale: This is about fine-tuning with smartly chosen examples—not brute-force pretraining. It’s more accessible, especially for academic and indie teams.
  3. Reasoning over breadth: The goal is to push performance on multimodal reasoning, a specific and critical capability where many models still struggle.
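
To make point 1 concrete, here is a hedged sketch that turns an existing question/answer pair into a richer training example by generating a rationale with a small open model. The model choice, prompt, and helper function are assumptions for illustration; any retrieval, augmentation, or synthesis recipe is fair game under the rules.

```python
# Hedged sketch of "creation over selection": synthesize a step-by-step
# rationale for an existing question/answer pair with an off-the-shelf
# instruction model. Model, prompt, and fields are illustrative assumptions;
# a full pipeline would also attach the associated image to each example.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def synthesize_rationale(question: str, answer: str) -> str:
    prompt = (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Explain, step by step, how to reach this answer:\n"
    )
    # return_full_text=False keeps only the generated continuation.
    out = generator(prompt, max_new_tokens=128, do_sample=False,
                    return_full_text=False)
    return out[0]["generated_text"].strip()

print(synthesize_rationale("How many bars in the chart exceed 50?", "3"))
```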

How to Get Started

We’re limiting participation to 500 teams (1–20 members per team), so don’t wait!

👉 Register now on the DCVLR homepage, where you’ll also find full details on the rules
👉 Give us a ⭐ on GitHub
👉 Dive into the starter kit and join our Discord community

Let’s find out just how far the right data can take us.


Contributors: Stefan Webb, Benjamin Feuer, Oussama Elachqar
