AI's next leap hinges on reasoning, and vision-language models are the new frontier. While models like DeepSeek-R1 and GPT-o1 have brought reasoning to the forefront, visual reasoning has been slower to catch up.
That's exactly why the Data Curation for Vision-Language Reasoning (DCVLR) competition at NeurIPS 2025, sponsored and organized by Oumi and Lambda, matters.
We're challenging researchers, engineers, and enthusiasts to do something novel: curate a compact, high-impact dataset that helps a small vision-language model (fewer than 10B parameters) reason better. You'll bring your own data, sourced from HuggingFace or elsewhere, and create a dataset that elevates performance through fine-tuning.
You won't just select existing samples. You can synthesize, augment, or generate entirely new data. Your mission is to assemble a set of examples that enables a hidden model to achieve the best accuracy on an evaluation set. Although the evaluation datasets themselves won't be revealed (yet), successful entries will demonstrate improved reasoning over tables, diagrams, and natural images.
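To make the task concrete, here is a minimal sketch of what a curation pipeline might look like: pull an image-QA dataset from HuggingFace, keep examples with a rough "reasoning" signal, and write them out in an instruction-tuning layout. The dataset id, field names (`image_path`, `question`, `answer`), the filter heuristic, and the output schema are all illustrative assumptions, not the official DCVLR submission format; consult the starter kit for the real spec.

```python
# Illustrative curation sketch -- dataset id, fields, and schema are placeholders.
import json
from datasets import load_dataset

SOURCE_DATASET = "your-org/your-vqa-dataset"  # hypothetical HuggingFace dataset id
MAX_EXAMPLES = 1000                           # your "compact, high-impact" budget

def has_reasoning_signal(example):
    """Toy filter: keep examples whose answers need more than a few words."""
    answer = example.get("answer", "")
    return isinstance(answer, str) and len(answer.split()) > 3

raw = load_dataset(SOURCE_DATASET, split="train")
filtered = raw.filter(has_reasoning_signal).shuffle(seed=42)
curated = filtered.select(range(min(MAX_EXAMPLES, len(filtered))))

# Convert to a simple image + question -> answer layout for fine-tuning.
records = [
    {
        "image": ex["image_path"],   # placeholder field name
        "prompt": ex["question"],    # placeholder field name
        "response": ex["answer"],    # placeholder field name
    }
    for ex in curated
]

with open("dcvlr_submission.json", "w") as f:
    json.dump(records, f, indent=2)
```

In practice, selection is only the starting point: the same loop is where you would plug in synthesis or augmentation to generate new reasoning traces rather than just filtering existing ones.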
Choose between two tracks:
This competition builds on the success of techniques seen in LIMO and S1, where small, high-quality instruction-tuning datasets led to outsized gains in reasoning performance.
The next breakthrough could be yours.
Plus, to help you get started:
Whether you're in it for the prizes, the science, or just the fun, you'll be helping shape the next generation of multimodal reasoning models.
DCVLR isn't just another competition. It's a new kind of challenge with three key innovations:
We're limiting participation to 500 teams (1–20 members per team), so don't wait!
Register now on the DCVLR homepage, where you'll also find full details on the rules
Give us a star on GitHub
Dive into the starter kit and join our Discord community
Let's find out just how far the right data can take us.
Contributors: Stefan Webb, Benjamin Feuer, Oussama Elachqar