Not long ago, I came across a fascinating demo on Hacker News—a real-time voice chat system that felt like a glimpse into the future*. It worked like a chatbot, but instead of typing, you could talk to it, and it would talk back instantly. The responsiveness was striking. Naturally, I was curious: how did it work under the hood, and how did it achieve such impressively low latency?
As I dug into the project, I discovered it was built using a combination of well-known, supposedly open-source machine learning tools: Whisper for speech-to-text, Kokoro for text-to-speech, Llama 3 for language understanding and response, and Silero for voice activity detection. Inspired, I started thinking about creating a video series that would explore each of these components in depth—an open-source guide to building a real-time voice agent from the ground up.
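For context, those pieces fit together as a simple loop: voice activity detection decides when the user has finished speaking, speech-to-text turns that audio into text, the language model produces a reply, and text-to-speech turns the reply back into audio. Below is a minimal, non-real-time sketch of that flow in Python. It is not the demo's actual implementation: the Whisper and Hugging Face Transformers calls are real APIs, the Llama 3 model ID is illustrative, and `detect_speech` and `synthesize_speech` are hypothetical placeholders standing in for Silero VAD and Kokoro.

```python
# Illustrative voice-agent loop: VAD -> STT -> LLM -> TTS.
# Not the demo's code. Whisper/Transformers calls are real APIs;
# the VAD and TTS functions below are hypothetical placeholders.

import whisper                      # openai-whisper, speech-to-text
from transformers import pipeline   # Hugging Face, LLM response

stt_model = whisper.load_model("base")
llm = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

def detect_speech(audio_path: str) -> bool:
    """Placeholder for Silero VAD: return True if the clip contains speech."""
    return True  # assume speech for this sketch

def synthesize_speech(text: str, out_path: str) -> None:
    """Placeholder for Kokoro TTS: write `text` as audio to `out_path`."""
    print(f"[TTS] would synthesize to {out_path}: {text}")

def handle_turn(audio_path: str) -> None:
    """Process one user turn: VAD gate, transcribe, respond, speak."""
    if not detect_speech(audio_path):
        return
    user_text = stt_model.transcribe(audio_path)["text"]
    reply = llm(user_text, max_new_tokens=128)[0]["generated_text"]
    synthesize_speech(reply, "reply.wav")

if __name__ == "__main__":
    handle_turn("user_turn.wav")
```

A real-time system would stream audio chunks through the VAD and overlap transcription, generation, and synthesis to keep latency low, but the batch version above shows the same four stages.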
But as I looked closer, a pattern emerged. While these models had open weights, the openness largely stopped there. Some—like Llama 3 and Whisper—also shared their model architectures and partial code. Others, like Silero, only released a frozen snapshot in an intermediate format. None of them shared the data they were trained on, or the complete training code required to replicate their results from scratch.
That’s when the realization hit: even with big-tech-scale compute, we still couldn’t fully reproduce or build upon these systems†. The core knowledge and processes are locked away inside tech giants like Google, Meta, and OpenAI.
This isn’t just a technical inconvenience—it’s a missed opportunity for the entire field. I believe foundational models and generative AI are, or at least should be, public goods. To maximize their impact on research, education, and innovation, they need to be truly open.
This is exactly why Oumi’s mission resonated with me. Oumi is building open-source tools to democratize frontier AI development—covering the entire lifecycle from data engineering and synthesis to training, fine-tuning, evaluation, and deployment. When the chance came to join the team as Lead Developer Relations Engineer, I didn’t hesitate.
Now, I have the privilege of helping grow Oumi’s open-source community and connecting with researchers, developers, and enthusiasts across both academia and industry. I’m genuinely excited for what lies ahead—and I’d love to hear from you.
If you're working on similar challenges or just passionate about open-source AI, let’s connect on LinkedIn. Send me a message—I’d be glad to chat!
Contributors: Stefan Webb