Case Study: Original Voices builds a voice-authentic AI on proprietary persona data, raising its authenticity pass rate by 31 percentage points

With Oumi’s technology, Original Voices’ personal-twin product moved from a 52% to an 83% authenticity pass rate, driven entirely by better evaluation rather than more data.

By Stefan Webb

May 15, 2026


Problem

Building an AI that speaks in a specific person’s voice surfaces a measurement problem most NLP teams never face: how do you quantify “authenticity”? Standard benchmarks reward fluency, accuracy, and helpfulness — none of which capture whether a response sounds like this person, with their phrasing, life experiences, and emotional register.

Original Voices is building a personal-twin product where every customer’s data is proprietary by design — conversational history, profile, and answer patterns that belong to one specific individual. Off-the-shelf chat models produced confident, polished output that was unmistakably AI. In early evaluations the most common failure modes were Generic AI Language, Lack of Personal Voice, Missing Personal Anecdotes and Lived Experience, and Overly Polished Tone.

The team needed two things simultaneously: a measurement system that could quantify “authenticity” as a structured, repeatable score, and a training pipeline that kept proprietary persona data inside their own environment rather than passing it to a closed third-party API.

Solution

Original Voices used Oumi to build the full loop — proprietary data ingestion, custom evaluators, base-model exploration, and fine-tuning — all within a single platform.

“Oumi enabled us to take our proprietary data and easily create and run custom evaluations on any model. We were also able to start training on our data in minutes.” — Vedad Šoše, CTO & Cofounder, Original Voices

Custom LLM-as-judge evaluators: The team built a suite of authenticity judges from scratch. Each judge could be run independently against any model — fine-tuned, base, or prior-version checkpoints.

Iterating the rubric, not just the data: The key insight was that for voice work, judge design dominated dataset size. The team ran multiple iterations refining the authenticity rubric itself, treating the evaluation criteria as a first-class engineering artifact.
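To make the idea of a versioned, independently runnable judge suite concrete, here is a minimal sketch in plain Python. It is not Oumi’s API or the team’s actual implementation: the `Judge` class, `run_suite`, the rubric wording, and the sample rows are all hypothetical, and the judge model is replaced by an offline stub so the example runs as written. The judge names echo two of the failure modes listed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumption: a backend is any callable that maps a prompt to text beginning
# with "PASS" or "FAIL". In a real setup it would wrap a call to a judge model.
Backend = Callable[[str], str]


@dataclass(frozen=True)
class Judge:
    name: str
    version: str
    rubric: str  # illustrative wording, not the team's actual rubric

    def build_prompt(self, question: str, answer: str) -> str:
        return (
            f"Rubric ({self.name}, {self.version}): {self.rubric}\n"
            f"Question: {question}\n"
            f"Answer: {answer}\n"
            "Reply with PASS or FAIL."
        )

    def passes(self, backend: Backend, question: str, answer: str) -> bool:
        verdict = backend(self.build_prompt(question, answer))
        return verdict.strip().upper().startswith("PASS")


def run_suite(
    judges: List[Judge], backend: Backend, rows: List[Tuple[str, str]]
) -> Tuple[float, Dict[int, List[str]]]:
    """Return (pass_rate, flagged), where flagged maps a row index to the
    names of the judges that row failed. A row passes only if every judge passes."""
    flagged: Dict[int, List[str]] = {}
    for index, (question, answer) in enumerate(rows):
        failed = [j.name for j in judges if not j.passes(backend, question, answer)]
        if failed:
            flagged[index] = failed
    pass_rate = 1 - len(flagged) / len(rows) if rows else 0.0
    return pass_rate, flagged


def stub_backend(prompt: str) -> str:
    # Stand-in for an LLM judge so the sketch runs offline: it fails any
    # prompt containing a stock assistant phrase, regardless of rubric.
    return "FAIL" if "as an ai" in prompt.lower() else "PASS"


JUDGES_V1 = [
    Judge("generic_ai_language", "v1", "The answer avoids stock chatbot phrasing."),
    Judge("personal_voice", "v1", "The answer sounds like this specific person."),
]

ROWS = [
    ("What was your first job like?", "I burned the first batch and laughed about it for a week."),
    ("What was your first job like?", "As an AI language model, I do not have a first job."),
]

pass_rate, flagged = run_suite(JUDGES_V1, stub_backend, ROWS)
```

Because the rubric lives in data rather than in code, a second list of judges with revised wording and `version="v2"` can be run over the same rows and the two pass rates compared directly, which is the kind of rubric iteration described here.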

Proprietary data, kept proprietary: Question/persona-answer pairs were uploaded directly into the customer’s storage namespace, versioned alongside training and evaluation runs, with no exposure to external APIs.
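The post does not say how datasets are versioned alongside runs. One common approach, shown here purely as an assumption, is to record a content hash of the dataset in each run’s manifest so that any score can be traced back to the exact rows it was computed on. The function names and manifest fields below are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def dataset_fingerprint(rows: List[Dict[str, str]]) -> str:
    """SHA-256 over a canonical JSON encoding of each row, in order."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the encoding independent of dict key order.
        line = json.dumps(row, sort_keys=True, ensure_ascii=False)
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def run_manifest(
    run_id: str, judge_version: str, rows: List[Dict[str, str]]
) -> Dict[str, object]:
    """Hypothetical manifest tying a run to its judge version and dataset."""
    return {
        "run_id": run_id,
        "judge_version": judge_version,
        "num_rows": len(rows),
        "dataset_sha256": dataset_fingerprint(rows),
    }
```

Two evaluation runs whose manifests share a `dataset_sha256` but differ in `judge_version` were scored on identical data, so any difference in pass rate comes from the rubric.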

Outcome

The same 2,000-row evaluation set scored 52% under the v1 authenticity judges and 83% under the v2 judges — a 31-percentage-point improvement driven entirely by tightening the rubric, with no change to the data or the underlying model. Flagged rows dropped from 955 to 340, a 64% reduction in failure rate.
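The reported figures can be checked against one another with a few lines of arithmetic. The sketch below assumes that a row fails exactly when at least one judge flags it, so the pass rate is one minus the flagged fraction; the variable names are illustrative.

```python
total_rows = 2000
flagged_v1 = 955
flagged_v2 = 340

# Pass rate = share of rows that no judge flagged (assumption stated above).
pass_v1 = 1 - flagged_v1 / total_rows
pass_v2 = 1 - flagged_v2 / total_rows

# Absolute gain in percentage points versus relative drop in flagged rows.
point_gain = (pass_v2 - pass_v1) * 100
flagged_drop = (flagged_v1 - flagged_v2) / flagged_v1

print(f"v1 pass rate: {pass_v1:.2%}")
print(f"v2 pass rate: {pass_v2:.2%}")
print(f"gain: {point_gain:.2f} percentage points")
print(f"flagged rows down by: {flagged_drop:.1%}")
```

Under that assumption, 955 flagged rows correspond to a 52.25% pass rate and 340 to 83%, so the gain is 30.75 points, which rounds to the reported 31, and the drop in flagged rows is about 64%.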

A dedicated “no PII leakage” judge flagged approximately 1% of rows on a 5,000-row sweep, surfacing the exact rows for review. A Llama 3.1 8B LoRA fine-tune was then trained on the resulting dataset and is ready for self-hosted deployment.

Beyond the headline number, the durable win was the measurement system itself. A fine-tuned model is a snapshot; a custom, runnable, version-controlled judge suite is compounding infrastructure. Every new base model, dataset, or fine-tune flows through the same authenticity rubric, and the team gets a comparable score back in minutes.

Authenticity pass rate: 52% → 83% on the same 2,000-row evaluation set
Failure-row reduction: 955 → 340 flagged rows (−64%) in 73 minutes of rubric iteration
PII detection: ~1% flag rate on 5,000-row sweep, with exact rows surfaced for remediation

What’s next

With Oumi’s technology, Original Voices was able to build a small custom AI model that solved pressing business needs. For teams building products that depend on subjective quality (voice, tone, persona, brand alignment, emotional register), the Original Voices pattern is directly transferable: custom judges, proprietary data in a controlled environment, and a measurement system that compounds in value with every iteration.

Why not try it out today and see for yourself? Bring your task prompt, and the Oumi Agent takes it from there!