The word laboratory is not decoration. A lab earns the name by holding open questions long enough to answer them properly — by treating a hard problem as a thing to be measured, modeled, and understood, not merely shipped.
We build production systems for clients. That work pays the bills and, more importantly, surfaces the questions worth chasing. The engagement is the occasion; the research is what makes the next engagement cheaper, safer, and better than the last. Each thread is framed the same way: the question, the approach, what we learned or built, and where it is going.

Buying research increasingly happens inside AI assistants, not search results, where there is no rank and no console. We decompose 'AI visibility' into ~14 proprietary, formulaic metrics — embedding-alignment and token-probability style measures — so the question becomes diagnosable rather than opaque.
Transcribe-then-analyze discards almost everything that makes speech speech — tremor, hesitation, the hedged word. We route audio through a Praat phonetics sidecar to extract real vocal features, then fuse them with the transcript so feedback can say not just 'you hedged' but 'you hedged, and your voice confirmed it.'
A spec is decomposed into independently verifiable tasks; a runner dispatches agents that must pass a gate — format, lint, type-check, tests — before 'done' counts. Fleets run in isolated git worktrees so changes can't collide. Independent verifiability is the whole game.
A fast voice front-end talks to a patient and gathers detail; a separate supervisor service makes the triage decision against an established clinical protocol. The voice is never trusted to make the call — it gathers; the supervisor adjudicates and owns the record.
Routing a problem across multiple frontier models and reconciling their answers sometimes wins and sometimes just costs more. We study the production patterns that make an ensemble worth its overhead — and the cases where one strong default is the right call.
Regulatory corpora are long, dense, and unforgiving — a wrong or unsourced answer is worse than none. We index multiple corpora and pursue retrieval that returns answers a reader can trace back to the passage that proves them.
A question pursued with an academic collaborator: simulate populations on a spatial lattice and watch whether a shared language emerges, drifts, or fragments. A first version has shipped.
Many participants, shared state, GPU-rendered, in real time — the substrate question underneath collaborative simulation. We have an architecture prototype and a list of the constraints that actually bind.


A thread is open until we can state, precisely, what we now know that we did not know before. Some close into products. Some close into a method we reuse on every engagement. A few stay open because the honest answer is still we are not sure — and that is a legitimate place for a lab to stand.