01 Origin
I trained as a chemical engineer because I wanted to work in oil. I did, for some of the biggest companies in the industry. It wasn't for me. But what I took from it was how to think from first principles. How to create value from raw, messy inputs. How small variables change everything downstream. How the failure of one component quietly breaks the whole.
That's still how I think. I came into product through the unglamorous side: requirements gathering, QA, enterprise transformation work where the client doesn't tell you the real problem until week three. I learned to wait for it.
The unglamorous side turns out to be where all the real learning is.
02 Approach
Most teams don't fail because they move too slowly. They fail because they build confidence around the wrong problem. I've learned to spend more time where uncertainty is highest, not to delay shipping, but to make later decisions cheaper, clearer, and easier to defend.
My default is to create structure early: better questions, sharper hypotheses, prototypes that are honest about what they're testing, and enough clarity to know what success should look like before a team starts optimizing for speed.
The question nobody asked is usually the one that unlocks everything.
AI is compressing this in useful ways. I'm using it to run more experiments in the problem space faster, stress-test assumptions that used to take weeks, and surface edge cases earlier. The discipline stays the same. The velocity is new. Discovery still comes first, but now the window between hypothesis and evidence is shorter.
03 Philosophy
Not values from a workshop. Things I've been wrong about, then right about, enough times that I just hold them now.
The product is never the point. The human problem is. Every feature, every sprint: if I can't trace it back to a real human pain, I get suspicious.
Good questions age better than good answers. Frameworks change. The right question keeps doing work for years.
Slow down to ship faster. The rushed decision that skips discovery always costs more downstream.
Conviction is not certainty. I'll commit fully to a direction while staying genuinely open to being wrong.
The metric proves it worked. The human proves it mattered. I track both.
04 On AI
I've spent the better part of the last two years building AI-native products and evaluation frameworks. That changes how you think about what "done" means. With LLMs, done isn't a state; it's a dial.
What I built at Tech1M
An evaluation and guardrails framework for LLM features: content relevance, groundedness scoring, PII safeguards, and rubric-based output evaluation. The goal was stabilising quality before GA, not patching afterwards.
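To make the shape of that kind of framework concrete, here is a minimal sketch of a pre-release output check combining a groundedness score with a PII guardrail. This is an illustration only, not the actual Tech1M implementation: the function names, the token-overlap proxy for groundedness, the regex patterns, and the 0.6 threshold are all assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: names, patterns, and thresholds are assumptions,
# not the production framework described above.

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

@dataclass
class EvalResult:
    groundedness: float  # share of answer tokens also present in the source
    pii_flag: bool       # any PII-like pattern found in the output
    passed: bool         # gate decision: grounded enough and PII-free

def evaluate(answer: str, source: str, threshold: float = 0.6) -> EvalResult:
    """Crude groundedness proxy: token overlap between answer and source.

    A real framework would use NLI- or rubric-based scoring; overlap is
    just the simplest stand-in that makes the gating logic visible.
    """
    answer_tokens = set(answer.lower().split())
    source_tokens = set(source.lower().split())
    overlap = len(answer_tokens & source_tokens) / max(len(answer_tokens), 1)
    pii = any(p.search(answer) for p in PII_PATTERNS)
    return EvalResult(overlap, pii, passed=overlap >= threshold and not pii)
```

The useful part is the structure, not the scoring: every output gets a measurable score and a hard guardrail before release, so "quality" becomes a gate you can tune rather than an opinion you debate after launch.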
The teams getting the most from AI understand their problem space deeply enough to evaluate the output. You can't prompt your way out of not knowing what you're solving for.
I'm also thinking about what AI does to the humans using the products. That's the question I don't think enough PMs are asking.
05 Selected Work
Built evaluation frameworks for AI-powered product experiences, including groundedness, relevance, and safety checks, to make output quality measurable before broader rollout.
Led work on a large enterprise transformation affecting 50,000+ partners, with 99.9% uptime post-launch and 75% adoption within three months.
Worked on payment systems spanning 60+ countries, shaping validation logic and edge-case handling in flows where downstream failures were expensive.
Redesigned onboarding and operational flows in environments where ambiguity created compliance and execution risk.
What matters to me
I care less about flashy metrics than whether the work held up under real constraints. The projects I'm proudest of are the ones where quality became clearer, risk got reduced earlier, and teams could move with more confidence because the system made more sense.
I'm easy to reach and enjoy a good conversation about hard problems. No pitch needed.