Background
I'm a founding engineer at Falconer, building production AI agent systems: tool execution, retrieval pipelines, evals, and streaming at scale. Before that, staff engineer at Stripe. MIT EECS. I take on a small number of consulting engagements.
What I see go wrong
Teams often overfit to rigid workflow pipelines that break the moment a user does something unexpected. In many cases a general agentic loop with well-designed tools works better, and the architecture decision is easier to get right early than to fix later. Most eval suites cover only the happy path, which misses where the real failures happen and makes it harder to ship meaningful improvements. And premature model cost optimization tends to distort product decisions before you know what you're building.
How I help
I advise on architecture, tool design, and eval strategy — the decisions that are expensive to undo, where it helps to have someone who has made them before. I also sometimes build: if you need an end-to-end prototype to figure out whether an idea is worth pursuing at all, I can do that in a tight sprint.
Get in touch
If you're building an agentic product and need a second opinion on architecture, evals, or whether something is actually worth building, reach out with a bit of context.