Staff / Lead AI Engineer
"Has read every Hamel post and will sniff out a thin wrapper in 90 seconds."
Type: Champion · Function: Engineering · Size: mid-market + enterprise · Lead with: AI-Native interface layer · Motion: top-down
Why them
Owns the eval infra; follows Hamel's playbook; allergic to "reliable" and "production-ready" after getting burned by three vendors. Quiet shame: agents are "in prod," but that only means "answers a prompt and doesn't crash." Tempted to build it himself. Win him by speaking his ops vocabulary and conceding what he already knows.
Angle — what to say (pick by situation)
- Hand-rolling evals: Trust Lab (early access) — full-loop evals (LLM response plus API call plus state change), golden sets pulled from your production conversations, shadow mode, canary, outcome-matching across model versions. We're the interface layer above your orchestration — not your eval layer, not your framework; the system prompt isn't exposed.
- Fighting context, cost, and latency: real-time context engine, inference gateway, model routing. Send the model only what's relevant (better accuracy, lower latency, lower cost); adopt newer models the moment they ship without re-plumbing.
- Proof: Optibus A2A integration · full-loop evaluation · 20–35% lower token cost.
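Talk-track aid for "full-loop": a minimal sketch, assuming a golden case pins down three things (what the agent said, which API it called, and what state changed). Every name here (`GoldenCase`, `AgentRun`, `full_loop_check`, the `/refunds` path) is hypothetical and for illustration only; this is not the Trust Lab API.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """What one agent turn actually did (hypothetical shape)."""
    response: str
    api_calls: list   # list of (method, path) tuples
    state: dict       # system state after the turn


@dataclass
class GoldenCase:
    """Expected outcome taken from a past conversation (hypothetical shape)."""
    must_mention: str
    expected_call: tuple
    expected_state: dict


def full_loop_check(case: GoldenCase, run: AgentRun) -> dict:
    """Score the response, the API call, and the state change separately."""
    return {
        "response": case.must_mention.lower() in run.response.lower(),
        "api_call": case.expected_call in run.api_calls,
        "state_change": all(
            run.state.get(key) == value
            for key, value in case.expected_state.items()
        ),
    }


def passes(case: GoldenCase, run: AgentRun) -> bool:
    """A run passes only when all three checks pass."""
    return all(full_loop_check(case, run).values())


case = GoldenCase("refund", ("POST", "/refunds"), {"order_status": "refunded"})

# The agent said the right thing AND did the right thing.
good_run = AgentRun(
    response="Your refund has been issued.",
    api_calls=[("POST", "/refunds")],
    state={"order_status": "refunded"},
)

# The agent said the right thing but never acted on it.
talk_only_run = AgentRun(
    response="Your refund has been issued.",
    api_calls=[],
    state={"order_status": "paid"},
)
```

The second run is the one a response-only eval would score as a pass: the text looks right, but no call was made and no state changed. That gap is the point of the "full-loop" framing.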
Hooks
- "You're already building golden sets by hand. We made that the product."
- "Speed without accuracy isn't a flywheel. It's a centrifuge. The trust layer makes iteration directional."
- "Shadow mode. Canary. Golden set. The ops words you already use."
Objections → responses
- "How does it integrate with our agents?" → A2A + MCP bridge. Your agentic loops stay yours; we route intent to them and the result surfaces in-context.
- "I could build this in a sprint." → You'd rebuild the generic interface layer and still own a 12-month roadmap. One sprint with us; your team keeps building IP.
- "Generic eval metrics are useless." → Agreed — Trust Lab uses golden sets from your production conversations, full-loop, not off-the-shelf scores.
- "Will it lock us into one model?" → Multi-provider model support, model routing, progressive deployment: best model per task, swap freely.
Targeting & channels
- Technographics: LangChain / LangGraph · Braintrust · Arize · Galileo · Cursor · Claude Code
- Paid/organic: LinkedIn (AI/ML eng titles) · HN-adjacent placements · MCP & agent-dev communities · GitHub · technical docs · the CLI + Cursor/Claude Code extensions · hands-on demos
- Voices: Simon Willison · Swyx · Hamel Husain · Latent Space
Avoid
Marketing-speak; "agentic transformation"; "production-ready"; "reliable"; and never use "Operator" as a product/feature name (OpenAI collision — icp-b §9).
Source: icp-b-ainative.md §3–7, §9 · matrix positioning/gtm-messaging-matrix.md · core/architecture.md.