I design and build multi-agent AI operating systems with living memory, local inference, and human-in-the-loop governance. 25+ years of enterprise architecture — now building the agentic infrastructure that compounds knowledge across sessions, agents, and machines. The rare combination of deep enterprise experience and hands-on AI systems building.
Most organizations experimenting with AI agents are running single-agent demos against isolated problems. I design and build systems where multiple specialized agents collaborate under structured governance, building on each other's work through shared memory. I also ship practical AI-powered tools that real teams use every day.
A memory architecture where AI agents, humans, and subsystems share one substrate. Three orthogonal axes: storage (Living Memory, Alexandria, Episodic Continuity), interpretation (substrate — universal state; lens — persistent role-shaped filter; frame — transient task-binding), and action (typed state-changes, outcome → procedure feedback, intentional non-action). The render equation: view = render(substrate, lens, frame). Sentra named substrate and lens; we add frame as the missing transient layer. The same pattern shipped years earlier at Macmillan as UXF / ASO26 / EDR — this is its solo-scale continuation.
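The render equation can be made concrete with a minimal sketch. Everything below (the Substrate, Lens, and Frame classes, the keep predicates, the sample entries) is illustrative, not the production API:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical illustration of view = render(substrate, lens, frame).
# Substrate: universal shared state. Lens: persistent role-shaped filter.
# Frame: transient task binding.

@dataclass
class Substrate:
    entries: list[dict]  # typed state-changes shared by agents and humans

@dataclass
class Lens:
    role: str
    keep: Callable[[dict], bool]  # persistent, role-shaped relevance filter

@dataclass
class Frame:
    task: str
    keep: Callable[[dict], bool]  # transient, task-scoped binding

def render(substrate: Substrate, lens: Lens, frame: Frame) -> list[dict]:
    """A view is the substrate seen through a lens, bound to a frame."""
    return [e for e in substrate.entries if lens.keep(e) and frame.keep(e)]

substrate = Substrate(entries=[
    {"kind": "incident", "system": "crm", "text": "sync failed"},
    {"kind": "deal", "system": "crm", "text": "renewal closed"},
    {"kind": "incident", "system": "billing", "text": "latency spike"},
])

oncall_lens = Lens(role="on-call", keep=lambda e: e["kind"] == "incident")
crm_frame = Frame(task="triage crm", keep=lambda e: e["system"] == "crm")

view = render(substrate, oncall_lens, crm_frame)
# One substrate, many views: swap the lens or the frame and the substrate
# itself is never copied or mutated.
```

The point of the sketch: roles and tasks compose over a single shared state rather than each owning a private copy of it.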
The development OS behind every project here. Forge is a harness engineering framework: it doesn't add AI to an existing process, it replaces the process entirely. Twelve specialized agents (architect, builder, analyst, reviewer, tester, debugger, and more) operate under structured SOPs, a governed backlog, TDD enforcement, and persistent session memory. The loop: a planning agent decomposes work into tickets via the Forge backlog API. Stewart grooms intake to ready with verified AC and an agent assignment. The dashboard auto-generates an agent-scoped prompt at the ready→in-progress transition; one click on Launch copies it to the clipboard, and the agent runs in a fresh session. Submission triggers an automated multi-stage review gate (lint, 700+ test suites, Eval, code review, pre-commit) that auto-closes the ticket on pass or stages a "fix it" prompt via a Relaunch button on fail. When something fails, the fix is never "try harder"; it's "what capability is missing, and how do we make it legible to the agent?"
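The ticket lifecycle above can be sketched as a small state machine. The Ticket class and its groom/launch/submit methods are hypothetical names standing in for the Forge backlog API, not its actual surface:

```python
# Illustrative sketch of the Forge ticket loop: intake -> ready ->
# in-progress -> review gate -> done, or back to ready for a Relaunch.

GATES = ("lint", "tests", "eval", "code_review", "pre_commit")

class Ticket:
    def __init__(self, title: str, agent: str):
        self.title, self.agent, self.state = title, agent, "intake"

    def groom(self):
        # Grooming: verified AC plus an agent assignment moves intake to ready.
        assert self.state == "intake"
        self.state = "ready"

    def launch(self) -> str:
        # ready -> in-progress generates an agent-scoped prompt.
        assert self.state == "ready"
        self.state = "in-progress"
        return f"[{self.agent}] {self.title}"  # copied to the clipboard

    def submit(self, gate_results: dict) -> str:
        # Multi-stage review gate: pass everything -> auto-close;
        # any failure -> stage a "fix it" prompt and return to ready.
        assert self.state == "in-progress"
        failed = [g for g in GATES if not gate_results.get(g, False)]
        self.state = "done" if not failed else "ready"
        return "closed" if not failed else f"fix-it: {failed}"
```

The key design property the sketch preserves: a failed gate never dead-ends; it re-enters the same governed loop with a more specific prompt.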
Documentation → Applied the Forge framework to Salesforce development workflows with bidirectional Jira integration, Confluence documentation sync, and automated requirements traceability from business need to deployed code.
Designed and built an internal media campaign management system used by the HR People Team. Multi-format content creation (AI-generated audio, avatar video, text), Slack distribution, branded media player with engagement analytics. In active daily use.
A product bet exploring AI-powered compounding learning. Students upload course notes (images, PDFs, docs) and an LLM extracts atomic concepts with prerequisites and cross-course connections into a personal knowledge graph. The graph powers scoped study guides and eight built-in skills (concept extraction, confusion-pair detection, exam postmortem, bridge detection). The same knowledge-graph-plus-learning-layer pattern from the enterprise work, applied to student learning.
Designed a framework for synthesizing customer engagement signals across touchpoints into a unified context layer. Identity resolution, CRM architecture, behavioral data, and real-time orchestration powering personalization and support cost reduction.
An autonomous AI operating system where eight specialized agents collaborate through governed PostgreSQL to discover, evaluate, match, and help apply to jobs. Three-layer dedup eliminates 95% of noise before LLM scoring. Local inference handles the majority of scoring via a four-machine Apple Silicon fleet — Qwen3.6-35B MoE for fast evaluation, Qwen3.6-27B Dense for synthesis — with cloud API as fallback only. Two public PyPI libraries (strata-match, strata-harvest). Variable API cost: $2–5/month.
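A funnel like the three-layer dedup might look like the sketch below. The specific layers (exact URL, fuzzy title-plus-company key, embeddings) are assumptions for illustration, not the actual Strata implementation:

```python
import hashlib

def dedup(jobs: list[dict]) -> list[dict]:
    """Cheap filters run before any LLM scoring; only survivors cost tokens."""
    seen_urls, seen_keys, out = set(), set(), []
    for job in jobs:
        # Layer 1: exact duplicate (same posting URL).
        if job["url"] in seen_urls:
            continue
        seen_urls.add(job["url"])
        # Layer 2: near-duplicate (normalized title + company fingerprint).
        key = hashlib.sha1(
            (job["title"].lower().strip() + "|" + job["company"].lower()).encode()
        ).hexdigest()
        if key in seen_keys:
            continue
        seen_keys.add(key)
        out.append(job)
    # Layer 3 (not shown): embedding similarity against recent postings
    # catches reworded reposts that survive the first two layers.
    return out
```

The economics are the point: each layer is orders of magnitude cheaper than the one after it, so the expensive local or cloud scoring only ever sees the residue.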
Documentation → A four-machine Apple Silicon cluster running oMLX 0.3.8 across all nodes. mac-studio1 (M3 Ultra, 96GB) is the primary inference host: Qwen3.6-35B MoE (8bit, DFlash) for interactive and scoring workloads at 245–326 tok/s; Qwen3.6-27B Dense (8bit, thinking ON) for synthesis and coaching at 91.9% TruthfulQA. mac-mini3 (M4 Pro, 64GB, 4TB) is the dedicated Alexandria worker — nightly librarian schedule (synthesis-detector, wiki-lint, link-builder, dedup) without competing with studio1's live traffic. mac-mini1 + mac-mini2 (M4, 16GB each) are the always-on edge tier: Qwen3.6-2B-MLX-4bit + snowflake-arctic-embed for sub-3s extract-fast and embed-fast regardless of studio1 load. Application code calls named slots (score-fast, score-deep, synth-deep, extract-fast, embed-fast); the role-resolver maps each to the right host, model, and oMLX profile. Alexandria session priming, daily digests, and agent knowledge retrieval all run on-device. Shadow-mode calibration benchmarks local quality against cloud continuously.
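Slot-based routing reduces to a lookup the application never sees past. The slot and host names come from the cluster description above; the mapping table and the resolve() function are assumptions sketched for illustration:

```python
# Hypothetical role-resolver: application code names a slot, the resolver
# picks host + model. Cloud is fallback only, mirroring the local-first design.

SLOTS = {
    "score-fast":   {"host": "mac-studio1", "model": "Qwen3.6-35B-MoE-8bit"},
    "score-deep":   {"host": "mac-studio1", "model": "Qwen3.6-35B-MoE-8bit"},
    "synth-deep":   {"host": "mac-studio1", "model": "Qwen3.6-27B-Dense-8bit"},
    "extract-fast": {"host": "mac-mini1",   "model": "Qwen3.6-2B-MLX-4bit"},
    "embed-fast":   {"host": "mac-mini2",   "model": "snowflake-arctic-embed"},
}

def resolve(slot: str, fallback_cloud: bool = True) -> dict:
    """Map a named slot to a concrete host and model."""
    try:
        return SLOTS[slot]
    except KeyError:
        if fallback_cloud:
            # Unknown or unavailable slot: degrade to the cloud API.
            return {"host": "cloud-api", "model": "cloud-default"}
        raise
```

The indirection is what makes the fleet swappable: upgrading a model or re-homing a workload means editing one table entry, not touching every caller.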
Without memory, agency does the same work repeatedly. An agent fleet without shared memory is individually capable but collectively starts over every time. The architecture I designed solves this with three distinct layers, each with its own failure modes and quality signals.
The learning layer is the part most teams skip. It's where raw experience becomes structured understanding: what worked, what didn't, what should be applied next time. Without it, you have storage and retrieval but no compounding.
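What the learning layer adds over plain storage can be shown in a few lines. This is an illustrative sketch with invented names, not production code:

```python
from collections import defaultdict

def compound(outcomes: list[dict]) -> dict:
    """Turn raw outcomes into structured next-time guidance.
    Storage keeps the outcomes; the learning layer keeps the lessons."""
    guidance = defaultdict(list)
    for o in outcomes:
        bucket = "apply" if o["worked"] else "avoid"
        guidance[bucket].append(o["action"])
    return dict(guidance)

lessons = compound([
    {"action": "batch Salesforce writes", "worked": True},
    {"action": "retry without backoff", "worked": False},
])
```

Without this step, a fleet can recall every past session verbatim and still repeat the same mistakes; the compounding comes from the distilled "apply" and "avoid" lists, not the archive.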
Speed without governance means fast in the wrong direction. The framework enforces bounded authority (each agent has a narrow, architecturally enforced scope), continuous approval gates, automatic audit trails, and session isolation with conflict detection. Governance isn't a policy layer. It's a first-class architectural concern.
The system is designed for real enterprise infrastructure: Salesforce with governor limits, Jira with all its workflow complexity, Confluence as a living documentation target. Most agent demonstrations run in isolation. This one operates where the constraints are real and the consequences matter.
This architecture didn't emerge in isolation. It synthesizes ideas from Ashwin Gopinath / Sentra (Company Brain — substrate & lens vocabulary), Andrej Karpathy (Software 2.0, LLM OS — memory is the file system), Boris Cherny (context engineering as the core discipline), and Program-Aided Language Models (PAL) (structured reasoning through code). The deeper lineage runs through UXF / ASO26 / EDR at Macmillan Learning, where the substrate-vs-lens pattern shipped in production years before Sentra named it. The key insight: most agentic systems bolt memory onto agents as an afterthought. We built the substrate as the foundation, declared lenses as the role-shaped filter, and added frames as the transient layer Sentra and others don't ship.
Memory platforms like Mem0, Zep, and LangMem solve recall. Orchestration frameworks like CrewAI and LangGraph solve coordination. Per-tool memory in Notion, Linear, and Cursor remembers only that tool's own context. Glean searches across them but doesn't preserve state-change. None of them ship the layer that matters: one substrate that remembers across systems, with role-shaped lenses on top and transient task frames inside. The substrate generalizes; lenses don't. Everyone else's memory fragmentation is the moat: a substrate that remembers state-changes across systems and renders them through a Sales lens, an On-call lens, or an Exec lens is the durable position.
The thread through my career: designing technology systems that help organizations understand their customers, make better decisions, and operate more intelligently. The actors have changed over 25 years. The architecture thinking hasn't.
Self-contained, interactive HTML presentations covering the architecture, the research, and the strategic vision. Each one is a complete narrative, not a slide deck.
Strategic pitch for institutional memory as a moat: one substrate, many role-shaped lenses, transient task frames. UXF / ASO26 / EDR lineage as production prior-art; memory fragmentation across per-tool memory as the durable position.
Strategy · How multi-agent orchestration transforms the Salesforce delivery lifecycle: confidence evaluation, brownfield discovery, and human-agent co-development.
Architecture · Deep technical walk-through of the three-axis memory architecture (storage × interpretation × action), the substrate / lens / frame primitives, the render equation view = render(substrate, lens, frame), and the lens roster of consumers across agents and humans.
Strategy · Synthesizing engagement signals across customer touchpoints into a unified context layer. Identity resolution, behavioral data, and AI-powered orchestration.
Research · Comparative analysis of the 2026 memory landscape: substrate-vs-lens vocabulary from Sentra, three-axis classification, the moat thesis (memory fragmentation in per-tool memory). 35+ sources with claim traceability and gap analysis.
Architecture · Inside the harness engineering framework: agent network, SOP enforcement, approval gates, TDD pipeline, session continuity, and the execution loop that runs every ticket from plan to done. The architecture that makes the rest of this possible.
Observations from building multi-agent systems for real enterprise work. No theory. Just what I've learned.
Why a dozen fast agents without shared memory amount to nothing more than expensive repetition.
The distinction most teams miss, and the three-layer architecture that solves it.
Approval gates, bounded authority, session isolation, and calibrating the human-in-the-loop.
Why retrieval-augmented generation solves the search problem but not the knowledge problem.
Decades of enterprise complexity aren't just relevant to the agentic stack — they're the part most teams are missing.
Why I built Strata: eight governed agents, three-layer dedup, local LLM inference across a four-machine Apple Silicon fleet, and the architecture decisions that make it run for under $5/month in API costs.
Building a four-machine Apple Silicon inference cluster with oMLX — slot-based routing, tiered hardware, and why control and privacy matter more than API cost savings.
I'm looking for my next role leading AI-native engineering organizations — where the ability to actually build agentic systems, architect for enterprise reality, and ship production-grade AI infrastructure all matter. VP/Head of AI Engineering, AI Strategy, or Agentic Platform leadership.