The Agentic Operating System — workshop build
A live multi-agent ops shell, designed and built with 40 engineers in one room.
AIMED · public workshop · ~40 engineers
What the team was actually solving
Most teams meeting agentic AI for the first time get stuck on one of three blockers: tool design, orchestration choice, and the gap between a working demo and a system that survives Monday morning. The AIMED workshop format compresses the answers into one day of hands-on building.
Where the old process broke
1. Engineers had used Claude and Copilot but had never built an agent loop end-to-end
2. Multi-agent orchestration is widely discussed but rarely walked through in code
3. Workshops that just demo finished agents leave attendees without the muscle memory to ship their own
4. A real artefact that survives the day is the difference between a workshop and a memorable one
The AI / technical solution we shipped
A day-long live build of "the Agentic Operating System" — a multi-agent shell with a supervisor (planning, decomposition), handoff agents (parallel reads, sequenced writes), shared tool registry via MCP, and observability wired in from line one. Every attendee leaves with a running shell on their own laptop, the source, and the patterns to extend it.
How the system is wired
Diagram: the specialist roles
Diagram: the workshop stack, the same primitives every attendee built on
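In code, that wiring has a small shape. Below is a compressed, illustrative sketch (every name is a placeholder, not the workshop source): the supervisor plans and decomposes, reads are handed off in parallel because they are side-effect free, and writes run strictly in sequence.

```python
# Illustrative sketch only: the shape of supervisor + handoff orchestration,
# not the workshop repo. run_agent stands in for a full model + tool loop.
import asyncio

async def run_agent(role: str, task: str) -> str:
    """Placeholder for a full agent loop (model calls plus tool calls)."""
    await asyncio.sleep(0)
    return f"[{role}] {task}: done"

async def supervisor(goal: str) -> list[str]:
    # Plan: decompose the goal into side-effect-free reads and stateful writes.
    reads = [f"gather context on {goal}", f"search the codebase for {goal}"]
    writes = [f"draft changes for {goal}", f"apply changes for {goal}"]

    # Handoff, parallel reads: safe to fan out, no shared state is mutated.
    read_results = await asyncio.gather(*(run_agent("reader", t) for t in reads))

    # Handoff, sequenced writes: each write sees the effects of the last.
    write_results = []
    for task in writes:
        write_results.append(await run_agent("writer", task))

    return [*read_results, *write_results]

if __name__ == "__main__":
    for line in asyncio.run(supervisor("the rename refactor")):
        print(line)
```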
How attendees integrated the shell into their own work
The shell is generic — file reads, shell commands, web search. By the afternoon, attendees were forking it to wire in their own tools: GitHub API tools, Jira tools, internal codebase searchers, and so on.
- Files MCP: read/write scoped to a workshop sandbox directory (a minimal version is sketched after this list)
- Shell MCP: scoped to a small allowlist of commands
- Search MCP: web + GitHub code search wrapped behind one tool
- Each attendee added one tool of their own by the end of the day
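For a sense of scale, a Files-style MCP server is small. Here is a minimal sketch using the Python MCP SDK's FastMCP helper; the sandbox path and tool names are illustrative, not the workshop repo's.

```python
# Minimal Files MCP server sketch; sandbox path and tool names are assumptions.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

SANDBOX = Path("./workshop-sandbox").resolve()  # assumed sandbox location
mcp = FastMCP("files")

def _scoped(rel_path: str) -> Path:
    """Resolve a path and refuse anything that escapes the sandbox."""
    p = (SANDBOX / rel_path).resolve()
    if not p.is_relative_to(SANDBOX):
        raise ValueError(f"path escapes sandbox: {rel_path}")
    return p

@mcp.tool()
def read_file(path: str) -> str:
    """Read a UTF-8 text file inside the workshop sandbox."""
    return _scoped(path).read_text(encoding="utf-8")

@mcp.tool()
def write_file(path: str, content: str) -> str:
    """Write a UTF-8 text file inside the workshop sandbox."""
    target = _scoped(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} bytes to {path}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

The sandbox check is the point: the server, not the prompt, is what keeps the agent inside the workshop directory.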
Patterns we drilled (production-grade, not workshop-grade)
Scope tools, not models
The shell's tool registry is what we tightened, not the model prompts. Lesson: the model is the cheap part to swap; the tools are the contract.
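As a hedged sketch of what that looks like (all names are assumptions, not the shell's API): one shared registry, and each specialist sees only its allowlisted slice.

```python
# Illustrative: scope the tool registry per agent, not the model. The model
# can be swapped freely; the scoped registry is the contract.
from typing import Callable

def read_file(path: str) -> str: return f"<contents of {path}>"
def write_file(path: str, content: str) -> str: return f"wrote {path}"
def run_command(cmd: str) -> str: return f"ran {cmd}"
def web_search(query: str) -> str: return f"results for {query}"

REGISTRY: dict[str, Callable] = {
    "read_file": read_file,
    "write_file": write_file,
    "run_command": run_command,
    "web_search": web_search,
}

# Narrow scopes per specialist; tightening happens here, not in the prompts.
AGENT_SCOPES: dict[str, set[str]] = {
    "researcher": {"read_file", "web_search"},
    "writer": {"read_file", "write_file"},
    "operator": {"run_command"},
}

def tools_for(agent: str) -> dict[str, Callable]:
    allowed = AGENT_SCOPES[agent]
    return {name: fn for name, fn in REGISTRY.items() if name in allowed}

print(sorted(tools_for("researcher")))  # ['read_file', 'web_search']
```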
Audit-first design
Every tool call is traced. Every cost is attributed. Workshop or not, this is the only way to debug a multi-agent system.
Cache the prefix
Keep the system prompt and tool registry byte-stable and the prefix caches across calls. Attendees watched the cache-hit rate climb live as we built.
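A minimal sketch with the Anthropic Python SDK, assuming a byte-stable tool list and system block; the prompt text, tool, and model alias are illustrative. Marking the system block with cache_control caches the whole prefix up to that point.

```python
# Sketch, not workshop source: keep tools + system prompt byte-identical across
# calls and mark the prefix cacheable.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "read_file",  # illustrative tool definition
    "description": "Read a file inside the workshop sandbox.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

SYSTEM = [{
    "type": "text",
    "text": "You are the supervisor agent of a multi-agent ops shell.",
    "cache_control": {"type": "ephemeral"},  # caches tools + system as one prefix
}]

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # any prompt-caching-capable model
    max_tokens=512,
    tools=TOOLS,
    system=SYSTEM,
    messages=[{"role": "user", "content": "What files are in the sandbox?"}],
)
print(response.usage)  # cache_read_input_tokens rises once the prefix is stable
```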
Fail loudly
Validation failures escalate to the supervisor. Silent fallback is what makes agents flaky.
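Sketched with assumed names (not the shell's API): wrap every tool call so a validation failure raises a structured escalation to the supervisor instead of falling back silently.

```python
# Illustrative sketch of the fail-loudly rule; all names are assumptions.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Escalation(Exception):
    """Raised up to the supervisor; never swallowed into a silent retry."""
    agent: str
    tool: str
    reason: str

def validated_call(agent: str, tool: str, fn: Callable[..., Any], **kwargs) -> Any:
    try:
        result = fn(**kwargs)
    except Exception as exc:
        raise Escalation(agent, tool, f"tool raised: {exc}") from exc
    if result in (None, ""):
        # An empty result is a failure too; escalate rather than guess.
        raise Escalation(agent, tool, "empty result")
    return result
```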
Workshop format
Hour 1 — Setup
uv venv, Anthropic SDK, repo cloned. Every laptop calling Claude before coffee.
Hour 2–3 — First agent
Single-agent loop with a single tool. Hello-world for tool calling (sketched after this schedule).
Hour 4 — MCP
Wrap the file + shell tools as MCP servers. First reusable abstraction.
Hour 5–6 — Supervisor + handoff
The shell takes shape. Multi-agent orchestration in code attendees can read.
Hour 7 — Observability
Wire Langfuse. Per-step trace becomes the debugger of choice for the rest of the day.
Hour 8 — Add-your-own-tool
Attendees fork. Forty different production-shaped extensions in one room.
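The hours 2–3 loop, sketched with the Anthropic Python SDK; the get_time tool and the model alias are illustrative. Call the model, execute any tool_use blocks, feed back tool_result blocks, and repeat until the model stops asking for tools.

```python
# Sketch of the hour 2-3 single-agent loop; tool and model alias are assumptions.
from datetime import datetime, timezone

import anthropic

client = anthropic.Anthropic()

TOOLS = [{
    "name": "get_time",
    "description": "Return the current UTC time as an ISO-8601 string.",
    "input_schema": {"type": "object", "properties": {}},
}]

def get_time() -> str:
    return datetime.now(timezone.utc).isoformat()

messages = [{"role": "user", "content": "What time is it in UTC?"}]
while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=512,
        tools=TOOLS,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # final answer, no more tool calls requested
    # Echo the assistant turn back, then answer each tool_use with a tool_result.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": block.id, "content": get_time()}
            for block in response.content
            if block.type == "tool_use"
        ],
    })

print(next(b.text for b in response.content if b.type == "text"))
```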
Observability from line one
The most consistent piece of feedback from the day: wiring Langfuse before the first multi-agent call (not after) changed how attendees debugged. Once the per-step trace was visible, problems that had taken minutes to find took seconds (a minimal wiring sketch follows the list below).
- Per-step trace with the workshop attendee's name as session prefix
- Cost-per-attendee surface — running cost visible all day
- Cache-hit rate climbed from 0% at hour 1 to ~72% by hour 6 as the system prompt stabilised
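A minimal wiring sketch, assuming the Langfuse Python SDK's @observe decorator (v2-style import); only the session-naming convention comes from the list above, the rest is illustrative.

```python
# Hedged sketch of the hour-7 wiring, not the workshop source.
from langfuse.decorators import langfuse_context, observe

@observe(name="agent-step")
def run_agent_step(attendee: str, step: str) -> str:
    # Attendee name as the session prefix, so every trace is findable per laptop.
    langfuse_context.update_current_trace(session_id=f"{attendee}/agentic-os")
    result = f"ran {step}"  # placeholder for the model + tool calls
    return result

run_agent_step("ada", "plan the refactor")
```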
What the attendees walked away with
A running shell on their own laptop. The source on GitHub. Three working MCP servers. A supervisor + handoff pattern they can extend. And — most valuable — the muscle memory of having built this end-to-end.
What we'd tell another team building this
1. Workshop attendees retain ten times more from one full build than from ten finished demos. Build it together; do not show it pre-built.
2. Langfuse-from-line-one was the highest-leverage decision in the curriculum. Without a per-step trace, hour 5 onwards becomes guesswork; with it, debugging is direct.
3. Three specialists with narrow scopes beat one big agent loaded with everything — confirmed live by an A/B at hour 6, with the audience watching the eval-score deltas.
4. The "add-your-own-tool" hour is what makes the workshop generative. The variety of extensions in one room is what attendees remember three months later.
How the workshop pattern extends
The Agentic OS shell is now the curriculum spine of the corporate training programs we deliver. Customers come in with their own integrations to bolt on, and the same 8-hour structure slots neatly into one full day of an 8-day intensive.
- Same pattern delivered as Day 5 of the corporate 8-day intensive (IT track)
- The Agentic OS GitHub repo is the public reference implementation
- Workshop graduates often return with their own MCP servers and extensions to share
Want this workshop for your team?
The same format runs on-site for engineering teams of 8–14. We customise the integrations to your stack — by the end of the day your team has shipped a working agentic shell in your codebase.
Read what we publish on this
Why I am replacing supervisor patterns with handoffs
Supervisors looked clean on paper and shipped slow in production. Handoffs read messier in the code but recover better when an agent loses the plot. Two real systems and where supervisors still earn their keep.
MCP: Why every team's first MCP server should be "list-files"
Smallest useful server. Hardest one to mess up. Teaches the protocol without distracting domain logic. The 60-line server we hand to teams during training.
Production: The agent observability stack we ship to every client
Traces, spans, evals, cost-per-completed-task, and the one dashboard panel that catches 80% of regressions. Vendor-agnostic — covers Langfuse, Honeycomb, and rolling your own.
Solutions & topics worth reading next
Agentic AI Consulting
Designed, built, and handed off — production agentic systems for enterprise teams.
MCP Integration
Custom Model Context Protocol servers that turn your systems into agent tools.
AI Systems Engineering Training
Eight-day corporate training programs that take dev teams from AI-assisted coding to production agentic systems.
Multi-Agent Workflows
Supervisor + handoff orchestration for portfolios of agents that need to cooperate without arguing.
Agentic AI
Designing, building, and shipping production agents.
Model Context Protocol (MCP)
The open protocol that gives agents tools.
Multi-Agent Systems
Orchestrating many agents without losing the plot.
AI Engineering
The discipline of shipping AI systems, not demos.
More implementation proof
Multi-agent research synthesis — open PoC for swarm vs supervisor
An open-source experiment comparing orchestration patterns on a real research task.
Enterprise Engagement: PR review pipeline cuts senior-engineer time 4×
Multi-agent CI workflow for a 180-engineer monorepo.