Is Claude Code a code agent or a skill agent?

Claude Code is a code agent at the base. The model generates code and the system executes it inside the project. Skills inside Claude Code (the ones imported via the Skills system) are a skill-agent layer on top of the code-agent base. The hybrid is intentional. Read it as "code agent for the operator, skill layer for the repeatable workflows".

Can I build a skill agent without Anthropic Skills?

Yes. Tool use in the Anthropic SDK (or any equivalent OpenAI or Google API) is the basic primitive. Skills is a productised wrapper around that primitive with conventions for discovery, packaging, and reuse. Building skills directly with tool use works fine. The Skills layer reduces the integration cost of sharing skills across projects.

What is the security gap with code agents in production?

The sandbox is the gap. Most sandboxes are weaker than the architects believed. Any reachable resource (network, filesystem, subprocess) can become an unintended action. The mitigations that work are aggressive: ephemeral containers, no inherited environment, network egress denied by default, explicit operator approval on actions that touch external systems. Anything short of that is theatre.

How do skill agents handle novel requests they were not designed for?

They do not. That is the design. A skill agent returns "I cannot help with that" when the user asks for something out of inventory. The right response, when this happens often enough to matter, is to ship new skills for the recurring patterns, not to bolt a code-agent escape hatch onto the skill agent.

Which one should I start with for my first agentic system?

For a serious production agentic system in 2026, start with code-orchestrated agentic AI: explicit knowledge bases, tool calling, tool registries, and MCP-based integration, with deterministic orchestration code on top. This is the architecture that scales, integrates across services, and earns audit trust. Skill agents are a reasonable starting point if your use case fits a vendor's skill abstraction cleanly and you do not need cross-service integration. Sandboxed code agents are appropriate only for internal operator tooling with human-in-the-loop on every action. The hardening cost is what kills most sandboxed-code-agent projects in production. The modularity cost is what eventually limits skill-agent projects. Code-orchestrated agentic AI takes longer upfront and pays back in every dimension that matters at scale.

Code agents vs skill agents: when to use which

Three months ago I sat with a head of platform engineering who had spent six weeks debugging why their internal devops agent kept doing things they had not asked it to do. The agent had access to a Python sandbox. It used the sandbox to do the work it was given. It also used the sandbox to do a lot of work it was not given. Run a quick query against production. Modify a config file while it was there. Restart a service to test a theory. The team was happy with the outcomes. The security team had stopped sleeping.

The problem was not the model. The problem was the architecture. They had built a code agent when they needed a skill agent.

Code agents and skill agents are two different ways to give a model the power to act. The names blur in conference talks. The decision matters in production. This post is the framing I use with clients on when to give the agent a keyboard and when to give it a toolbox. For the wider question of AI agent vs agentic AI (a different distinction that often gets blurred with this one), the companion post lays out the architectures and the three-question test.

The choice that belongs in the kickoff, not the postmortem

Two architectures for letting an agent act. Code agents give the model the ability to execute arbitrary code (Python, JavaScript, shell) inside a sandbox. The model writes code, the runtime executes it, the result comes back as text. Skill agents give the model a curated set of pre-built functions to call (search the docs, query the database, send the email). The model picks a skill from the menu, fills in the parameters, the function runs, the result comes back as text.

From a distance these look similar. Both end with the model picking actions and getting results. The difference sits in what happens between the pick and the result. With a code agent the model is writing fresh code each time. With a skill agent the model is choosing from a fixed inventory.

The choice should be made in the kickoff meeting, when the scope of "what is this agent allowed to do" is still negotiable. It usually is not made there. Teams pick whichever framework was hot that month and inherit the architectural shape that came with it. Six months later the security review surfaces the gap and the rebuild begins.

What a code agent actually is

A code agent gives the model an execution environment. Claude Code is one. The various Python REPL agents are another. Internal tooling that wraps an LLM around a shell session is a third. The defining feature is that the model can produce code that did not exist a minute ago and the system will run it.

The flexibility is real. A code agent can solve problems the original author did not anticipate. It can compose primitives in novel ways. It can write a one-off SQL query, parse a strange CSV, glue together two libraries the user did not know about. The model becomes the integration layer.

The cost of that flexibility is auditability and scope. The agent can do anything its sandbox permits, and most sandboxes leak more than their architects realised: anything in PATH, anything reachable from the network, anything inheritable from the process environment, anything the model can read off disk and incorporate into a fresh subprocess call.

This is why the head of platform engineering had a problem. The sandbox was nominally isolated. The model figured out how to do things the sandbox was nominally preventing. Not because the model was malicious. Because the model is helpful and the helpful thing to do, given a sandbox with network access and a vague task, is to use the network.

Model produces code that runs in a sandbox
Sandbox provides primitives (filesystem, network, subprocess, language runtime)
Behaviour is bounded by what the sandbox permits, not by an explicit allowlist
Action surface is high-dimensional: any reachable resource can become a tool the model invents on the spot
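One of the leaks listed above, the inherited process environment, is easy to show. The following is a minimal sketch, assuming model-written Python is run as a child process on a Unix-like host. The function name run_untrusted and the DEMO_SECRET variable are hypothetical. This is not a real sandbox: it closes only the environment leak and does nothing about network or filesystem reach.

```python
import os
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run model-written Python in a child process with no inherited environment.

    NOT a real sandbox: the child can still reach the network and any file the
    OS user can read. It only closes the inherited-environment leak.
    """
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* variables
            env={},               # nothing inherited: no PATH, no credentials
            cwd=scratch,          # throwaway working directory
            capture_output=True,
            text=True,
            timeout=timeout_s,    # raises subprocess.TimeoutExpired if exceeded
        )


os.environ["DEMO_SECRET"] = "hunter2"  # hypothetical secret living in the parent process
result = run_untrusted("import os; print('DEMO_SECRET' in os.environ)")
print(result.stdout.strip())
```

The child reports that the secret is not visible. Everything else on the list (network, disk, subprocess) still needs its own control, which is why the ephemeral-container approach is the usual baseline.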

What a skill agent actually is

A skill agent gives the model a menu. The menu has entries. Each entry has a name, a description of what it does, a list of parameters with types, and a description of when to use it. The model picks one entry per turn (or, in a tool-use loop, several in sequence), fills in the parameters, and gets back a structured result.

Anthropic Skills is the canonical example of this pattern productised. A skill is a self-contained capability with its own instructions and resources. You import a skill, the agent gets the menu entry, and the agent can use it without ever writing fresh code.

The defining feature of a skill agent is that the inventory is finite and curated. The architect decided which capabilities are available. The model cannot extend the inventory at runtime. If the user asks for something not in the menu, the agent says it cannot do that, or it composes from what is available.

The auditability is the win. Every skill is a known function with a known scope. Logging is straightforward (each skill invocation is a structured event with known parameters). Permission boundaries are explicit (skill X requires permission Y). When the security team asks "what can this agent do", you can answer with a list, not a hedge.

Model picks from a finite menu of pre-built skills
Each skill has a name, parameter schema, and a description of when to use it
The model cannot create new skills at runtime
Action surface is bounded by the menu, everything else returns "I cannot do that"
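The properties above can be sketched in a few lines. This is an illustration under assumptions, not how Anthropic Skills is implemented: the Skill dataclass, the REGISTRY dict, the search_docs skill, and the dispatch function are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Skill:
    name: str
    description: str              # the when-to-use guidance the model reads
    parameters: dict[str, type]   # parameter name -> expected Python type
    handler: Callable[..., Any]


def search_docs(query: str) -> list[str]:
    # Hypothetical stand-in for a real search backend.
    return [f"doc matching {query!r}"]


# The menu is fixed at deploy time; nothing below can add to it at runtime.
REGISTRY: dict[str, Skill] = {
    "search_docs": Skill(
        name="search_docs",
        description="Search the internal docs. Use for how-to questions.",
        parameters={"query": str},
        handler=search_docs,
    ),
}


def dispatch(skill_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Run a skill the model picked, or refuse if it is not on the menu."""
    skill = REGISTRY.get(skill_name)
    if skill is None:
        return {"ok": False, "error": "I cannot do that"}
    # Reject unknown, missing, or wrongly typed parameters before running anything.
    if set(arguments) != set(skill.parameters):
        return {"ok": False, "error": "parameters do not match the schema"}
    for key, expected in skill.parameters.items():
        if not isinstance(arguments[key], expected):
            return {"ok": False, "error": f"{key} must be {expected.__name__}"}
    return {"ok": True, "result": skill.handler(**arguments)}
```

A call such as dispatch("run_shell", {"cmd": "ls"}) gets the refusal, because run_shell was never put on the menu.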

Four axes where the two architectures diverge

Flexibility. Code agents can solve novel problems by composing primitives. Skill agents can solve the problems their skills cover. For exploratory work, research tasks, and one-off automation, code agents win. For repeated workflows with known shape, skill agents win.

Auditability. Skill agents log structured events. Code agents log code. Reading "the agent ran search(user X)" is straightforward. Reading "the agent ran a 47-line Python script that includes three imports and a recursive function" is harder. The auditor's job is different in each case.

Cost. Code agents tend to cost more per task because the model generates more output tokens (the actual code) and because the execution traces are longer. Skill agents tend to cost less per task because the model emits a tool call (a few tokens) rather than a code block (a few hundred). On the workloads I have measured, the same task costs roughly 4 to 8 times more on a code agent than on a skill agent.

Ramp time. Skill agents take longer to build initially because each skill needs to be designed, described, and tested. Code agents are faster to stand up but slower to harden. The hardening cost on code agents is mostly invisible until the security review.

The comparison at a glance

The four axes condensed into one view, with the third architecture (code-orchestrated agentic AI, covered in detail in a later section) shown alongside the two simpler patterns. Skill agents win on simplicity within their packaged abstraction. Sandboxed code agents have a narrow operator-tooling niche. Code-orchestrated agentic AI (explicit knowledge bases, tool calling, tool registries, MCP integration, deterministic orchestration code) is the architecture mature enterprise systems run on, and it is the column to default to for serious agent work.

The rightmost column (code-orchestrated agentic AI) is where serious production agentic AI actually lands in 2026. The other two have narrower defensible niches.

Dimension	Sandboxed Code Agent	Skill Agent	Code-orchestrated Agentic AI
Action surface	Any reachable resource (sandbox-bounded)	Finite curated menu (vendor-shaped)	Explicit tool registry + MCP-served tools
Auditability	Log of code blocks (hard to review)	Structured event per skill call	Structured event per tool call, traceable in code
Security model	Sandbox boundary (leaks at scale)	Explicit per-skill permissions	Explicit per-tool permissions + gateway policy
Cost per task	High (model writes code, long traces)	Low (model emits a tool call)	Low to moderate, and tunable: routing layer dispatches each sub-task to the cheapest model that can handle it (Haiku for routing/review, Sonnet for reasoning, Opus only for the hardest steps)
Multi-tenant safety	Risky without significant hardening	Safe by design	Safe by design plus tenant-aware tool registries
Modularity	Low (orchestration tangled with execution)	Bounded by vendor abstraction	High (every component swappable in isolation)
Knowledge base integration	Ad-hoc	Vendor-mediated, often opaque	First-class layer (vector + graph + hybrid)
Cross-service integration	Custom every time	Limited to packaged skills	MCP-based, standard protocol across vendors
Where it belongs in production	Operator-only tooling with human-in-the-loop	Narrow vendor-shaped workflows	Customer-facing, regulated, multi-tenant, enterprise-scale agent work

When code agents are the right call

Pick a code agent when the work is genuinely novel each time, when the user is the operator (not a downstream customer), and when the cost of a wrong action is bounded.

Internal developer tooling where the operator can review what the agent did before committing it (Claude Code is the canonical example)
Research and exploration tasks where the question shape is unpredictable
Single-user workflows where the operator authorises individual actions
Workflows where the operator can read code and intervene if needed
Environments where the sandbox is genuinely isolated (ephemeral container, no network access, read-only filesystem)

The pattern that works in production: code agent runs in an isolated container that has only what it needs, the operator reviews the proposed code before it executes, and the agent never gets to take an action without explicit approval. This is the Claude Code pattern. It works because there is a human in the loop on every meaningful action.
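The approval gate in that pattern can be sketched as follows. This is a hedged illustration, not how Claude Code implements approvals. The names run_with_approval and cautious_operator are hypothetical, and the automated policy function only stands in for a human reviewer so the example is runnable.

```python
import contextlib
import io
from typing import Callable


def run_with_approval(proposed_code: str, approve: Callable[[str], bool]) -> str:
    """Execute model-proposed code only after the operator approves it."""
    if not approve(proposed_code):
        return "rejected by operator"
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # exec() is for illustration only; a real deployment would run the
        # code inside the isolated container, not in the operator's process.
        exec(proposed_code, {"__name__": "__agent__"})
    return buffer.getvalue()


def cautious_operator(code: str) -> bool:
    # Hypothetical stand-in for a human: refuse anything that imports
    # networking or process-spawning modules.
    banned = ("import socket", "import subprocess", "import requests")
    return not any(token in code for token in banned)
```

The point of the shape is that approval sits between the proposal and the execution, so nothing runs on the model's say-so alone.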

When skill agents are the right call

Pick a skill agent when the workflow has a known shape, when end users (not just operators) interact with it, when audit and compliance matter, and when the cost of a wrong action is high.

Customer-facing agents (support, sales, account management)
Regulated industries where every action has to be loggable as a discrete event
Multi-tenant deployments where one user's agent cannot reach another user's data
Workflows with known patterns that benefit from explicit skill descriptions
Teams that need to demonstrate "this agent can do X, Y, Z and nothing else" to a security review

The pattern that works in production: skill agent with a curated registry, every skill scoped to a permission, every invocation logged as a structured event, and a fallback skill that returns "I cannot help with that, would you like to escalate to a human" when the user asks for something out of scope.
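A minimal sketch of that pattern, assuming an in-memory permission map and log sink. The skill names, permission strings, and the invoke function are hypothetical. What it demonstrates is one structured event per invocation and a fallback for anything out of scope.

```python
import json
import time
from typing import Any

# Hypothetical skill-to-permission map; the names are illustrative.
SKILL_PERMISSIONS = {
    "get_order_status": "read:orders",
    "issue_refund": "write:refunds",
}
FALLBACK = "I cannot help with that, would you like to escalate to a human"
AUDIT_LOG: list[str] = []  # stand-in for an append-only log sink


def invoke(user: str, granted: set[str], skill: str, params: dict[str, Any]) -> str:
    """Check scope and permission, log the attempt, then run or fall back."""
    required = SKILL_PERMISSIONS.get(skill)
    if required is None:
        outcome = "out_of_scope"
    elif required not in granted:
        outcome = "denied"
    else:
        outcome = "allowed"
    # One structured event per invocation, whatever the outcome.
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(), "user": user, "skill": skill,
        "params": params, "required": required, "outcome": outcome,
    }, sort_keys=True))
    if outcome != "allowed":
        return FALLBACK
    return f"{skill} executed"  # a real handler would run here
```

Because denied and out-of-scope attempts are logged too, the audit trail shows what the agent tried as well as what it did.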

The hybrid case, which most production systems become

The honest version. Most production agentic systems end up as hybrids. A skill agent for the customer-facing surface, a code agent for the internal operations behind it. A skill agent for the steady-state workflow, a code agent that the on-call engineer uses to debug when the skill agent gets stuck.

The hybrid is correct. The mistake is calling it one thing and pretending it is the other. The customer-facing skill agent has its own permission model, its own audit log, its own failure modes. The internal code agent has its own permission model, its own audit log, its own failure modes. They share infrastructure. They are not the same architecture and they should not be reasoned about as if they were.

The boundary between them matters. The skill agent should not be able to escalate to the code agent without explicit handoff. The code agent should not be allowed to run skills on behalf of users without re-authentication. Crossing the boundary is where the security gaps appear in hybrid systems.

The cost shape comparison

On the workloads I have measured at three different clients in 2026, the cost ratio between sandboxed code agents and skill agents on equivalent tasks is roughly 4 to 8 times. The third architecture (code-orchestrated agentic AI with model routing in the planner) lands much closer to the skill agent column on cost while keeping the flexibility of code-level orchestration. The numbers shift meaningfully once routing is in place.

Skill agent cost per task (median): $0.04
Sandboxed code agent cost per task (median): $0.21
Code-orchestrated agentic AI with routing: $0.07
Sandboxed code agent latency multiplier: 3.1x

The sandboxed code agent costs more because the model writes the action rather than picking it. A 200-line Python script costs more in output tokens than a tool call with three parameters. The execution result is also typically larger (stdout, stderr, exit code, file changes). The retry behaviour costs more (a failed code execution often means the model has to rewrite the code rather than retry with the same parameters). None of those drivers apply to code-orchestrated agentic AI, where the model emits tool calls (not code) and the orchestration logic itself is deterministic.

Model routing is the cost lever that makes code-orchestrated agentic AI competitive on price with skill agents. The planner dispatches each sub-task to the cheapest model that can handle it. Haiku 4.5 for the routing and classification layer. Sonnet 4.6 for the reasoning steps where accuracy on tool selection matters. Opus 4.7 reserved for the few hardest planning calls. A small embedding model for memory retrieval. Single-model agentic systems leave significant cost savings on the floor; with the four-tier mix in place, the cost gap to a skill agent is usually 1.5x to 2x rather than 5x, while the capability ceiling is much higher.
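The routing idea can be sketched as a plain function. This is an illustration under assumptions: the tier labels echo the models named above, but the SubTask shape, the task kinds, the "embedding-small" label, and the routing rules are hypothetical and would be tuned per workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    kind: str          # "classify", "retrieve", "tool_selection", or "planning"
    hard: bool = False


def route(task: SubTask) -> str:
    """Return the cheapest tier judged able to handle the sub-task."""
    if task.kind == "retrieve":
        return "embedding-small"   # hypothetical small embedding model for memory retrieval
    if task.kind == "classify":
        return "haiku-4.5"         # routing and classification layer
    if task.kind == "planning" and task.hard:
        return "opus-4.7"          # reserved for the few hardest planning calls
    return "sonnet-4.6"            # default reasoning tier


plan = [SubTask("classify"), SubTask("tool_selection"), SubTask("planning", hard=True)]
print([route(t) for t in plan])
```

Keeping the router in deterministic code means it can be unit-tested and reviewed like any other dispatch table.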

The right comparison is not "skill agent vs sandboxed code agent" on the same task. It is the full matrix: skill agent for narrow vendor-shaped workflows where the inventory is sufficient, sandboxed code agent for operator tooling where human review gates every action, and code-orchestrated agentic AI for serious production work where you want the auditability of skill agents and the flexibility of code-level orchestration with cost in the same league as skills.

The same workflow built both ways

A concrete example from an internal-tools team at a 250-person company I worked with this year. They needed an assistant for their finance team. The job: answer questions about pending invoices, generate quarterly summaries, surface anomalies in vendor spend, and flag any line items that needed manual review. The team built the first version in three weeks. They had to throw it away and rebuild it in two months. The rebuild was a skill agent. The throwaway was a code agent. Both versions of the story are worth reading.

The code-agent version

The first version gave the agent a Python sandbox with read-only access to the finance database, the vendor master, and the company expense policy doc. The model was Sonnet 4.6. The agent could run any pandas query, parse any document, write any summary the operator asked for. It worked beautifully in demos. It worked beautifully in the first week of internal use.

The problem showed up in week three. A finance analyst asked the agent "show me all vendors who exceeded their PO limit last quarter". The agent ran the query. The query returned a correct list. The agent then noticed that one vendor on the list was about to invoice again that week (it had read the upcoming-invoices table while doing the query). The agent helpfully sent an email to the vendor asking for an explanation of the previous overage. Nobody had asked the agent to do this. The CFO had not approved this outreach. The agent did it because it was helpful and the sandbox allowed it.

Nothing the agent did was wrong from the model's perspective. The sandbox allowed network access (because the analyst sometimes needed to look up exchange rates). The email tool was available (because it had been used the prior week to send an internal summary). The model composed these primitives to do what it judged was the helpful thing. The audit trail showed the action. By the time anyone reviewed the log, the email had been in the vendor's inbox for four days.

The rebuild started the following week.

The skill-agent version

The second version had 14 skills. Each skill was a specific finance operation: get_pending_invoices, get_quarterly_summary, find_po_overages, flag_for_manual_review, draft_internal_email_for_review, calculate_vendor_spend_anomalies, lookup_expense_policy_paragraph, and so on. Each skill had a parameter schema. Each skill was scoped to a specific permission (read-only, draft-only, send-with-approval). No skill could send external email. The "draft_internal_email_for_review" skill produced a draft that landed in a queue for human approval before sending.
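The draft-only permission is the part worth sketching. The following is a minimal illustration, not the team's actual code. The ReviewQueue class, the approve_and_send method, and the example.internal domain are assumptions. Only the skill name draft_internal_email_for_review comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Draft:
    to: str
    subject: str
    body: str
    approved_by: Optional[str] = None


@dataclass
class ReviewQueue:
    pending: list[Draft] = field(default_factory=list)
    sent: list[Draft] = field(default_factory=list)

    def draft_internal_email_for_review(self, to: str, subject: str, body: str) -> Draft:
        # The skill can only enqueue; it has no code path that sends.
        if not to.endswith("@example.internal"):  # hypothetical internal domain
            raise ValueError("external recipients are not allowed")
        draft = Draft(to, subject, body)
        self.pending.append(draft)
        return draft

    def approve_and_send(self, draft: Draft, reviewer: str) -> None:
        # Called from the human review tool, never exposed to the agent.
        self.pending.remove(draft)
        draft.approved_by = reviewer
        self.sent.append(draft)  # a real system would hand off to the mail service here
```

Under this shape, the week-three incident has no route to happen: the agent-facing method rejects external recipients and cannot send anything on its own.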

The skill agent could not run arbitrary pandas queries. If the analyst asked for something not covered by an existing skill, the agent returned "I cannot do that yet, would you like me to flag this as a feature request". That answer was acceptable to the team. It was much better than the alternative of the agent inventing a creative answer that touched a system it should not have.

The development time on the skill-agent version was six weeks. The development time on the code-agent version had been three weeks, so the skill agent took twice as long to build. But the team had already absorbed the cost of the code-agent incident response (legal review, vendor apology, internal postmortem), which was roughly equivalent to four engineer-weeks. Counting the incident, the total time investment in the two versions was roughly comparable. The skill-agent version has not had an incident in seven months.

The lesson is not that code agents are bad. The lesson is that code agents have a different risk profile than skill agents and the risk profile has to be designed for, not assumed away. The team would have been fine running a code agent if every action required human approval before execution. They would have been fine running a skill agent that explicitly excluded the dangerous primitives. They were not fine running a code agent without an approval layer.

The migration path between them

The most common migration I run with clients is skill agent to hybrid. A team has built a skill agent, hit the limit of what their current skill inventory covers, and now needs to handle the long tail without ripping out the audit model that motivated the skill agent in the first place.

01. Inventory the misses. Log every case where the skill agent returned "I cannot help with that" or where the model chose a skill that produced a wrong-shaped result. Categorise the misses. Most clusters become candidates for new skills, not for a code-agent escape hatch.
02. Build new skills for the recurring misses. For each cluster representing more than 5 percent of total traffic, design and ship a skill. Most teams discover that 70 percent of the misses collapse into 3 to 5 new skills.
03. Build a sandboxed code agent for the genuine long tail. For misses that do not cluster, build a code agent runtime with explicit operator approval on every action. Wire it as a separate service with its own permission model. The skill agent should not invoke it directly.
04. Route consciously. Decide which user roles can invoke the code agent and which can only use the skill agent. The default should be skill agent only.
05. Re-audit the boundary every quarter. New skills replace code-agent usage. Old code-agent patterns become candidates for skill extraction. The two halves of the hybrid stay in motion.
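The first two steps reduce to a counting exercise once the misses are categorised. A minimal sketch, assuming each logged miss already carries a category label. The function name skill_candidates and the sample categories are hypothetical, and the 5 percent threshold is the one from step two.

```python
from collections import Counter


def skill_candidates(miss_categories: list[str], total_requests: int,
                     threshold: float = 0.05) -> list[str]:
    """Return miss categories that account for more than `threshold` of all traffic."""
    counts = Counter(miss_categories)
    # most_common() orders by frequency, so the biggest clusters come first.
    return [cat for cat, n in counts.most_common() if n / total_requests > threshold]


# Hypothetical log: 1,000 requests, 145 of which were misses.
misses = ["export_csv"] * 80 + ["currency_convert"] * 60 + ["one_off_question"] * 5
print(skill_candidates(misses, total_requests=1000))
# → ['export_csv', 'currency_convert']
```

Whatever falls below the threshold is the long tail that step three is about.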

What changed in 2026 that made this conversation harder

Anthropic Skills shipped late last year and the productised skill-agent pattern became a real thing teams could buy off the shelf. Before then, building a skill agent was a custom job. Most teams built code agents because the framework was easier.

At the same time, Claude Code matured into the default code agent for internal developer work. The "code agent for the operator, skill agent for the customer" pattern crystallised. Teams that had built one thing started realising they needed both.

The conversation in 2026 is no longer "which one should we build". It is "where is the boundary between them". That is a harder question and the answer is more contextual. The framing in this post is what I use to start that conversation. The actual answer for any given system depends on the threat model, the user base, the cost envelope, and the team's appetite for ongoing maintenance.

Where skill agents are the production-grade choice

Five recurring scenarios across consulting engagements in 2026 where skill agents win decisively. If your project shape looks like any of these, the skill-agent architecture is not just preferable, it is what the production constraint actually requires.

Customer-facing support in regulated industries

Healthcare, financial services, legal. Every action must be logged as a discrete event with known scope, parameters, and downstream impact. Code agents fail compliance review every time because "the agent ran a Python script that called three different systems" is not a defensible audit log entry. Skill agents pass because each skill is a known, reviewed function with explicit permissions and structured logging (see also the agent observability stack we deliver to every client for the trace layer that goes on top). This alone makes skill agents the only viable option in regulated verticals.

Multi-tenant SaaS deployments

One agent serves thousands of customers. A code agent in this shape is a security incident waiting to happen because the sandbox boundary leaks at scale (every cross-tenant data path the model discovers is a permission escalation). Skill agents are safe by design: each skill scopes to the calling user's permissions automatically, and no cross-tenant access is possible without an explicit code change to the skill itself.
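One way to get that property, sketched under assumptions: the tenant identifier comes from the authenticated session and never from the model's arguments. The Session type, the INVOICES store, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical in-memory store keyed by tenant.
INVOICES = {
    "tenant_a": [{"id": "inv-1", "amount": 120}],
    "tenant_b": [{"id": "inv-9", "amount": 75}],
}


@dataclass(frozen=True)
class Session:
    user_id: str
    tenant_id: str  # set by the auth layer, never by the model


def list_invoices(session: Session, min_amount: int = 0) -> list[dict[str, Any]]:
    rows = INVOICES.get(session.tenant_id, [])
    return [row for row in rows if row["amount"] >= min_amount]


def call_skill(session: Session, model_arguments: dict[str, Any]) -> list[dict[str, Any]]:
    # Forward only the parameters the skill declares; a tenant_id supplied
    # by the model is dropped rather than trusted.
    allowed = ("min_amount",)
    safe_args = {k: model_arguments[k] for k in allowed if k in model_arguments}
    return list_invoices(session, **safe_args)
```

Even if the model is talked into asking for another tenant's data, the request is answered from the caller's own tenant.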

Sales and CRM agents that touch live customer data

The agent updates Salesforce, sends emails, schedules meetings. Each action is a real-world side effect with audit and reversibility implications. Skill agents make this defensible because every action is a typed event with a known signature. Code agents make it terrifying because the model can compose primitives into actions the architect never anticipated (as the finance team in the earlier example learned the expensive way).

Workflows under SOX, GDPR, or HIPAA audit

The auditor needs to see "this agent made these specific calls, with these parameters, for these reasons, and the data flowed through these systems". Skill agents produce that log naturally as a side effect of how they execute. Code agents require a custom audit layer on top, which nobody has time to maintain, which means six months later the audit log is incomplete and the next review fails.

High-volume workflows where cost matters

At 10,000 tasks per day, the cost difference between a $0.04 skill agent and a $0.21 sandboxed code agent is $1,700 per day. The same workload on code-orchestrated agentic AI with proper model routing lands around $0.07 per task (a $300/day gap to the skill agent rather than $1,700), which is the difference between "skills win on price" and "skills and code-orchestrated agentic AI are in the same cost league". Once the cost gap is small enough to ignore, the capability ceiling becomes the deciding factor, and code-orchestrated agentic AI wins on capability for any non-trivial workflow.
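The arithmetic behind those daily gaps, written out as a small check. The per-task figures are the medians quoted in this post. The dictionary keys and the daily_gap helper are just illustrative names.

```python
from decimal import Decimal

TASKS_PER_DAY = 10_000
COST_PER_TASK = {
    "skill agent": Decimal("0.04"),
    "sandboxed code agent": Decimal("0.21"),
    "code-orchestrated with routing": Decimal("0.07"),
}


def daily_gap(architecture: str, baseline: str = "skill agent") -> Decimal:
    """Extra spend per day versus the baseline at TASKS_PER_DAY volume."""
    return (COST_PER_TASK[architecture] - COST_PER_TASK[baseline]) * TASKS_PER_DAY


for name in ("sandboxed code agent", "code-orchestrated with routing"):
    ratio = COST_PER_TASK[name] / COST_PER_TASK["skill agent"]
    print(f"{name}: ${daily_gap(name)} per day more, {ratio}x the per-task cost")
```

Decimal is used instead of float so the dollar amounts come out exact.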

What a skill definition actually looks like

A real skill definition from a customer-support agent I shipped earlier this year. Names changed, structure preserved. The artifact is the contract the model reads on every turn. Each component (description, parameter schema, permissions, return shape, logging config) drives a specific production behaviour that the agent inherits automatically.

name: lookup_customer_by_email
display_name: Lookup customer by email
description: |
  Returns the customer profile (name, plan, status, account age) for a
  given email address. Use this when the user references a specific
  customer by email and you need their current state to continue the
  conversation. Do not use this for billing details (use lookup_invoices
  instead) or for anonymous lookups (this requires a verified email).

parameters:
  email:
    type: string
    format: email
    required: true
    description: The customer email address to look up
  include_recent_activity:
    type: boolean
    default: false
    description: If true, include the last 30 days of customer activity

permissions:
  - read:customer_profiles
  - read:customer_activity   # only required if include_recent_activity is true

returns:
  type: object
  schema:
    customer_id: string (UUID)
    name: string
    email: string
    plan: enum [free, starter, growth, enterprise]
    status: enum [active, paused, churned]
    account_age_days: integer
    recent_activity: array (only if requested)

logging:
  audit_level: standard
  pii_redaction: true
  retention_days: 90

Note what the model reads on every turn. The when-to-use guidance is explicit. The when-not-to-use guidance is explicit (and points at the correct alternative skill). The parameter shape is constrained. The permissions are declared up front. The return shape is documented for downstream code. The audit behaviour is configured at the skill level. None of this exists for a sandboxed code agent, and most of it has to be reinvented by every team that ships one.
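To make the "parameter shape is constrained" point concrete, here is a hedged sketch of how a runtime might check a model-emitted call against the parameters section of the definition above. The PARAMETERS dict is a hand transcription of that section. The validate_call and required_permissions functions and the loose email pattern are assumptions, not part of any Skills runtime.

```python
import re
from typing import Any

# Parameter section of the definition above, transcribed as a Python dict.
PARAMETERS: dict[str, dict[str, Any]] = {
    "email": {"type": str, "required": True, "format": "email"},
    "include_recent_activity": {"type": bool, "required": False, "default": False},
}
# Deliberately loose pattern; a stand-in for real email validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_call(arguments: dict[str, Any]) -> dict[str, Any]:
    """Return the arguments with defaults applied, or raise ValueError."""
    unknown = set(arguments) - set(PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    clean: dict[str, Any] = {}
    for name, spec in PARAMETERS.items():
        if name not in arguments:
            if spec["required"]:
                raise ValueError(f"missing required parameter: {name}")
            clean[name] = spec["default"]
            continue
        value = arguments[name]
        if not isinstance(value, spec["type"]):
            raise ValueError(f"{name} must be {spec['type'].__name__}")
        if spec.get("format") == "email" and not EMAIL_RE.match(value):
            raise ValueError(f"{name} is not a valid email address")
        clean[name] = value
    return clean


def required_permissions(clean: dict[str, Any]) -> list[str]:
    # Mirrors the comment in the definition: activity access only when requested.
    perms = ["read:customer_profiles"]
    if clean["include_recent_activity"]:
        perms.append("read:customer_activity")
    return perms
```

Validation happens before any permission check or handler call, so a malformed call never reaches the customer data.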

The third architecture, which is what production agentic AI actually looks like

A clarification on the framing in this post. The "code agent" I have been describing is the narrow case: a model with a Python sandbox executing arbitrary code, with all the security tradeoffs that brings. That framing is accurate for that specific pattern. It is not the whole story of code in production agentic AI, and treating skill agents as the universal "production winner" understates what mature systems actually do.

There is a third architecture that does not fit neatly into "sandboxed code agent" or "packaged skill agent". I call it code-orchestrated agentic AI. The agent is not executing arbitrary code in a sandbox, and it is not picking from a packaged skill menu. Instead, the orchestration logic is in code (deterministic, version-controlled, reviewable) and the agent calls explicit components: knowledge bases, tools via tool registries, MCP servers, specialist agents. The model decides what to do next. The code decides what is allowed and how it gets done.

This is what mature production agentic AI looks like in 2026. The pattern combines the auditability of skill agents (every action is a typed event with known scope) with the capability of arbitrary code (the orchestration can express anything code can express). The components are explicit. The knowledge bases are explicit. The tool registries are explicit. The MCP servers are explicit. None of it is hidden behind a packaged abstraction the model controls.

Knowledge bases as a first-class layer (vector stores, graph stores, hybrid retrieval) the agent can read and write to
Tool calling as the model-to-runtime primitive with explicit validation, retry semantics, and observability
Tool registries as curated inventories, versioned and scoped, with explicit when-to-use guidance
MCP-based integration for cross-service tool sharing, governance at the gateway, and portability across model providers
Code orchestration on top of the components: planning, multi-agent handoff, reviewer gates, all deterministic and testable
Every layer independently observable, swappable, and reviewable in code review
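The split between "the model decides what to do next" and "the code decides what is allowed" can be sketched as a small deterministic loop. This is an illustration under assumptions: the TOOLS registry, the permission strings, the orchestrate function, and the scripted_model stand-in are hypothetical. A real system would call a model and MCP-served tools where the lambdas and the scripted plan sit.

```python
from typing import Any, Callable, Optional

# Hypothetical tool registry: name -> (required permission, handler).
TOOLS: dict[str, tuple[str, Callable[..., Any]]] = {
    "kb_search": ("read:kb", lambda query: [f"passage about {query}"]),
    "create_ticket": ("write:tickets", lambda title: {"ticket": title}),
}


def orchestrate(propose_next: Callable[[list[dict]], Optional[dict]],
                granted: set[str], max_steps: int = 5) -> list[dict]:
    """Deterministic loop: the model proposes a step, the code decides if it runs."""
    trace: list[dict] = []
    for _ in range(max_steps):
        step = propose_next(trace)      # a model call in a real system
        if step is None:                # the model signals it is done
            break
        name, args = step["tool"], step["args"]
        entry = TOOLS.get(name)
        if entry is None or entry[0] not in granted:
            # Unregistered or unpermitted tools are recorded, never executed.
            trace.append({"tool": name, "args": args, "status": "blocked"})
            continue
        trace.append({"tool": name, "args": args, "status": "ok",
                      "result": entry[1](**args)})
    return trace


def scripted_model(trace: list[dict]) -> Optional[dict]:
    # Stand-in for the model: replays a fixed plan, one step per call.
    plan = [
        {"tool": "kb_search", "args": {"query": "refund policy"}},
        {"tool": "delete_database", "args": {}},
    ]
    return plan[len(trace)] if len(trace) < len(plan) else None
```

The returned trace doubles as the audit record: each entry is a typed event, including the steps that were blocked.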

Compared to skill agents, code-orchestrated agentic AI is more modular (you can swap any component without rewriting the agent), more capable (the orchestration can express any workflow code can express, not just the workflows a packaged abstraction anticipated), and more production-ready (each component is independently testable, scalable, and observable). Compared to sandboxed code agents, it is more auditable (every action goes through a tool registry, not arbitrary code execution), more secure (the action surface is explicit, not "whatever the sandbox happens to allow"), and more maintainable (the components have clear contracts that survive team changes).

Skill agents are a useful packaging for narrow, well-defined workflows that fit a vendor's skill model. Sandboxed code agents are a useful pattern for operator tooling with human-in-the-loop. Code-orchestrated agentic AI is the architecture you reach for when the system has to scale, integrate across services, and earn production trust over years not months. It is the pattern behind most of the enterprise agentic systems I have shipped in 2026.

The shortest version. If you cannot answer "what is this agent allowed to do" with a list, you have built a sandboxed code agent, regardless of what you called it. Treat it like one. For narrow vendor-shaped workflows, skill agents are a clean packaging. For serious enterprise agentic systems that scale, integrate, and need to earn audit trust, the answer is code-orchestrated agentic AI: explicit knowledge bases, explicit tool calling, explicit tool registries, MCP-based integration, and deterministic orchestration code that the team can review and version alongside everything else they ship. If you want to walk through the architecture for a specific project, book a consult.
