Architecture Published 13 min

Codex Record and Replay turns one demo into a Computer Use skill: how I inspect generated skills before trusting them unattended

Codex app 26.616 adds Record and Replay on macOS: perform a workflow once, Codex packages it into a skill you replay with different inputs. Thread handoff and automation run history ship alongside. Computer Use must be enabled. Here is the review checklist I run before any recorded skill runs unattended.

Jigar Joshi, Agentic AI Architect and Consultant

Introduction

The gap in most Computer Use rollouts is not "can the model click the right button." It is "can we encode the workflow without a week of prompt engineering." Record and Replay is OpenAI's answer: show once, replay with variables. I recorded an expense filing flow and a Jira ticket creation flow the week 26.616 shipped. The throughput win is real. So is the supply-chain risk if you trust the generated skill without reading it.

This post sits next to my posts on agent supply chain security and governing agent autonomy. Record and Replay creates skills automatically. Skills are executable instructions. That puts a recorded skill on the same trust boundary as an unsigned SKILL.md loaded from the internet.

What Record and Replay does (release overview)

  • Record a macOS workflow while Codex observes screen and input events.
  • Package the demo into a Computer Use skill reusable across threads.
  • Replay with different inputs (amounts, ticket titles, report dates).
  • Thread handoff between local and remote Codex hosts in the same release.
  • Bulk actions on automation run history for ops at scale.
  • Requires Computer Use enabled; EEA/UK/CH excluded initially for Record and Replay.
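Replay inputs such as amounts and report dates are easier to trust when they are validated before the skill ever sees them. The sketch below is one way to do that in Python. The `ExpenseInputs` type, the `parse_expense_inputs` helper, and the field names are hypothetical, invented for illustration; they are not part of Codex or of any generated skill format.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class ExpenseInputs:
    """Hypothetical typed inputs for a recorded expense-filing skill."""
    amount: Decimal
    vendor: str
    report_date: date


def parse_expense_inputs(raw: dict[str, str]) -> ExpenseInputs:
    """Validate raw string inputs before handing them to a replay."""
    try:
        amount = Decimal(raw["amount"])
    except InvalidOperation as exc:
        raise ValueError(f"amount is not a number: {raw['amount']!r}") from exc
    # Reject NaN, infinities, zero, and negative amounts.
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")

    vendor = raw["vendor"].strip()
    if not vendor:
        raise ValueError("vendor must not be empty")

    # date.fromisoformat raises ValueError on malformed dates.
    report_date = date.fromisoformat(raw["report_date"])
    return ExpenseInputs(amount=amount, vendor=vendor, report_date=report_date)


if __name__ == "__main__":
    inputs = parse_expense_inputs(
        {"amount": "42.50", "vendor": " Acme ", "report_date": "2025-06-30"}
    )
    print(inputs)
```

Rejecting bad values here means a typo in a batch of inputs fails before any click happens, rather than halfway through a form.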

Record and Replay vs hand-written skills

When to record vs when to write skills by hand

| Scenario | Record and Replay | Hand-written skill |
| Repeating admin UI workflow with stable layout | Strong fit | Overkill unless compliance demands review |
| Production deploy or infra changes | Do not record blindly | Typed tools and CI scripts instead |
| Workflow with sensitive credentials on screen | Never record raw | Redact and use vault-injected env vars |
| Cross-app orchestration with branching logic | Record baseline, then edit skill | Plan branches explicitly in SKILL.md |
| Regulated audit trail required | Record plus mandatory human review gate | Signed internal skill registry |

The inspection checklist before unattended replay

  1. Read the generated skill file end to end
    Look for hardcoded URLs, account names, and accidental keystrokes you did not intend to teach. Recording captures what you did, including mistakes you corrected mid-demo.
  2. Replay once in a sandbox account
    Never first-run against prod finance or HR systems. I use the same sandbox discipline as Agentjacking triage splits: read-only or fake data first.
  3. Parameterize inputs explicitly
    Expense amount, vendor name, ticket priority should be skill inputs, not buried in prose the model might mis-parse.
  4. Pair with scheduled monitoring, not blind cron
    OpenAI's scheduled monitoring tasks fit condition-driven reruns. Combine "replay skill" with "alert if UI changed" before daily unattended execution.
  5. Log every Computer Use run
    Screenshots and video from automation history are debugging aids, not audit logs. Export structured events to your observability stack.
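The first step of the checklist can be partly mechanized. The sketch below scans the text of a skill file for hardcoded URLs, account-like email addresses, and secret-looking assignments. It assumes the generated skill is plain text you can read from disk. The patterns and the `scan_skill_text` helper are my own illustration, not an OpenAI tool, and a clean scan supplements the human read-through rather than replacing it.

```python
import re

# Hypothetical review patterns; tune them to your own environment.
PATTERNS = {
    "hardcoded_url": re.compile(r"https?://[^\s<>)]+"),
    "email_or_account": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "possible_secret": re.compile(
        r"(?i)\b(?:password|passwd|secret|token|api[_-]?key)\b\s*[:=]\s*\S+"
    ),
}


def scan_skill_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, finding_kind, matched_text) for each suspicious span."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_no, kind, match.group(0)))
    return findings


if __name__ == "__main__":
    # Made-up skill content, for demonstration only.
    sample = "\n".join([
        "Open https://finance.example.com/expenses/new",
        "Sign in as jane.doe@example.com",
        "Type the amount from the {amount} input",
        "password: hunter2",
    ])
    for line_no, kind, matched in scan_skill_text(sample):
        print(f"line {line_no}: {kind}: {matched}")
```

Every finding is a prompt for a decision: turn the value into a skill input, move it to a vault, or confirm it is meant to be fixed.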

How Record and Replay fits multi-vendor agent stacks

Many teams run Codex beside Cursor and Claude Code. Record and Replay is Codex-specific, but the skill trust model is not. Apply the same provenance rules I use for NVIDIA Verified Agent Skills: no unsigned skills in production paths, internal registry for anything unattended.
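A minimal sketch of the internal registry idea follows: pin the SHA-256 digest of a skill file at review time and refuse to run anything whose bytes no longer match. The `SkillRegistry` class is hypothetical and in-memory. Digest pinning is a simpler stand-in for real cryptographic signing, which would also need key management and a durable store.

```python
import hashlib
import hmac


def skill_digest(skill_bytes: bytes) -> str:
    """SHA-256 hex digest of the exact bytes that were reviewed."""
    return hashlib.sha256(skill_bytes).hexdigest()


class SkillRegistry:
    """In-memory stand-in for an internal registry of reviewed skills."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}  # skill name -> approved digest

    def approve(self, name: str, skill_bytes: bytes) -> str:
        """Record the digest of a skill after a human has reviewed it."""
        digest = skill_digest(skill_bytes)
        self._approved[name] = digest
        return digest

    def is_approved(self, name: str, skill_bytes: bytes) -> bool:
        """True only if these exact bytes were approved under this name."""
        expected = self._approved.get(name)
        if expected is None:
            return False
        return hmac.compare_digest(expected, skill_digest(skill_bytes))


if __name__ == "__main__":
    registry = SkillRegistry()
    reviewed = b"Open the expense form and enter {amount} for {vendor}."
    registry.approve("file-expense", reviewed)

    print(registry.is_approved("file-expense", reviewed))               # reviewed bytes
    print(registry.is_approved("file-expense", reviewed + b" Delete"))  # edited after review
    print(registry.is_approved("unknown-skill", reviewed))              # never reviewed
```

Because the check is on bytes, re-recording or hand-editing a skill forces a fresh review before it can run unattended again.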

If the recorded workflow touches MCP connectors, remember MCP EMA governs who reaches the connector, not what the skill does with it after login.

Common mistakes with recorded Computer Use skills

  • Scheduling replay daily without detecting UI layout changes.
  • Recording workflows that flash secrets or customer PII on screen.
  • Assuming regional Computer Use availability matches Record and Replay availability.
  • Skipping skill file review because "OpenAI generated it."
  • No rollback when replay clicks the wrong destructive button.
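The first mistake in the list, scheduling without detecting layout changes, can be guarded with a pre-flight check. The sketch below assumes you can obtain a list of UI elements for the target screen, for example from an accessibility tree dump. That element format, and the `layout_fingerprint` and `should_replay` helpers, are assumptions of mine and not something Codex exposes.

```python
import hashlib
import json


def layout_fingerprint(elements: list[dict]) -> str:
    """Hash each element's role and label, ignoring order and volatile values."""
    stable = sorted((e["role"], e["label"]) for e in elements)
    payload = json.dumps(stable, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_replay(baseline: str, current_elements: list[dict]) -> bool:
    """Allow replay only when the screen still matches the recorded layout."""
    return layout_fingerprint(current_elements) == baseline


if __name__ == "__main__":
    # Hypothetical element dumps for the recorded screen and for today's screen.
    recorded = [
        {"role": "textfield", "label": "Amount", "value": ""},
        {"role": "button", "label": "Submit", "value": ""},
    ]
    baseline = layout_fingerprint(recorded)

    today = [
        {"role": "button", "label": "Submit and close period", "value": ""},
        {"role": "textfield", "label": "Amount", "value": ""},
    ]
    if should_replay(baseline, today):
        print("layout unchanged: replay may proceed")
    else:
        print("layout changed: skip replay and alert a human")
```

Field values are left out of the fingerprint on purpose, so different replay inputs do not trigger false alarms, while a renamed or added button does.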

Conclusion

Record and Replay is the fastest path I have seen from demo to repeatable Computer Use skill. It is also a new supply-chain surface. Record in sandbox, inspect the skill like code review, replay with parameters, then schedule with monitoring. Skip inspection and you did not automate the workflow. You automated whatever the model remembered from one noisy demo.

Sources: OpenAI Codex Record and Replay documentation at https://developers.openai.com/codex/record-and-replay; Codex app 26.616 release notes.
