Codex Record and Replay turns one demo into a Computer Use skill: how I inspect generated skills before trusting them unattended
Codex app 26.616 adds Record and Replay on macOS: perform a workflow once, and Codex packages it into a skill you can replay with different inputs. Thread handoff and automation run history ship alongside. Computer Use must be enabled. Here is the review checklist I run before any recorded skill runs unattended.
Introduction
The gap in most Computer Use rollouts is not "can the model click the right button." It is "can we encode the workflow without a week of prompt engineering." Record and Replay is OpenAI's answer: show once, replay with variables. I recorded an expense filing flow and a Jira ticket creation flow the week 26.616 shipped. The throughput win is real. So is the supply-chain risk if you trust the generated skill without reading it.
This post sits next to agent supply chain security and governing agent autonomy. Record and Replay creates skills automatically. Skills are executable instructions. That is the same trust boundary as loading an unsigned SKILL.md from the internet.
What Record and Replay does (release overview)
- Record a macOS workflow while Codex observes screen and input events.
- Package the demo into a Computer Use skill reusable across threads.
- Replay with different inputs (amounts, ticket titles, report dates).
- Thread handoff between local and remote Codex hosts in the same release.
- Bulk actions on automation run history for ops at scale.
- Requires Computer Use enabled; EEA/UK/CH excluded initially for Record and Replay.
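Replaying with different inputs is safer when those inputs are validated before they reach the skill. The sketch below is one way to do that for an expense flow. `ExpenseInputs` and `parse_expense_inputs` are hypothetical names of my own, not part of Codex, and how validated values are handed to a replay is left out because it depends on your setup.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class ExpenseInputs:
    """Hypothetical input schema for a recorded expense-filing skill."""
    vendor: str
    amount: Decimal
    report_date: str  # ISO 8601 date, e.g. "2025-01-31"


def parse_expense_inputs(raw: dict[str, str]) -> ExpenseInputs:
    """Validate raw string values before they are passed to a replay."""
    vendor = raw.get("vendor", "").strip()
    if not vendor:
        raise ValueError("vendor is required")

    try:
        amount = Decimal(raw.get("amount", ""))
    except InvalidOperation as exc:
        raise ValueError("amount must be a decimal number") from exc
    # Reject NaN/Infinity first so the comparison below is always safe.
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")

    report_date = raw.get("report_date", "")
    date.fromisoformat(report_date)  # raises ValueError on malformed dates

    return ExpenseInputs(vendor=vendor, amount=amount, report_date=report_date)
```

Rejecting bad values here means a typo in an amount fails in your own code rather than halfway through a replay against a live form.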
Record and Replay vs hand-written skills
The inspection checklist before unattended replay
- 01. Read the generated skill file end to end. Look for hardcoded URLs, account names, and accidental keystrokes you did not intend to teach. Recording captures what you did, including mistakes you corrected mid-demo.
- 02. Replay once in a sandbox account. Never first-run against prod finance or HR systems. I use the same sandbox discipline as Agentjacking triage splits: read-only or fake data first.
- 03. Parameterize inputs explicitly. Expense amount, vendor name, ticket priority should be skill inputs, not buried in prose the model might mis-parse.
- 04. Pair with scheduled monitoring, not blind cron. OpenAI's scheduled monitoring tasks fit condition-driven reruns. Combine "replay skill" with "alert if UI changed" before daily unattended execution.
- 05. Log every Computer Use run. Screenshots and video from automation history are debugging aids, not audit logs. Export structured events to your observability stack.
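A first pass over the generated skill can be partly mechanical. The sketch below assumes the skill can be read as plain text (for example a SKILL.md file) and flags literals that probably should have been inputs. The patterns and the `find_hardcoded_values` name are illustrative assumptions of mine, not a Codex feature, and a scan like this supplements reading the file rather than replacing it.

```python
import re

# Illustrative patterns only; tune these for your own workflows.
PATTERNS = {
    "url": re.compile(r"https?://[^\s)\"'>]+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "amount": re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d{2})?"),
}


def find_hardcoded_values(skill_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, matched_text) for each suspicious literal."""
    findings = []
    for lineno, line in enumerate(skill_text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, kind, match.group(0)))
    return findings
```

Every finding is a question for the reviewer: is this value part of the workflow, or something the demo happened to contain?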
How Record and Replay fits multi-vendor agent stacks
Many teams run Codex beside Cursor and Claude Code. Record and Replay is Codex-specific, but the skill trust model is not. Apply the same provenance rules I use for NVIDIA Verified Agent Skills: no unsigned skills in production paths, internal registry for anything unattended.
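One lightweight way to approximate an internal registry is to pin the SHA-256 digest of each skill at review time and refuse to run anything whose content has changed since. This is hash pinning, not cryptographic signing, and the registry shape (a plain dict of skill name to digest) and function names are assumptions for illustration.

```python
import hashlib
import hmac


def skill_digest(skill_bytes: bytes) -> str:
    """SHA-256 hex digest of the skill file's exact bytes."""
    return hashlib.sha256(skill_bytes).hexdigest()


def is_approved(name: str, skill_bytes: bytes, registry: dict[str, str]) -> bool:
    """True only if the skill is registered and its content matches the pin."""
    expected = registry.get(name)
    if expected is None:
        return False  # unregistered skills never run unattended
    return hmac.compare_digest(expected, skill_digest(skill_bytes))
```

Re-recording a workflow changes the digest, which forces the new skill back through review before it can run unattended again.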
If the recorded workflow touches MCP connectors, remember MCP EMA governs who reaches the connector, not what the skill does with it after login.
Common mistakes with recorded Computer Use skills
- Scheduling replay daily without detecting UI layout changes.
- Recording workflows that flash secrets or customer PII on screen.
- Assuming regional Computer Use availability matches Record and Replay availability.
- Skipping skill file review because "OpenAI generated it."
- No rollback when replay clicks the wrong destructive button.
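The first mistake in this list can be guarded against with a pre-replay gate: fingerprint the controls you expect on the target screen, compare against a baseline captured when the skill was reviewed, and emit a structured event either way. The sketch below assumes you already have some way to collect the visible control labels (for example from an accessibility dump). That collection step, and every name here, is hypothetical and not a Codex API.

```python
import hashlib
import json


def ui_fingerprint(labels: list[str]) -> str:
    """Hash the sorted, normalized control labels seen on the target screen."""
    normalized = sorted(label.strip().lower() for label in labels)
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def gate_replay(skill: str, baseline: str, current_labels: list[str]) -> dict:
    """Decide whether a replay may proceed and return a loggable event."""
    observed = ui_fingerprint(current_labels)
    return {
        "event": "replay_gate",
        "skill": skill,
        "allowed": observed == baseline,
        "baseline": baseline,
        "observed": observed,
    }


if __name__ == "__main__":
    baseline = ui_fingerprint(["Submit", "Cancel", "Amount"])
    event = gate_replay("file-expense", baseline, ["Amount", "Cancel", "Delete"])
    print(json.dumps(event))  # ship this line to your log pipeline
```

A label-set fingerprint will miss layout changes that keep the same labels, so treat it as a cheap tripwire that blocks the run and alerts a human, not as proof the screen is unchanged.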
Conclusion
Record and Replay is the fastest path I have seen from demo to repeatable Computer Use skill. It is also a new supply-chain surface. Record in sandbox, inspect the skill like code review, replay with parameters, then schedule with monitoring. Skip inspection and you did not automate the workflow. You automated whatever the model remembered from one noisy demo.
Sources: OpenAI Codex Record and Replay documentation at https://developers.openai.com/codex/record-and-replay; Codex app 26.616 release notes.