<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title><![CDATA[Jigar Joshi — Agentic AI Updates]]></title>
    <link>https://jigarjoshi.in/ai-updates</link>
    <description><![CDATA[What is shipping, changing, and worth watching in agentic AI — curated weekly.]]></description>
    <language>en</language>
    <lastBuildDate>Mon, 22 Jun 2026 00:00:00 GMT</lastBuildDate>
    <atom:link href="https://jigarjoshi.in/ai-updates/feed.xml" rel="self" type="application/rss+xml" />
    <item>
      <title><![CDATA[Fable 5 included-access window closes today: Anthropic planned to move it to usage credits on June 23, but the global suspension since June 12 still blocks all access]]></title>
      <link>https://www.anthropic.com/news/claude-fable-5-mythos-5</link>
      <guid isPermaLink="false">https://www.anthropic.com/news/claude-fable-5-mythos-5</guid>
      <pubDate>Mon, 22 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. The June 9 launch post promised Fable 5 at no extra cost on Pro, Max, Team, and seat-based Enterprise plans through June 22, with removal from those plans on June 23 and usage credits required after that. The June 12 export-control suspension disabled Fable 5 and Mythos 5 worldwide before most teams finished piloting, so this calendar milestone is mostly about routing and budget planning, not a live model you can call today. If you pinned fable-5 in configs before the suspension, keep Opus 4.8 as fallback and treat the June 23 credit shift as irrelevant until Anthropic restores access. When Fable returns, re-run cost-per-completed-task evals against Opus fast mode before promoting it back to default. Routing playbook: /blog/claude-fable-5-frontier-models-for-agent-builders.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Code ships Artifacts in beta: Team and Enterprise sessions publish live, org-private review pages that update in place at a claude.ai URL]]></title>
      <link>https://code.claude.com/docs/en/artifacts</link>
      <guid isPermaLink="false">https://code.claude.com/docs/en/artifacts</guid>
      <pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Claude Code Docs. Artifacts turn terminal output into a single self-contained HTML page on claude.ai that republishes to the same URL as the session progresses, with version history and org-only sharing from the page header. Claude builds from session context (codebase, connectors, conversation), so PR walkthroughs, incident timelines, and dashboards do not need a separate export step. Pages run under a strict CSP with no external fetch and no backend, so they are review captures, not hosted apps. Requires Team or Enterprise, /login to claude.ai (not API-key auth), and Anthropic API routing (not Bedrock or Vertex). If your reviewers still get screenshots in Slack, pilot one incident or PR artifact this sprint and set retention in admin settings before wider rollout. Governance context: /blog/governing-agent-autonomy-auto-review-and-pre-push-review.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Code v2.1.183 tightens auto-mode safety: blocks destructive git resets, agent-amend commits, and infrastructure destroy unless you asked for them]]></title>
      <link>https://github.com/anthropics/claude-code/releases/tag/v2.1.183</link>
      <guid isPermaLink="false">https://github.com/anthropics/claude-code/releases/tag/v2.1.183</guid>
      <pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Claude Code. Auto mode now refuses git reset --hard, git checkout -- ., git clean -fd, and git stash drop when you did not ask to discard local work; blocks git commit --amend on commits the agent did not make this session; and blocks terraform destroy, pulumi destroy, and cdk destroy unless you named the stack. The same release warns on stderr when a requested model is deprecated or auto-upgraded (including models set in agent frontmatter), fixes WebSearch returning empty results in subagents, and stops scheduled task and webhook deliveries from being treated as keyboard input that could approve pending actions in auto mode. If you run auto mode for throughput, upgrade to v2.1.183+ and keep secret scanning at commit time because these guards are convenience, not a compliance boundary. Guardrails guide: /blog/governing-agent-autonomy-auto-review-and-pre-push-review.]]></description>
    </item>
    <item>
      <title><![CDATA[GitHub Copilot usage metrics API adds ai_credits_used per user for enterprise and org-level attribution]]></title>
      <link>https://github.blog/changelog/2026-06-19-ai-credits-consumed-per-user-now-in-the-copilot-usage-metrics-api/</link>
      <guid isPermaLink="false">https://github.blog/changelog/2026-06-19-ai-credits-consumed-per-user-now-in-the-copilot-usage-metrics-api/</guid>
      <pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: GitHub. The Copilot usage metrics REST API now returns an ai_credits_used field on each user in the single-day and 28-day user-level reports at enterprise and org scope, derived from the same consumption data as usage-based billing. It is a per-user total across all Copilot activity, not yet split by feature, model, or surface, and it is a metrics signal rather than an invoice line. If you govern agent spend across Cursor, Claude Code, and Copilot, wire this field into the dashboard you already pull from the usage metrics API and compare day-over-day credit burn by team before the June 29 Opus 4.6 fast deprecation reshuffles model routing. Billing split context: /blog/claude-agent-sdk-billing-june-15-checklist.]]></description>
    </item>
    <item>
      <title><![CDATA[OpenAI Codex app 26.616 adds Record and Replay on macOS: demonstrate a workflow once and Codex turns it into a reusable Computer Use skill]]></title>
      <link>https://developers.openai.com/codex/record-and-replay</link>
      <guid isPermaLink="false">https://developers.openai.com/codex/record-and-replay</guid>
      <pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[OpenAI]]></category>
      <description><![CDATA[Source: OpenAI. Record and Replay lets you perform a workflow on macOS while Codex watches, then packages the demonstration into a skill you can replay in new threads with different inputs (expense filing, issue creation, recurring report pulls). It ships alongside thread handoff between local and remote hosts and bulk actions on automation run history. Computer Use must be enabled, and initial availability excludes the EEA, UK, and Switzerland even though base Computer Use expanded to those regions on June 16. If you already script Codex automations, record one repetitive admin workflow this week and inspect the generated skill before trusting it unattended. Pairs with ChatGPT scheduled monitoring tasks for condition-driven reruns.]]></description>
    </item>
    <item>
      <title><![CDATA[Xiaomi MiMo V2-Flash and TTS endpoints auto-route to MiMo-V2.5 on June 18: legacy model IDs retire June 30]]></title>
      <link>https://mimo.mi.com/docs/en-US/news/latest/v2.5-news</link>
      <guid isPermaLink="false">https://mimo.mi.com/docs/en-US/news/latest/v2.5-news</guid>
      <pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Xiaomi MiMo. If you pinned mimo-v2-flash or the TTS variant in agent configs, Xiaomi started auto-routing those IDs to the V2.5 series on June 18 (GMT+8) with V2.5 pricing, and full deprecation lands June 30. MiMo-V2.5 is a 310B sparse MoE (15B active) with native text, image, video, and audio input, 1M context on the flagship variant, and MIT open weights on Hugging Face for the base release. The migration is a one-line model string change on OpenAI-compatible endpoints, but re-run your eval suite after the swap because multimodal agent behavior shifted from the Flash generation. Grep every repo and secret store for legacy MiMo IDs this week. Routing economics: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[MCP Enterprise-Managed Authorization is now stable: IdP-provisioned connector access replaces per-server OAuth consent for Claude, VS Code, and supported servers]]></title>
      <link>https://blog.modelcontextprotocol.io/posts/enterprise-managed-auth/</link>
      <guid isPermaLink="false">https://blog.modelcontextprotocol.io/posts/enterprise-managed-auth/</guid>
      <pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: Model Context Protocol. EMA makes the organization IdP the decision-maker for which MCP servers a user can reach. Admins enable connectors once through group and role policy; clients exchange an Identity Assertion JWT (ID-JAG, Okta Cross App Access) for scoped tokens without redirecting every employee through a separate OAuth screen per server. Anthropic ships it across Claude, Claude Code, and Cowork; VS Code supports it in the IDE; Okta is the first IdP; Asana, Atlassian, Canva, Figma, Granola, Linear, and Supabase are live, with Slack coming. If MCP adoption stalled on "everyone authorizes Jira, Linear, and Figma individually," this is the enterprise baseline to pilot on one team before the July 28 stateless transport work lands. Governance stack context: /blog/databricks-unity-ai-gateway-mcp-governance.]]></description>
    </item>
    <item>
      <title><![CDATA[Cursor Automations add the /automate skill, five GitHub review triggers, and computer-use demos for always-on cloud agents]]></title>
      <link>https://cursor.com/changelog/06-18-26</link>
      <guid isPermaLink="false">https://cursor.com/changelog/06-18-26</guid>
      <pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Cursor. Cursor 3.8 turns automations into something you can describe in plain language: /automate in a local session configures triggers, instructions, and tools for you. Five new GitHub triggers (issue comments, inline PR review comments, review submitted, review thread resolved/unresolved, workflow run completed) let you wire triage and auto-fix loops to real review events instead of only push hooks. Cloud agents kicked off by automations now ship screenshots or video demos by default via computer use, PRs open by default, and drafts can be saved mid-setup while you finish MCP auth. If you run Bugbot or Auto-review locally, pick one noisy PR workflow (failed Actions or unresolved review threads) and stand up an automation this week instead of babysitting it by hand. Guardrails context: /blog/governing-agent-autonomy-auto-review-and-pre-push-review.]]></description>
    </item>
    <item>
      <title><![CDATA[Cursor adds /in-cloud subagents, /babysit for PR iteration, and reliable handoff between local and cloud agent sessions]]></title>
      <link>https://cursor.com/changelog/cloud-in-agents-window</link>
      <guid isPermaLink="false">https://cursor.com/changelog/cloud-in-agents-window</guid>
      <pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Cursor. Cursor 3.7 in the Agents Window is where parallel agent work gets practical. /in-cloud spins a subagent in its own VM and branch so CI fixes, investigations, and long explorations do not block your local session; /babysit keeps a cloud agent iterating on a PR until it is merge-ready. Cloud environment setup now captures a reusable snapshot in .cursor/environment.json so the next cloud agent boots in minutes instead of re-installing deps every time. Handoff between local and cloud is more reliable, so you can offload a long run, spin up several cloud agents, and pull one back down to test locally. If you already use background agents, try /in-cloud for one parallel task this sprint and commit the environment snapshot so the team shares the same cloud boot path. Pairs with Auto-review and pre-push /review: /blog/governing-agent-autonomy-auto-review-and-pre-push-review.]]></description>
    </item>
    <item>
      <title><![CDATA[Tenet demonstrates Agentjacking: a poisoned Sentry error report hijacks Cursor, Claude Code, and Codex into running attacker code with no repo compromise]]></title>
      <link>https://tenetsecurity.ai/blog/agentjacking-coding-agents-with-fake-sentry-errors/</link>
      <guid isPermaLink="false">https://tenetsecurity.ai/blog/agentjacking-coding-agents-with-fake-sentry-errors/</guid>
      <pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Tenet Security. Tenet Threat Labs showed that injecting a fake stack trace through a public Sentry DSN can redirect coding agents to execute attacker-controlled shell commands during normal triage. In controlled tests they saw 100+ agents act on injected errors across Cursor, Claude Code, and Codex, with an 85% success rate, and they open-sourced agent-jackstop drop-in configs to harden agents against untrusted telemetry. The attack needs no git write access and bypasses approval UX because the agent treats the error as ground truth. If your agents ingest Sentry, Datadog, or similar MCP feeds, treat observability output as untrusted input: scope MCP read tools, block auto-exec on triage prompts, and audit DSN exposure in public repos this week. Supply-chain frame: /blog/agentic-ai-supply-chain-security.]]></description>
    </item>
    <item>
      <title><![CDATA[Zhipu ships GLM-5.2: MIT open weights, 1M context, and Anthropic-compatible API for long-horizon coding agents]]></title>
      <link>https://z.ai/blog/glm-5.2</link>
      <guid isPermaLink="false">https://z.ai/blog/glm-5.2</guid>
      <pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Z.ai. GLM-5.2 is Zhipu's (Z.ai) latest 744B MoE flagship with roughly 40B active parameters per token, a 1M-token context window, up to 128K output, and High/Max thinking presets aimed at multi-step coding and agent loops. Weights are on Hugging Face under zai-org/GLM-5.2 with an MIT license; the API exposes glm-5.2 with Anthropic-compatible endpoints so Claude Code-style configs can swap ANTHROPIC_DEFAULT_OPUS_MODEL without rewriting the harness. Coding Plan subscribers burn quota at 3x during peak hours (14:00 to 18:00 UTC+8), so treat it as an Opus-tier path, not a cheap classifier. If Zhipu already sits in your OpenRouter mix, add glm-5.2 to long-horizon evals and measure cost per completed task against Kimi K2.6 and DeepSeek V4-Pro before promoting it to default. Routing frame: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[IIT Bombay unveils BharatGen Param2: a 17B MoE with tool calling across all 22 scheduled Indian languages, plus Shrutam2 ASR and Patram document vision]]></title>
      <link>https://bharatgen.com/text-models/</link>
      <guid isPermaLink="false">https://bharatgen.com/text-models/</guid>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: BharatGen. At Bharat Innovates 2026 (June 14 to 16, Nice), IIT Bombay presented BharatGen as India's sovereign multilingual stack, not a single chatbot. Param2 is a 17B mixture-of-experts model (2.4B active per token) trained on roughly 22T tokens with reasoning, chain-of-thought, coding, and tool calling across English plus all 22 scheduled Indian languages. The same stack adds Shrutam2 for multilingual speech-to-text, Sooktam2 for zero-shot voice cloning TTS, and Patram for Indian document and form understanding. Weights and checkpoints live on Hugging Face under bharatgenai and AI Kosh. If you ship agents for Indian users, run your Hindi and regional-language eval harness against Param2 before paying Western-model prices for work that needs native script and tool reliability. Routing economics for mixed stacks: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic pauses the Agent SDK billing split on launch day: headless Claude still draws from subscription limits for now]]></title>
      <link>https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan</link>
      <guid isPermaLink="false">https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan</guid>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Claude Help Center. The May 13 plan to move Claude Agent SDK, claude -p, Claude Code GitHub Actions, and ACP third-party apps to a separate monthly credit at API list rates did not take effect. Anthropic confirmed on June 15 that nothing changed: programmatic usage still counts against your Pro, Max, Team, or Enterprise subscription pool, there is no credit to claim, and overflow settings for a separate pool do not apply. If you started migrating cron jobs and CI agents to Platform API keys ahead of the split, you can keep that work for attributable spend, but you do not need to rush because subscription auth still works as before. Watch the Help Center for a revised plan with advance notice. Checklist context (still valid for the model retirement the same day): /blog/claude-agent-sdk-billing-june-15-checklist.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic planned to split programmatic Claude off subscriptions (paused June 15): Agent SDK, claude -p, and Claude Code GitHub Actions would draw from a separate monthly credit pool]]></title>
      <link>https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan</link>
      <guid isPermaLink="false">https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan</guid>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Claude Help Center. Anthropic announced this split for June 15, then paused it the same day (see entry above). As originally described, headless and programmatic Claude usage would move off your Pro/Max/Team subscription limits into a separate monthly Agent SDK credit (sized to your plan tier) at standard API list rates, with no rollover. Interactive Claude Code in the terminal, Claude Cowork, and claude.ai would have been unchanged. The pause means none of this is in force yet. When a revised plan ships, audit which auth path your cron jobs, CI agents, and SDK scripts use and move production automation to a dedicated Platform API key so spend is attributable. Full checklist: /blog/claude-agent-sdk-billing-june-15-checklist.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Opus 4 and Sonnet 4 retire on the API today: requests to claude-opus-4-20250514 and claude-sonnet-4-20250514 now fail]]></title>
      <link>https://docs.anthropic.com/en/docs/about-claude/model-deprecations</link>
      <guid isPermaLink="false">https://docs.anthropic.com/en/docs/about-claude/model-deprecations</guid>
      <pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. The April 14 deprecation notice lands today. API calls to claude-opus-4-20250514 and claude-sonnet-4-20250514 fail outright (no silent fallback), with Anthropic pointing replacements at claude-opus-4-8 and claude-sonnet-4-6. This is easy to miss if you pinned model IDs in agent configs, eval harnesses, or GitHub Actions and have not deployed since spring. Grep every repo and secret store for the retiring IDs, swap to the recommended replacements, and re-run your eval suite before the next scheduled job fires. Same-day billing split for headless Claude (since paused): /blog/claude-agent-sdk-billing-june-15-checklist.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic suspends Claude Fable 5 and Mythos 5 worldwide after a US export-control directive]]></title>
      <link>https://www.anthropic.com/news/claude-fable-5-mythos-5</link>
      <guid isPermaLink="false">https://www.anthropic.com/news/claude-fable-5-mythos-5</guid>
      <pubDate>Fri, 12 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. Three days after launch, Anthropic disabled Fable 5 and Mythos 5 for all users globally, not just the foreign nationals the order nominally targeted. Other Claude models are unaffected. Anthropic says the underlying concern is a claimed jailbreak of Fable 5, and that the capability described is already available in other public models used routinely for defensive security work. For enterprise buyers, this is the governance reality check behind frontier releases: capability, retention policy, and geopolitical access can change faster than your rollout calendar. If you piloted Fable 5 this week, fall back to Opus 4.8 in routing configs until access is restored, and treat mandatory retention and export-control risk as first-class inputs in model adoption reviews. Full routing playbook: /blog/claude-fable-5-frontier-models-for-agent-builders.]]></description>
    </item>
    <item>
      <title><![CDATA[MiniMax M3 open weights ship on Hugging Face: 428B MoE with 1M sparse-attention context, native multimodality, and computer use]]></title>
      <link>https://www.minimax.io/blog/minimax-m3</link>
      <guid isPermaLink="false">https://www.minimax.io/blog/minimax-m3</guid>
      <pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: MiniMax. MiniMax followed its June 1 launch by publishing MiniMaxAI/MiniMax-M3 on Hugging Face and an MSA (MiniMax Sparse Attention) technical report, making M3 the third open-weight agent/coding contender this quarter beside DeepSeek V4 and Kimi K2.6. The stack is roughly 428B total parameters with about 23B active per token, up to 1M context via sparse attention (MiniMax claims large prefill and decode speedups at long context), plus native image and video input and desktop computer-use positioning. API pricing runs about $0.60/$2.40 per million tokens; self-hosting needs datacenter-class hardware and the MiniMax Community License, so read the license before any commercial rollout. If you are tuning a routing table, add M3 to evals for long-horizon coding and multimodal agent loops where Kimi swarms or DeepSeek cost efficiency do not cover vision. Routing frame: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[Cursor makes Auto-review the default run mode: a classifier gate that scales agent autonomy by context instead of a global allow/deny switch]]></title>
      <link>https://cursor.com/blog/agent-autonomy-auto-review</link>
      <guid isPermaLink="false">https://cursor.com/blog/agent-autonomy-auto-review</guid>
      <pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Cursor. Auto-review is Cursor's answer to the approval-prompt tax on long agent runs. Shell, MCP, and Fetch calls that match your allowlist still run immediately; sandboxable calls run sandboxed; everything else routes through a fast classifier subagent that can allow, redirect, or surface an approval. Cursor reports roughly 7% of chats in Auto-review mode hit at least one interruption, versus about 40% of actions blocked under some enterprise allowlist setups. The SDK already exposes the same gate via local.autoReview and permissions.json steering instructions for headless runs. If your team lives in background agents, turn Auto-review on in Settings > Agents and tune block_instructions for destructive ops before you trust it as a safety boundary (Cursor is explicit it is convenience, not a hard security control). Full adoption guide: /blog/governing-agent-autonomy-auto-review-and-pre-push-review.]]></description>
    </item>
    <item>
      <title><![CDATA[Bugbot is now ~3x faster on Composer 2.5, and /review runs Bugbot and Security Review before you push]]></title>
      <link>https://cursor.com/changelog/bugbot-updates-june-2026</link>
      <guid isPermaLink="false">https://cursor.com/changelog/bugbot-updates-june-2026</guid>
      <pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Cursor. Cursor shipped a meaningful shift in when agentic review happens. Bugbot average review time dropped from about five minutes to about 90 seconds on Composer 2.5, with roughly 10% more bugs found per run and about 22% lower cost. The workflow change is /review (or /review-bugbot and /review-security): run the same agents locally before opening a PR, and if the diff matches, Bugbot on GitHub or GitLab skips a redundant scan and notes it already reviewed that patch. That closes the loop between local agent work and CI review, which is where most teams still lose signal. If you are on Cursor 3.7+, add a pre-push /review step to your agent workflow this week and measure whether PR comment noise drops. Pairs with Auto-review: /blog/governing-agent-autonomy-auto-review-and-pre-push-review.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic releases Claude Fable 5: a Mythos-class frontier model with tiered safeguards, mandatory 30-day retention, and $10/$50 per-million pricing]]></title>
      <link>https://www.anthropic.com/news/claude-fable-5-mythos-5</link>
      <guid isPermaLink="false">https://www.anthropic.com/news/claude-fable-5-mythos-5</guid>
      <pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. Fable 5 is Anthropic's first generally available Mythos-class model, with state-of-the-art scores on long-horizon coding, knowledge work, and vision, and pricing at $10 per million input tokens and $50 per million output. High-risk domains (cybersecurity, biology, chemistry, distillation) hit hard safeguards that fall back to Opus 4.8; Mythos 5 ships the same weights with some safeguards lifted for Glasswing defenders only. Two deployment details matter for buyers: even zero-retention enterprise agreements get a mandatory 30-day traffic retention window on Fable and Mythos traffic for abuse defense, and subscription access rolls out in stages (included on Pro/Max/Team through June 22, then usage credits). If Fable lands in your routing layer, re-benchmark cost per completed task against Opus 4.8 fast mode before assuming the old split still wins. Full routing guide: /blog/claude-fable-5-frontier-models-for-agent-builders.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Managed Agents add cron scheduled deployments and vault-stored environment variables for CLI auth]]></title>
      <link>https://claude.com/blog/whats-new-in-claude-managed-agents</link>
      <guid isPermaLink="false">https://claude.com/blog/whats-new-in-claude-managed-agents</guid>
      <pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Anthropic. Two operational primitives that remove glue code from production agent stacks. Scheduled deployments let a managed agent run on a cron expression with a fresh session each fire, so nightly syncs, weekly compliance scans, and digest jobs no longer need your own scheduler infrastructure. Vault environment variables extend credential injection to CLIs and SDKs: the sandbox holds a placeholder, the platform attaches the real secret at the network boundary on approved domains only, and the model never sees the key. Browserbase and KERNEL CLIs are called out as first-class, giving managed agents browser navigation without bespoke harness work. If you run recurring agent jobs on Anthropic infrastructure, migrate one external cron trigger to a scheduled deployment and one hardcoded API key to a vault env var this sprint. MCP governance patterns: /blog/databricks-unity-ai-gateway-mcp-governance.]]></description>
    </item>
    <item>
      <title><![CDATA[Codex CLI 0.139.0 adds standalone web search in code mode and preserves oneOf/allOf in MCP tool schemas]]></title>
      <link>https://github.com/openai/codex/releases/tag/rust-v0.139.0</link>
      <guid isPermaLink="false">https://github.com/openai/codex/releases/tag/rust-v0.139.0</guid>
      <pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[OpenAI]]></category>
      <description><![CDATA[Source: OpenAI. Two changes that matter if you run Codex against rich MCP tool catalogs. Code mode can now call standalone web search directly, including from nested JavaScript tool calls, and receive plaintext results without bolting on a separate search tool. Tool and connector schemas preserve oneOf and allOf through compaction, and large schemas keep more shallow structure, which fixes a class of MCP integrations that broke when connectors lost composition metadata. Sandbox execution also preserves approved escalation decisions more consistently. If Codex is in your CI or local agent loop, upgrade past 0.139.0 and re-test any MCP server that relied on schema composition. Typed tool contracts: /blog/tool-registry-design-for-agentic-ai.]]></description>
    </item>
    <item>
      <title><![CDATA[Google ships Agentic RAG on Gemini Enterprise with a Sufficient Context Agent that stops when retrieval is incomplete]]></title>
      <link>https://research.google/blog/unlocking-dependable-responses-with-gemini-enterprise-agent-platforms-agentic-rag/</link>
      <guid isPermaLink="false">https://research.google/blog/unlocking-dependable-responses-with-gemini-enterprise-agent-platforms-agentic-rag/</guid>
      <pubDate>Fri, 05 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Google Research. Standard RAG retrieves once and hopes the context is enough. Google Research and Google Cloud's Agentic RAG framework, now in public preview as Cross-Corpus Retrieval on Gemini Enterprise Agent Platform, adds a multi-agent loop with a Sufficient Context Agent that re-searches until evidence is complete before synthesis. On factuality benchmarks they report up to 34% higher accuracy versus vanilla RAG, and on internal cross-corpus tests roughly 90% accuracy on FramesQA while routing across four corpora with latency within 3% of single-corpus runs. For enterprise buyers evaluating grounded agents, this is the pattern to benchmark against your current RAG stack: not more chunks, but an explicit "do I have enough context?" gate before the model answers. Full comparison: /blog/agentic-rag-sufficient-context-vs-vanilla-rag.]]></description>
    </item>
    <item>
      <title><![CDATA[Cursor SDK ships custom tools, nested subagents, JSONL stores, and auto-review for headless local agents]]></title>
      <link>https://cursor.com/changelog/sdk-updates-jun-2026</link>
      <guid isPermaLink="false">https://cursor.com/changelog/sdk-updates-jun-2026</guid>
      <pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Cursor. The TypeScript and Python SDKs picked up the primitives production agent scripts were still hand-rolling. Custom tools register through local.customTools and expose to the model via a built-in custom-user-tools MCP path, so you no longer need a stdio server for every one-off capability. Nested subagents can spawn subagents to any depth automatically. JSONL and custom LocalAgentStore implementations replace SQLite-only persistence for CI-friendly, diffable agent state. And local.autoReview routes headless tool calls through the same classifier gate as the IDE instead of auto-approving everything. If you run SDK agents in CI or cron, upgrade @cursor/sdk or cursor-sdk, pin a JSONL store for reproducible runs, and turn autoReview on with block_instructions for destructive shell ops. Guardrails and pre-push review: /blog/governing-agent-autonomy-auto-review-and-pre-push-review.]]></description>
    </item>
    <item>
      <title><![CDATA[Alibaba ships Qwen3.7-Plus as a hybrid GUI-and-CLI agent: native screen grounding, 1M context, and Anthropic-compatible API endpoints]]></title>
      <link>https://www.alibabacloud.com/blog/qwen3-7-plus-multimodal-agent-intelligence_603206</link>
      <guid isPermaLink="false">https://www.alibabacloud.com/blog/qwen3-7-plus-multimodal-agent-intelligence_603206</guid>
      <pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Alibaba Cloud. Qwen3.7-Plus is the multimodal sibling to Qwen3.7-Max: one agent loop that reads screenshots, operates GUIs, runs terminal commands, writes code, and self-verifies instead of splitting visual and CLI work across two models. Alibaba positions it for ScreenSpot Pro-class pixel grounding, AndroidWorld-style mobile tasks, and long demo runs (an 11-hour, 1,000-call app build loop). It ships on Model Studio with OpenAI- and Anthropic-compatible endpoints at about $0.40/$1.60 per million tokens; weights stay closed, so this is a routing and eval play, not self-hosting. If you benchmark computer-use agents against Claude or GPT Operator, run the same Playwright or desktop harness against qwen3.7-plus before assuming the Western stack still wins on grounding cost. Routing economics: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Code 2.1.161 fixes parallel tool calls and adds OTEL resource labels: a failed tool no longer cancels the batch]]></title>
      <link>https://github.com/anthropics/claude-code/releases/tag/v2.1.161</link>
      <guid isPermaLink="false">https://github.com/anthropics/claude-code/releases/tag/v2.1.161</guid>
      <pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Anthropic. Two quiet but real fixes for anyone running tools in parallel. In Claude Code 2.1.161, a failed Bash command in a parallel batch no longer cancels the other calls, so each tool returns its own result independently instead of one failure taking down the whole fan-out. The same release threads OTEL_RESOURCE_ATTRIBUTES through to metric datapoints, so you can finally slice Claude Code usage telemetry by custom dimensions like team or repo, and the agents view now shows done/total when work is fanned out. If you run Claude Code with parallel tools or pipe its metrics into an observability stack, update and re-label your dashboards with the new resource attributes. The observability discipline this plugs into: /blog/agent-observability-stack-clients.]]></description>
    </item>
    <item>
      <title><![CDATA[The agent-to-agent layer consolidates: Microsoft Foundry adds A2A support at Build 2026 as the protocol passes 150 organizations]]></title>
      <link>https://devblogs.microsoft.com/foundry/agent-service-build2026/</link>
      <guid isPermaLink="false">https://devblogs.microsoft.com/foundry/agent-service-build2026/</guid>
      <pubDate>Tue, 02 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Microsoft. Tool calling settled on MCP. Agent-to-agent communication is settling on A2A, and Build 2026 (June 2-3) is where that got hard to ignore. Microsoft Foundry Agent Service is adding A2A support in public preview, alongside hosted agents reaching general availability within 30 days and a new autopilot agents mode. A2A reached its v1.0 stable spec earlier this year, with signed agent cards for cryptographic identity, enterprise multi-tenancy, and a web-aligned stateless shape, and it now runs across Google Cloud, Azure Foundry, and Amazon Bedrock AgentCore with more than 150 organizations behind it. If you build systems where agents from different teams or vendors have to talk, stop hand-rolling that glue and evaluate A2A as your interop layer the way you already treat MCP for tools. Where cross-agent coordination fits in an architecture: /blog/supervisor-pattern-vs-handoffs-multi-agent.]]></description>
    </item>
    <item>
      <title><![CDATA[Microsoft Build 2026 puts agent governance front and center: cross-stack risk controls, open-instrumentation evals, and FIDES middleware against prompt injection]]></title>
      <link>https://devblogs.microsoft.com/agent-framework/microsoft-agent-framework-at-build-2026/</link>
      <guid isPermaLink="false">https://devblogs.microsoft.com/agent-framework/microsoft-agent-framework-at-build-2026/</guid>
      <pubDate>Tue, 02 Jun 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Microsoft. Build 2026 (June 2-3, San Francisco) frames Microsoft's agent push around governing agents you did not write. The Microsoft Agent Framework sessions center on governance patterns that span MAF and open-source agent stacks, evaluations and risk controls built on open instrumentation, and hosted-agent lifecycle management in Foundry Agent Service. The concrete security piece landed on May 20: FIDES, an information-flow-control middleware that limits what tainted data can do inside an agent to stop prompt injection from hijacking it. The cross-stack framing is the part worth noting, because most teams already run more than one agent runtime and cannot govern each in isolation. If you operate agents in production, put information-flow control and a cross-framework policy layer next to your current guardrails and audit trail, and benchmark what each actually blocks. Where the guardrail layer sits in an agent: /blog/agentic-ai-supply-chain-security.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic ships Claude Opus 4.8: a stronger frontier default, Claude Code dynamic workflows, and a fast mode that is 2.5x faster and about 3x cheaper]]></title>
      <link>https://www.anthropic.com/news/claude-opus-4-8</link>
      <guid isPermaLink="false">https://www.anthropic.com/news/claude-opus-4-8</guid>
      <pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. Anthropic released Claude Opus 4.8 on May 28, its most capable generally available model, with gains across coding, agentic, and knowledge work and more consistency on long-running tasks. Two things matter beyond the benchmark bump. Claude Code gains dynamic workflows that orchestrate work across tens to hundreds of background agents for large problems, and fast mode for Opus 4.8 runs about 2.5x faster at roughly a third of the previous fast-mode cost. List pricing holds at $5 per million input and $25 per million output tokens, with up to 90% off via prompt caching, and it is live on the Claude Platform, AWS, Google Cloud, and Microsoft Foundry. If Opus sits in your routing layer, re-benchmark cost per completed task with the new fast-mode economics before assuming the old cheap-model split still wins. Routing trade-offs: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[OpenAI Codex CLI 0.135.0 hardens MCP: per-server OAuth, concurrent read-only tools, and connector schemas that stop breaking]]></title>
      <link>https://developers.openai.com/codex/changelog</link>
      <guid isPermaLink="false">https://developers.openai.com/codex/changelog</guid>
      <pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: OpenAI. OpenAI's coding agent shipped a release aimed squarely at MCP reliability. Codex CLI 0.135.0 adds per-server environment targeting and OAuth for streamable HTTP MCP servers, lets read-only MCP tools run concurrently when they advertise readOnlyHint, and preserves local $ref and $defs in connector tool schemas so they stop breaking under schema compaction. Hooks now receive richer context, including conversation history and subagent identity. The throughline is the same discipline I push for tools: typed, honest, parallel-safe contracts the agent can trust. If you run Codex against private MCP servers, the per-server OAuth and concurrency are worth the upgrade, and readOnlyHint is worth auditing across your own tools. Background on getting tool contracts right: /blog/tool-registry-design-for-agentic-ai.]]></description>
    </item>
    <item>
      <title><![CDATA[Cortex ships persistent memory for Claude Code: a local, pgvector-backed engine exposing 49 MCP tools]]></title>
      <link>https://github.com/cdeust/Cortex</link>
      <guid isPermaLink="false">https://github.com/cdeust/Cortex</guid>
      <pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Cortex. Claude Code forgets everything between sessions, so every morning you re-explain the architecture, the decisions you made, and why you ruled options out. Cortex is an open-source (MIT) memory engine that closes that gap. It runs entirely locally over MCP stdio on PostgreSQL with pgvector, consolidates what you worked on in background cycles modelled on sleep, and exposes 49 MCP tools to read and write that memory. The v3.17 line adds autonomous wiki curation (a headless Claude agent that keeps per-project docs current) and, on May 27, a fix for an arbitrary-code-execution flaw in an untrusted dev-source install path (GHSA-gvpp-v77h-5w8g), so update past v3.17.2 and install through the plugin marketplace rather than a raw dev source. If session amnesia is your real reliability tax, this earns a retrieval benchmark against your own setup. I unpack the memory patterns it leans on in /blog/persistent-memory-for-coding-agents.]]></description>
    </item>
    <item>
      <title><![CDATA[RAGFlow v0.25.6 adds an autonomous browser component, a one-line @tool decorator, and dataset-level retrieval]]></title>
      <link>https://github.com/infiniflow/ragflow/releases/tag/v0.25.6</link>
      <guid isPermaLink="false">https://github.com/infiniflow/ragflow/releases/tag/v0.25.6</guid>
      <pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: RAGFlow. The open-source RAG engine from InfiniFlow keeps moving toward agents, not just retrieval. v0.25.6 adds a browser component that lets the agent navigate web pages on its own, a lightweight @tool decorator that registers a plain Python function as a tool in one line, and an AHC mode (Ψ-RAG) that builds RAPTOR indexes at the dataset level instead of per document for better recall. The quiet win is on cost: /chat/completions now accepts just the latest message instead of the full history, trimming tokens on every turn of a long agent loop. If you run a self-hosted, RAG-grounded agent, it is worth the upgrade for the @tool ergonomics alone, and the dataset-level indexing is worth a recall benchmark against your current setup. That @tool pattern is the same atomic, typed-contract discipline I push in /blog/your-agents-arent-broken-your-tools-are-three-questions.]]></description>
    </item>
    <item>
      <title><![CDATA[Field note: the agent reliability bug is almost always a tool contract, not the model]]></title>
      <link>https://jigarjoshi.in/blog/your-agents-arent-broken-your-tools-are-three-questions</link>
      <guid isPermaLink="false">https://jigarjoshi.in/blog/your-agents-arent-broken-your-tools-are-three-questions</guid>
      <pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Jigar Joshi. A pattern from three client calls this month: the agent is "unreliable", the team has rewritten the system prompt four times, and the actual fault is a tool that does too much, returns null on failure, or dumps raw DB rows into the context. An AI tool is a contract the model trusts, not a function you happen to expose. Three questions catch most of it before you build: is it atomic (one verb), what happens on failure (semantic errors, never null), and is it typed and token-efficient (a schema, never SELECT *). Full write-up with the before/after on a real manage_order tool: /blog/your-agents-arent-broken-your-tools-are-three-questions. One-glance visual version: /notes/agents-arent-broken-tools-are.]]></description>
    </item>
    <item>
      <title><![CDATA[Supply-chain worm harvests Claude Code credentials: poisoned Nx Console VS Code extension breaches GitHub, hits OpenAI and Mistral dev machines]]></title>
      <link>https://thehackernews.com/2026/05/github-internal-repositories-breached.html</link>
      <guid isPermaLink="false">https://thehackernews.com/2026/05/github-internal-repositories-breached.html</guid>
      <pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: The Hacker News. This is the one to read if your team runs Claude Code. A trojanised Nx Console extension was live on the VS Code marketplace for 18 minutes on May 18, and the Mini Shai-Hulud worm spread from there across 170-plus npm packages. GitHub has confirmed roughly 3,800 internal repos were stolen. The part that matters for agentic dev shops: the payload specifically targeted ~/.claude/settings.json and ~/.claude/mcp.json, and installed a persistence hook that re-runs the credential stealer on every Claude Code session start. This is the first supply-chain attack I have seen built specifically to harvest AI tool credentials and MCP server configs. Practical steps this week: rotate anything that lived in those files, audit your MCP server configs, pin extension and npm versions, and stop auto-updating editor extensions on machines that hold production credentials. Full hardening playbook across all four layers of the agent supply chain: /blog/agentic-ai-supply-chain-security.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic opens Claude Security in public beta as Project Glasswing reports 10,000+ critical vulnerabilities found]]></title>
      <link>https://www.anthropic.com/research/glasswing-initial-update</link>
      <guid isPermaLink="false">https://www.anthropic.com/research/glasswing-initial-update</guid>
      <pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. One month into Project Glasswing, Anthropic put a number on it: roughly 50 partner orgs running an unreleased frontier model (Claude Mythos Preview) in defensive workflows surfaced more than 10,000 high or critical vulnerabilities across widely used software, with Cloudflare alone reporting about 2,000. The shippable part for the rest of us is Claude Security, now in public beta for Claude Enterprise: point it at a codebase, it scans for vulnerabilities and proposes fixes. Two ways to read the takeaway, and both are true: defensive AI security tooling is now a product you can buy, and the same capability that finds these bugs is available to anyone else who trains a model this good. If you own appsec, pilot Claude Security on one repo this quarter and measure its hit rate against your current SAST.]]></description>
    </item>
    <item>
      <title><![CDATA[NVIDIA ships Verified Agent Skills: scanned, signed skill packages with machine-readable risk cards]]></title>
      <link>https://developer.nvidia.com/blog/nvidia-verified-agent-skills-provide-capability-governance-for-ai-agents/</link>
      <guid isPermaLink="false">https://developer.nvidia.com/blog/nvidia-verified-agent-skills-provide-capability-governance-for-ai-agents/</guid>
      <pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: NVIDIA. Skills make an agent more capable and also widen its attack surface, a risk the poisoned Nx Console extension made concrete this month. NVIDIA's answer is a verification pipeline: every skill is scanned, signed with a detached signature you can check after download, and paired with a skill card that spells out its provenance, license, dependencies, and known risks. It builds on the open agentskills.io spec, so the same SKILL.md is meant to verify across Claude Code, Codex, and Cursor. If you let agents pull skills from anywhere, this is the provenance model to adopt before one of them ships a backdoor. Pairs with the registry hygiene in /blog/tool-registry-design-for-agentic-ai.]]></description>
    </item>
    <item>
      <title><![CDATA[Cheap open-weight models now ~60% of OpenRouter usage: DeepSeek, Kimi and Zhipu reshape the cost floor]]></title>
      <link>https://www.buildfastwithai.com/blogs/ai-news-today-may-22-2026</link>
      <guid isPermaLink="false">https://www.buildfastwithai.com/blogs/ai-news-today-may-22-2026</guid>
      <pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Research]]></category>
      <description><![CDATA[Source: BuildFastWithAI. A CNBC investigation put numbers on something I have watched creep up in client routing logs all year: Chinese open-weight models (DeepSeek V3.2, Kimi K2.6, Zhipu GLM-5.1) are now around 60% of OpenRouter usage, up from roughly 1% in 2024. The cost spread they quote for an identical workload is the headline: about $4,811 on Claude versus $1,071 on DeepSeek versus $544 on Zhipu. I am not telling anyone to rip Claude out of the reasoning path. The point is that a flat single-model stack is leaving real money on the table, and the right answer is a routing layer that sends each sub-task to the cheapest model that can actually do it. Developer adoption tends to lead enterprise by roughly 18 months, so this is the curve to plan against, not react to.]]></description>
    </item>
    <item>
      <title><![CDATA[MCP locks its 2026 spec release candidate: the protocol goes stateless ahead of a July 28 final]]></title>
      <link>https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/</link>
      <guid isPermaLink="false">https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/</guid>
      <pubDate>Thu, 21 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: modelcontextprotocol.io. The biggest revision of MCP since the 1.0 spec, and the headline is that the protocol goes stateless. No initialize handshake, no Mcp-Session-Id, no sticky sessions: any request can land on any server instance, so a remote MCP server runs behind a plain round-robin load balancer. Extensions, Tasks, and MCP Apps move out of the core, authorization aligns properly with OAuth and OpenID Connect, and Roots, Sampling, and Logging are deprecated on a twelve-month runway. The final spec lands July 28, so there is a ten-week window to validate against real workloads before Tier 1 SDKs ship support. Pin your SDK versions now, grep your clients for the changed -32002 error code, and read the migration I would run: /blog/mcp-stateless-spec-release-candidate.]]></description>
    </item>
    <item>
      <title><![CDATA[Cursor 3.5 brings Automations into the Agents Window and adds multi-repo and no-repo cloud agents]]></title>
      <link>https://cursor.com/changelog/05-20-26</link>
      <guid isPermaLink="false">https://cursor.com/changelog/05-20-26</guid>
      <pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Cursor. Cursor keeps pushing background agents from novelty toward routine. 3.5 moves Automations into the Agents Window, lets a single cloud agent span multiple repos, and adds no-repo automations that just watch an external signal (Slack, analytics, a cron) and act on it. The multi-repo piece is the one that matters for services teams: most real changes touch more than one repository, and an agent scoped to a single repo could never finish the job. If you wrote off Cursor background agents back at 1.0, this is the release to re-evaluate them on. Worth a half-day spike to see which of your recurring chores map cleanly onto an automation.]]></description>
    </item>
    <item>
      <title><![CDATA[Cohere releases Command A+ under Apache 2.0: a 218B sparse MoE built for agentic work that runs on two H100s]]></title>
      <link>https://cohere.com/blog/command-a-plus</link>
      <guid isPermaLink="false">https://cohere.com/blog/command-a-plus</guid>
      <pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Cohere. Cohere's first fully Apache-2.0 model, and it is aimed squarely at agents: 218B sparse MoE with 25B active, 128K context, 48 languages, and native citation grounding that links every claim to a source span. The number that matters for self-hosting is the footprint. W4A4 lossless quantization puts it on as few as two H100s, which moves a genuinely capable agentic model inside the reach of a single on-prem box. For regulated or sovereign deployments where Claude and GPT are off the table, this is now a real option to bench against, not a compromise. If you run an on-prem or air-gapped agent stack, add Command A+ to your routing eval this week and judge it on tool-call reliability and tokens-per-second, not headline scores. The cost-per-completed-task framing I use for these calls: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic ships MCP Tunnels and self-hosted sandboxes for Claude Managed Agents at Code with Claude London]]></title>
      <link>https://www.infoq.com/news/2026/05/claude-mcp-tunnels/</link>
      <guid isPermaLink="false">https://www.infoq.com/news/2026/05/claude-mcp-tunnels/</guid>
      <pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: Anthropic. Two features I have been waiting on. Self-hosted sandboxes (public beta) let tool execution run on customer-controlled infrastructure (Cloudflare, Daytona, Modal, Vercel) so sensitive data stays in your VPC. MCP tunnels (research preview) let agents reach private MCP servers via outbound encrypted connections so you do not have to open inbound firewall rules. Together they close the two biggest enterprise blockers on Claude Managed Agents. If your security team has been blocking a Claude rollout on "we cannot expose internal services or send our data through Anthropic's compute", this changes the answer. Pair with Databricks Unity AI Gateway from earlier this week and the enterprise governance story is finally complete.]]></description>
    </item>
    <item>
      <title><![CDATA[Google Cloud Next '26 ships Gemini Enterprise Agent Platform, 8th-gen TPUs (8t for training, 8i for inference)]]></title>
      <link>https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/google-cloud-next-26-recap/</link>
      <guid isPermaLink="false">https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/google-cloud-next-26-recap/</guid>
      <pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Google. The day after I/O, the enterprise story dropped at Cloud Next. Gemini Enterprise Agent Platform is positioned as "mission control for the agentic enterprise" with a managed agents API, governance, and Workspace plus third-party connectors (SharePoint, OneDrive, ServiceNow). The number that matters for production stacks is on the 8i inference chip: Google claims 80% better perf-per-dollar on agentic workflows. If that holds up under independent benchmarks, the inference economics for high-volume agents shift meaningfully. Cheap inference is the moat being built right now.]]></description>
    </item>
    <item>
      <title><![CDATA[Google ships Gemini 3.5 Flash: outperforms 3.1 Pro on agentic and coding benchmarks, 4x faster output than other frontier models]]></title>
      <link>https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/</link>
      <guid isPermaLink="false">https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/</guid>
      <pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Google. Flash is the model that shipped publicly at I/O. Pro stays internal for another month. Google's own framing is "frontier intelligence with action", which reads as a concession that the 3.x line was lagging on tool-use reliability and a claim that the gap is now closed. The 4x output-tokens-per-second claim is the number to test against your own agent traces. If Gemini sits in your routing layer for cost reasons, re-benchmark this week. Cost-per-completed-task on Flash may have flipped against Sonnet 4.6 for short-horizon work. Full migration recipe and the cost-per-completed-task maths nobody is doing in public: see the deep-dive at /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
    <item>
      <title><![CDATA[Andrej Karpathy joins Anthropic's pre-training team with a mandate to use Claude to build the next Claude]]></title>
      <link>https://www.techtimes.com/articles/316852/20260519/karpathy-who-called-todays-ai-agents-slop-joins-anthropic-use-claude-build-next-claude.htm</link>
      <guid isPermaLink="false">https://www.techtimes.com/articles/316852/20260519/karpathy-who-called-todays-ai-agents-slop-joins-anthropic-use-claude-build-next-claude.htm</guid>
      <pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: TechTimes. The bet, stated plainly: if pre-training can be accelerated by current-gen models running their own research loops, the next capability jump arrives faster than the last. Karpathy is the right hire to test the thesis because his "autoresearch" work has been the most public version of the idea. The cadence of Opus releases over the next 12 months is the number worth watching.]]></description>
    </item>
    <item>
      <title><![CDATA[Google launches Gemini Spark: 24/7 agentic Gemini app running on Cloud VMs, with Workspace, Canva, and OpenTable integrations]]></title>
      <link>https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/</link>
      <guid isPermaLink="false">https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/</guid>
      <pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Google. Spark is built on Gemini 3.5 plus the agentic harness from Google Antigravity. The framing that matters is "runs on Google Cloud VMs, you do not need to keep your laptop open." Most agentic assistants today only run while you are watching them. Spark is the first major launch to make long-horizon, headless execution the default UX promise. Whether it ships reliably is the open question, but the bar for what a personal agent is expected to do just moved.]]></description>
    </item>
    <item>
      <title><![CDATA[Databricks Unity AI Gateway adds MCP service policies, LLM guardrails, per-user cost limits, and full payload logging]]></title>
      <link>https://www.databricks.com/blog/whats-new-unity-ai-gateway-service-policies-guardrails-observability-and-cost-controls-ai</link>
      <guid isPermaLink="false">https://www.databricks.com/blog/whats-new-unity-ai-gateway-service-policies-guardrails-observability-and-cost-controls-ai</guid>
      <pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: Databricks. Governance has been the unsexy MCP story for a year and the gap has been obvious. Databricks is the first vendor to put tool-access policy, cost limits, and payload audit in one product surface. If you have been hand-rolling per-team budgets on top of an MCP gateway, replace that with this and reclaim a week. Worth a serious eval if Databricks is already in your stack. Deep-dive on the four primitives, the build-vs-buy line, and the six-step migration I would run: /blog/databricks-unity-ai-gateway-mcp-governance.]]></description>
    </item>
    <item>
      <title><![CDATA[Yugabyte ships Meko: agent-native data infrastructure for multi-agent memory and shared knowledge]]></title>
      <link>https://www.yugabyte.com/blog/meko-data-infrastructure-for-agents-that-work-and-learn-together/</link>
      <guid isPermaLink="false">https://www.yugabyte.com/blog/meko-data-infrastructure-for-agents-that-work-and-learn-together/</guid>
      <pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Yugabyte. Persistent shared memory is the bottleneck nobody architects for early enough. Most multi-agent systems I have audited bolt memory on with Redis and discover at scale they actually needed transactional reads across agent boundaries. Meko is pitched at exactly that primitive. Worth a look before the next swarm goes from prototype to load test.]]></description>
    </item>
    <item>
      <title><![CDATA[Microsoft ships multi-model agentic security system, tops industry benchmark for SOC automation]]></title>
      <link>https://www.microsoft.com/en-us/security/blog/2026/05/12/defense-at-ai-speed-microsofts-new-multi-model-agentic-security-system-tops-leading-industry-benchmark/</link>
      <guid isPermaLink="false">https://www.microsoft.com/en-us/security/blog/2026/05/12/defense-at-ai-speed-microsofts-new-multi-model-agentic-security-system-tops-leading-industry-benchmark/</guid>
      <pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Microsoft Security Blog. The interesting bit is not the benchmark, it is the routing. Microsoft did not pin the agent to a single frontier model. Triage runs on a small model, deep dives run on a mid-tier model, and judgment calls escalate to a larger one. This is the routing pattern I have been pushing into client architectures for a year. Validation from a security team is the credibility I needed for the next consulting deck.]]></description>
    </item>
    <item>
      <title><![CDATA[DeepSeek V4-Pro hits the production-ready bar on Huawei Ascend 950 silicon, ends the NVIDIA-only assumption]]></title>
      <link>https://fortune.com/2026/04/24/deepseek-v4-ai-model-price-performance-china-open-source/</link>
      <guid isPermaLink="false">https://fortune.com/2026/04/24/deepseek-v4-ai-model-price-performance-china-open-source/</guid>
      <pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Fortune. 1.6T parameters, 1M context, top of every open-weight benchmark for maths and coding, trailing only Gemini 3.1 Pro on world knowledge. The shift that matters: training and serving entirely on Huawei Ascend 950, not H100s. If your CTO has been holding the "we will wait for a real open-weight option" position, this is the model that ends the wait. Pricing structure also reshapes the input-cost conversation for high-volume agent workloads.]]></description>
    </item>
    <item>
      <title><![CDATA[Kimi K2.6 scales agent swarms to 300 sub-agents and 4,000 coordinated steps in a single run]]></title>
      <link>https://moonshotai.github.io/Kimi-K2/</link>
      <guid isPermaLink="false">https://moonshotai.github.io/Kimi-K2/</guid>
      <pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Moonshot AI. Most Western multi-agent papers stop at 8 to 12 sub-agents because coordination overhead blows up past that. Moonshot is publishing real engineering on how to keep coordination from collapsing at 300. Whether or not you ever ship that scale, the failure modes they document (context contamination across siblings, hand-off drift, supervisor bottlenecks) show up at 20 sub-agents too. Read the technical post before your next swarm design review.]]></description>
    </item>
    <item>
      <title><![CDATA[MCP TypeScript SDK and the visual Inspector both ship updates on the same day]]></title>
      <link>https://github.com/modelcontextprotocol/inspector/releases</link>
      <guid isPermaLink="false">https://github.com/modelcontextprotocol/inspector/releases</guid>
      <pubDate>Sun, 17 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: modelcontextprotocol.io. Same-day releases for the SDK and the visual testing tool say the ecosystem is finally on a regular cadence. If you have an MCP server in production, pin the SDK version before upgrading. The Inspector update is the bigger deal day-to-day: visual step-through against a live server beats console.log debugging by a long way.]]></description>
    </item>
    <item>
      <title><![CDATA[VS Code publishes the Copilot agent-harness internals and a VSC-Bench for evaluating it]]></title>
      <link>https://code.visualstudio.com/blogs/2026/05/15/agent-harnesses-github-copilot-vscode</link>
      <guid isPermaLink="false">https://code.visualstudio.com/blogs/2026/05/15/agent-harnesses-github-copilot-vscode</guid>
      <pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Visual Studio Code. A genuinely useful engineering explainer, not a launch post. The VS Code team breaks down the coding harness under Copilot agent mode: context assembly, tool exposure (read_file, replace_string_in_file, run_in_terminal), and the validate-then-execute-then-feed-back loop. Their framing is the one to internalize: the model is the engine, the harness is the car. They also describe VSC-Bench, a way to measure harness changes independent of the model. The takeaway for anyone shipping an agent: most of your reliability lives in the harness, not the model, so eval harness changes on their own before blaming the next model swap. Same lesson as tool-registry hygiene: /blog/tool-registry-design-for-agentic-ai.]]></description>
    </item>
    <item>
      <title><![CDATA[OpenAI ships ChatGPT personal finance with Plaid: bank-account connectors for 12,000+ institutions]]></title>
      <link>https://techcrunch.com/2026/05/15/openai-launches-chatgpt-for-personal-finance-will-let-you-connect-bank-accounts/</link>
      <guid isPermaLink="false">https://techcrunch.com/2026/05/15/openai-launches-chatgpt-for-personal-finance-will-let-you-connect-bank-accounts/</guid>
      <pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[OpenAI]]></category>
      <description><![CDATA[Source: TechCrunch. Vertical agent workflows are where the real differentiation now lives. OpenAI picked finance first because the data shape (transactions, balances, categories) maps cleanly to tool calls. Expect Claude and Gemini to ship their own vertical packs inside a quarter; this is the new ground for the model wars.]]></description>
    </item>
    <item>
      <title><![CDATA[MCP Apps protocol repository updated, formalising the install + sandboxing shape]]></title>
      <link>https://github.com/modelcontextprotocol/ext-apps</link>
      <guid isPermaLink="false">https://github.com/modelcontextprotocol/ext-apps</guid>
      <pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: modelcontextprotocol.io. Apps are MCP's answer to the long-running question of how third-party MCP servers get installed, sandboxed, and revoked inside a client. The new repo layout is the canonical reference. If you are building consumer-facing MCP integrations, read this before your security reviewers do.]]></description>
    </item>
    <item>
      <title><![CDATA[GitHub ships the Copilot App: a native desktop agentic client with isolated, parallel coding sessions]]></title>
      <link>https://github.blog/changelog/2026-05-14-github-copilot-app-is-now-available-in-technical-preview/</link>
      <guid isPermaLink="false">https://github.blog/changelog/2026-05-14-github-copilot-app-is-now-available-in-technical-preview/</guid>
      <pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: GitHub. GitHub shipped a native desktop client for agentic coding, separate from the IDE and the CLI, on macOS, Windows, and Linux. Each session gets its own branch, files, conversation, and task state, so you can run several agent tasks in parallel without them stepping on each other, then land each one through normal pull-request review. It is in technical preview: included on Copilot Business and Enterprise where preview features and the Copilot CLI are enabled, and waitlist-only for Pro and Pro+. The interesting part for builders is the isolation model, the same per-task worktree discipline that keeps parallel Claude Code runs from colliding. If you run more than one coding agent at once, trial it against your current setup and compare how cleanly it keeps work separated. Background on when a code agent is the right shape: /blog/code-agent-vs-skill-agent-when-to-pick-which.]]></description>
    </item>
    <item>
      <title><![CDATA[Microsoft open-sources Conductor: deterministic, YAML-defined orchestration for multi-agent workflows]]></title>
      <link>https://opensource.microsoft.com/blog/2026/05/14/conductor-deterministic-orchestration-for-multi-agent-ai-workflows/</link>
      <guid isPermaLink="false">https://opensource.microsoft.com/blog/2026/05/14/conductor-deterministic-orchestration-for-multi-agent-ai-workflows/</guid>
      <pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Microsoft. The counter-move to making everything an LLM. Conductor (MIT-licensed) routes between agents with Jinja2 conditionals defined in YAML, so the orchestration layer consumes zero tokens and the structure is fixed and diffable at definition time. You still mix models and providers per step, run steps in parallel, drop in non-LLM script steps (pytest, shell), and gate on human approval. This is the right tool for workflows with known structure (code review, research-then-synthesize, plan-then-implement), where a model-driven supervisor just adds cost and nondeterminism. It is the deterministic end of the spectrum I argued for in /blog/supervisor-pattern-vs-handoffs-multi-agent: use it when the ordering is known, and reach for dynamic handoffs only when it is not.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic ships 12 practice-area Claude plugins for corporate, regulatory, and employment law]]></title>
      <link>https://www.lawnext.com/2026/05/anthropic-goes-all-in-on-legal-releasing-more-than-20-connectors-and-12-practice-area-plugins-for-claude.html</link>
      <guid isPermaLink="false">https://www.lawnext.com/2026/05/anthropic-goes-all-in-on-legal-releasing-more-than-20-connectors-and-12-practice-area-plugins-for-claude.html</guid>
      <pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: LawSites. First serious vertical play from Anthropic post-Code with Claude. Practice-specific plugins solve the real problem with legal AI: the same prompt for an M&A deal and an employment dispute will get a firm sued. Worth a look for legal IT teams that have been holding out on AI adoption because the generic chatbot demos were never going to clear conflict checks.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic re-enables outside-agent tools on paid Claude plans, now behind a separate credit meter]]></title>
      <link>https://venturebeat.com/technology/anthropic-reinstates-openclaw-and-third-party-agent-usage-on-claude-subscriptions-with-a-catch</link>
      <guid isPermaLink="false">https://venturebeat.com/technology/anthropic-reinstates-openclaw-and-third-party-agent-usage-on-claude-subscriptions-with-a-catch</guid>
      <pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: VentureBeat. Reverses a March policy change. Under the new model, standard plans cover everyday Claude usage, while agentic tool calls draw from a separate credit pool you top up. Easier to budget at the team level. If you turned off outside tools in March, switch them back on and re-measure cost-per-completed-task.]]></description>
    </item>
    <item>
      <title><![CDATA[Notion ships a Developer Platform with an External Agents API: third-party agents become tracked workspace collaborators]]></title>
      <link>https://www.notion.com/releases/2026-05-13</link>
      <guid isPermaLink="false">https://www.notion.com/releases/2026-05-13</guid>
      <pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Notion. The headline everyone ran was "Notion has agents now." The part worth your attention is the API shape. The External Agents API (alpha) lets outside agents like Claude Code, Cursor, and Codex show up as first-class participants: @-mentionable, assignable to database items, and reviewable in-thread alongside human teammates and Notion's own agents. That is a different integration model from the usual "bolt a chatbot onto the sidebar," and it is the pattern I expect every serious workspace tool to copy. Workers (a hosted runtime for custom code) and a live database-sync layer round it out. If you build agent integrations, study this as a template for "agent as collaborator" rather than "agent as chat box," and watch who ships a comparable API next.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Code adds Workload Identity Federation: agents authenticate without a stored API key]]></title>
      <link>https://platform.claude.com/docs/en/manage-claude/workload-identity-federation</link>
      <guid isPermaLink="false">https://platform.claude.com/docs/en/manage-claude/workload-identity-federation</guid>
      <pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. Claude Code can now authenticate through Workload Identity Federation. Instead of a long-lived API key sitting in a config file, the agent mints a short-lived token from a federated identity (set via ANTHROPIC_WORKSPACE_ID), scoped to a workspace with its own rate limits, billing, and OAuth scope. Read it against the Nx Console worm that harvested ~/.claude credentials later in the month: the durable fix for stolen keys is to stop storing keys. If you run Claude Code in CI or any shared or production environment, move it off static API keys onto WIF and the blast radius of a leaked config drops toward zero. How it fits the wider toolchain hardening: /blog/agentic-ai-supply-chain-security.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic ships tool-use telemetry — every selection is scored and logged at the model boundary]]></title>
      <link>https://code.claude.com/docs/en/monitoring-usage</link>
      <guid isPermaLink="false">https://code.claude.com/docs/en/monitoring-usage</guid>
      <pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. You can now see why the model picked tool A over B before the call lands in your code. Stops you guessing in postmortems — the score deltas tell you whether the registry, the description, or the prompt was the weak link. Wire this into your eval harness today.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Code adds parallel sub-agent execution — multi-file refactors land in a single turn]]></title>
      <link>https://www.anthropic.com/engineering/claude-code-best-practices</link>
      <guid isPermaLink="false">https://www.anthropic.com/engineering/claude-code-best-practices</guid>
      <pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Anthropic Engineering. Sub-agents run in parallel for independent reads, then serialise on the write turn. Cuts a 20-minute refactor to under 4 minutes. The orchestration matches the supervisor / swarm split: supervisor for the plan, swarm for the read phase, supervisor again for the write.]]></description>
    </item>
    <item>
      <title><![CDATA[MCP remote-server registry crosses 500 listed servers — a curated production-ready tier emerges]]></title>
      <link>https://registry.modelcontextprotocol.io/</link>
      <guid isPermaLink="false">https://registry.modelcontextprotocol.io/</guid>
      <pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: modelcontextprotocol.io. Quality bar is finally visible. The curated tier filters out the 80% of MCP servers that hallucinate auth flows or skip pagination. If you are still shipping unscoped MCP integrations, this is the index to audit against before your next release.]]></description>
    </item>
    <item>
      <title><![CDATA[GitHub cuts agentic CI workflow costs 19-62% by pruning tools and moving data-fetch outside the LLM loop]]></title>
      <link>https://github.blog/ai-and-ml/github-copilot/improving-token-efficiency-in-github-agentic-workflows/</link>
      <guid isPermaLink="false">https://github.blog/ai-and-ml/github-copilot/improving-token-efficiency-in-github-agentic-workflows/</guid>
      <pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: GitHub Engineering Blog. The underrated takeaway: every unused MCP tool registration adds ~8-12 KB of schema overhead per API call. Their workflow had ~40 tools; pruning unused ones saved thousands of tokens per run before the agent even started thinking. Audit your tool registry this week.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Opus 4.7 ships with 1M-token context window in production]]></title>
      <link>https://www.anthropic.com/claude/opus</link>
      <guid isPermaLink="false">https://www.anthropic.com/claude/opus</guid>
      <pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. A 1M-token window that holds up reliably means you can drop a whole codebase in without RAG for most repos. The trick is not stuffing; it is strategic placement plus prompt caching. Cache hit rate is the new throughput metric.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic ships "dreaming" for Claude Managed Agents: offline memory consolidation as an API]]></title>
      <link>https://platform.claude.com/docs/en/managed-agents/dreams</link>
      <guid isPermaLink="false">https://platform.claude.com/docs/en/managed-agents/dreams</guid>
      <pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. Launched at Code with Claude, dreaming (research preview) is Anthropic's answer to the memory-rot problem I keep flagging: agent memory stores accumulate duplicates, contradictions, and stale entries over many sessions. A dream is an async job that reads a memory store plus up to 100 past session transcripts and produces a new, reorganized store, with duplicates merged and contradicted entries replaced by the latest value. The input store is never modified, so you review the output and keep or discard it. This is the first managed primitive I have seen for memory consolidation rather than just memory writes, and it maps onto the agentic-memory layer most teams skip. If you run long-lived agents, wire this into a nightly cleanup job; background on why the layer matters is in /blog/three-paradigms-of-llm-memory-implicit-explicit-agentic.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Managed Agents add multi-agent orchestration and Outcomes: a lead agent delegates to specialists, a grader checks the work]]></title>
      <link>https://claude.com/blog/new-in-claude-managed-agents</link>
      <guid isPermaLink="false">https://claude.com/blog/new-in-claude-managed-agents</guid>
      <pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. The other half of the Code with Claude managed-agents drop, alongside dreaming. Multi-agent orchestration (public beta) lets a lead agent decompose a job and hand pieces to specialist subagents, each with its own model, prompt, and tools, running in parallel on a shared filesystem and feeding back into the lead's context, up to 25 concurrent threads and 20 subagent definitions. Outcomes (public beta) is the eval-in-the-loop piece: you write a rubric, a separate grader scores the output in its own context window so it is not swayed by the agent's own reasoning, and it sends the agent back to revise until it passes. The orchestration matches the supervisor pattern; the grader is the runtime cousin of an offline eval suite. If you have been hand-rolling a supervisor plus a self-check, evaluate whether these replace that scaffolding. Pattern background: /blog/supervisor-pattern-vs-handoffs-multi-agent.]]></description>
    </item>
    <item>
      <title><![CDATA[Claude Code adds project memory — persistent context that survives across CLI sessions]]></title>
      <link>https://code.claude.com/docs/en/memory</link>
      <guid isPermaLink="false">https://code.claude.com/docs/en/memory</guid>
      <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Anthropic. Multi-day refactors no longer reset between sessions. Context is stored in a per-project memory file you can audit. Side effect: stale memories now cause subtle wrong-confidence bugs, so verify before acting on recall.]]></description>
    </item>
    <item>
      <title><![CDATA[GitHub secret scanning goes GA inside the MCP server: catch leaked credentials before the agent commits]]></title>
      <link>https://github.blog/changelog/2026-05-05-secret-scanning-with-github-mcp-server-is-now-generally-available/</link>
      <guid isPermaLink="false">https://github.blog/changelog/2026-05-05-secret-scanning-with-github-mcp-server-is-now-generally-available/</guid>
      <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: GitHub. Secret scanning is now generally available through the GitHub MCP server, so any MCP-compatible agent or IDE (Copilot CLI, VS Code) can scan for exposed secrets before it commits or opens a PR. It honors your existing push-protection settings at the repo or org level, which means the policy you already maintain now applies to agent-authored code too. Coding agents leak credentials in ways humans rarely do, like pasting a key straight into a config to make a failing test go green, so moving the check left to the agent boundary is exactly the right place for it. If you have GitHub Secret Protection, turn this on for every repo your agents can write to this week.]]></description>
    </item>
    <item>
      <title><![CDATA[MCP 1.0 ratified — official SDKs in Python, TypeScript, Go, Rust, Java, .NET]]></title>
      <link>https://modelcontextprotocol.io/specification</link>
      <guid isPermaLink="false">https://modelcontextprotocol.io/specification</guid>
      <pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[MCP]]></category>
      <description><![CDATA[Source: modelcontextprotocol.io. The protocol is no longer a moving target. If you held off on building MCP servers while waiting for the spec to settle, that wait is over. The enterprise auth profile and the remote-server registry are the headline additions.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic publishes "Writing effective tools for AI agents" — official guidance for production agents]]></title>
      <link>https://www.anthropic.com/engineering/writing-tools-for-agents</link>
      <guid isPermaLink="false">https://www.anthropic.com/engineering/writing-tools-for-agents</guid>
      <pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Architecture]]></category>
      <description><![CDATA[Source: Anthropic Engineering. Confirms what the field has been saying for two years: one tool, one verb. Schemas with concrete examples beat schemas alone. Read it before you design your next tool registry — it will save your agents from themselves.]]></description>
    </item>
    <item>
      <title><![CDATA[Sonnet 4.6 update: 1M-token context at standard pricing, sharper tool calls, fewer retry loops]]></title>
      <link>https://www.anthropic.com/news/claude-sonnet-4-6</link>
      <guid isPermaLink="false">https://www.anthropic.com/news/claude-sonnet-4-6</guid>
      <pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. Sonnet stays the workhorse for most production agent steps. The accuracy bump matters more than the price drop — fewer retry loops in real workflows means real cost savings nobody quotes in benchmarks.]]></description>
    </item>
    <item>
      <title><![CDATA[Haiku 4.5 in production — small-model speed, surprising tool-use chops]]></title>
      <link>https://www.anthropic.com/news/claude-haiku-4-5</link>
      <guid isPermaLink="false">https://www.anthropic.com/news/claude-haiku-4-5</guid>
      <pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Claude]]></category>
      <description><![CDATA[Source: Anthropic. Best dispatcher in the line-up right now. Use it for pre-classification and routing in multi-agent setups; reserve Sonnet/Opus for the actual reasoning. The latency drop changes what is feasible in real-time UIs.]]></description>
    </item>
    <item>
      <title><![CDATA[Cursor 1.0 stabilises background agents and ships a review-and-merge workflow]]></title>
      <link>https://cursor.com/changelog</link>
      <guid isPermaLink="false">https://cursor.com/changelog</guid>
      <pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Cursor. Background agents stopped being experimental somewhere around 0.7 — now there is a real PM-style workflow around them. Worth re-evaluating for IT services teams that dismissed the early versions.]]></description>
    </item>
    <item>
      <title><![CDATA[Anthropic research: when to use supervisor vs. swarm patterns in multi-agent systems]]></title>
      <link>https://www.anthropic.com/engineering/multi-agent-research-system</link>
      <guid isPermaLink="false">https://www.anthropic.com/engineering/multi-agent-research-system</guid>
      <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Research]]></category>
      <description><![CDATA[Source: Anthropic Engineering. Long-overdue practical write-up. TL;DR: supervisor for known workflows with clear hand-offs, swarm for exploratory tasks with parallelisable sub-goals. Do not mix patterns in one system — that is where reliability dies.]]></description>
    </item>
    <item>
      <title><![CDATA[OpenAI AgentKit / Agent Builder GA — pricing finally competitive for enterprise tool use]]></title>
      <link>https://openai.com/index/introducing-agentkit/</link>
      <guid isPermaLink="false">https://openai.com/index/introducing-agentkit/</guid>
      <pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[OpenAI]]></category>
      <description><![CDATA[Source: OpenAI. Closer to feature parity with Claude tool use than the early preview suggested. Worth a side-by-side eval if you are running both — the cost-per-completed-task differential is smaller than vendor benchmarks claim.]]></description>
    </item>
    <item>
      <title><![CDATA[Langfuse adds per-agent cost attribution and step-level cache-hit telemetry]]></title>
      <link>https://langfuse.com/docs/observability/features/token-and-cost-tracking</link>
      <guid isPermaLink="false">https://langfuse.com/docs/observability/features/token-and-cost-tracking</guid>
      <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Tools]]></category>
      <description><![CDATA[Source: Langfuse. Cache-hit telemetry is the missing primitive. You will discover that 30-50% of your agent traffic is repeat queries you should be caching. This is the report nobody wants to read but everyone needs to.]]></description>
    </item>
    <item>
      <title><![CDATA[Sarvam open-sources Sarvam 30B and 105B: Apache 2.0 reasoning models trained in India on IndiaAI compute for 22 Indian languages]]></title>
      <link>https://www.sarvam.ai/blogs/sarvam-30b-105b</link>
      <guid isPermaLink="false">https://www.sarvam.ai/blogs/sarvam-30b-105b</guid>
      <pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate>
      <category><![CDATA[Open Source]]></category>
      <description><![CDATA[Source: Sarvam AI. Sarvam released two reasoning models trained entirely in India on compute from the IndiaAI mission: Sarvam 30B for edge and conversational workloads, and Sarvam 105B as a mixture-of-experts flagship for complex reasoning and agentic flows. Both ship under Apache 2.0 with weights on Hugging Face and AI Kosh, support 22 Indian languages, and are already in production (30B powers Samvaad, 105B powers Indus). The stack is optimized end to end from tokenization through inference kernels for deployment from laptops to datacenter GPUs. If your agent serves Indian-language users and you have been routing everything through a Western API with translation glue, benchmark Sarvam on vernacular tool-calling and reasoning before assuming you need Sonnet for every turn. API access is on the Sarvam dashboard; self-host if data residency requires it. Routing frame: /blog/gemini-3-5-flash-vs-sonnet-4-6-routing-layer.]]></description>
    </item>
  </channel>
</rss>