Cursor 2.0, GitHub Agent HQ, Codex digs deep
A flood of announcements; escaping the valley of waiting
In this issue
News and Announcements
Spotlight: Codex internal investigation
Words from the Frontlines
Recommended Reads
Tips of the Week
Theme of the Week: Escaping the Valley of Waiting
📣 News and Announcements
The dominant theme of the week was agentic coordination: how do we effectively coordinate all those background agents and review their work?
At their big annual event, GitHub announced Agent HQ, “a single command center to manage multiple agents from anywhere.” They promise eventual support for agents from all major providers, integrated deeply into the GitHub platform.
Cursor 2.0 launched, revealing a new Agents view.
Command Center announced their alpha launch: a local tool focused on multi-agent coordination, with designed-for-humans tooling for reviewing AI output.
Editor: lots of personal votes of confidence for the founder have been popping up on X. I like the product’s stated vision. We’ll definitely be watching this one.
Another theme of the week was faster agents.
As part of Cursor 2.0, their in-house coding model Composer 1 was released. Reviews note its intelligence is not quite frontier-level, but its near-real-time speed unlocks an iteration loop that reviewers describe as ground-breaking for close human-agent collaboration.
Not to be outdone, Windsurf announced SWE‑1.5, their own in-house coding model that boasts impressive speed with “near frontier intelligence.”
In other news:
More new features in Cursor 2.0: a built-in browser, voice support, and shareable links for reusing commands and rules.
Claude Code adds a Plan-mode subagent; subagents are now resumable.
Amp Weekly Update: a new review tool, a command palette (replacing slash commands), and @mentions of other threads.
Devin gains new computer-use powers: full desktop control with recording.
Ian Nutall open-sourced the OpenSkills Universal Loader: install Claude Skills from the Anthropic marketplace or a GitHub repo, and run the same Skills via CLI across any agent (Codex, Cursor, etc.).
🔆 In the Spotlight
The Codex team responded to a chorus of complaints about degradation of quality with a thorough investigation and report.
No single root cause explains perceived degradation. Instead, small cumulative issues (hardware variance, compaction behavior, subtle bugs) and shifting user patterns contributed to the mixed experiences. OpenAI implemented fixes, is improving compaction and infrastructure, and is committing to ongoing real-world performance monitoring through a permanent team.
Editor: my takeaway is that performance remains best in short, focused sessions with minimalist setups (i.e., not too many MCPs, etc.).
💬 Words from the Frontlines
Great 5-minute Getting Started With Amp post.
This side-by-side review of the top agents for PR reviews found CodeRabbit to be the best.
Based on six months of deep Claude Code usage, this blog post details highly advanced use of hooks to ensure Skills activate correctly, plus other techniques.
Claude Code Cheat Sheet from Tom Dorr: 10‑level, end‑to‑end guide for setup, MCP integration, workflow automation, IDE/Git integration, performance, and enterprise usage.
Video demonstration of multimodal Codex visually checking its own work: whiteboard sketches become runnable implementations.
Addy Osmani’s 30 pro Gemini tips: GEMINI.md hierarchy, checkpointing, MCP servers, memory, extensions, and large‑context workflows for power users.
📖 Recommended Reads
Practical notes from Simon Willison on setting up a codebase for AI productivity.
Fascinating report of real-world usage out of the Amazon Bedrock engineering team. I recommend digging into it. Some key takeaways:
A literal 10x increase in throughput, validated by real metrics.
Supporting that velocity required serious infrastructure investment across CI, e2e testing systems, and more.
The extreme velocity demands a new communication cadence, as decision-making becomes a key bottleneck.
Big wins from using AI to generate supporting infrastructure, such as fakes of external dependencies that enable local e2e tests at build time.
🛠️ Tips of the Week
Codex: add a rule to AGENTS.md to request escalated permissions after sandbox failures, preventing futile troubleshooting loops (see the sketch after these tips).
Cursor: keep rules minimal; only add a rule after 2–3 observed failures, to prioritize high-quality tokens over verbosity.
Matt Pocock’s favorite AI coding tip: add the AGENTS.md rule `Be extremely concise. Sacrifice grammar for the sake of concision`.
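For the Codex tip above, here is a minimal sketch of how such an AGENTS.md rule might be worded; the exact phrasing is my own assumption, so adapt it to your project.

```markdown
## Sandbox failures
If a command fails because of sandbox restrictions (for example, denied
network or file-system access), do not keep retrying workarounds.
Stop and request escalated permissions for that command instead.
```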
🔦 Theme of the Week
There are two emerging modalities in AI-assisted software engineering, occupying opposite ends of the spectrum:
Long-running background agents
Rapid back-and-forth collaboration between human and agent
The announcements this week neatly cluster around these two poles:
GitHub Agent HQ and Cursor 2.0’s new Agents view predict multi-agent coordination will be a permanent, central part of AI-powered software development.
On the other end, Composer 1 and SWE-1.5 unlock near-real-time back-and-forth. There has been a flurry of positive reviews of these fast models, with users reporting that the speed brings a surprising boost in effectiveness.
This aligns with what Jason Liu posted recently about avoiding the “valley of waiting”: the situation where you have to stay in the loop, but the turnaround time is too long to stay in flow.
➡️ Next time in TLE Weekly
My investigation into getting the agent to test its own changes in the browser isn’t finished, so it will continue this week and appear in the next issue. In the meantime, I’ll leave you with something novel to try:
Simon Willison points out that you can simply tell the agent “use Playwright Python” and it works quite well, saving the context tokens an MCP server would consume. If you have app flows the agent needs to run repeatedly, it can write the .py file once and then re-run it with minimal token usage on subsequent tests.
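As a rough illustration, here is a minimal sketch of the kind of script the agent might write once and re-run; the URL, selectors, and flow are placeholder assumptions, not from Simon’s post.

```python
# check_login.py: a reusable flow the agent writes once and re-runs cheaply.
# Requires: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def run_login_flow() -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Placeholder URL and selectors for a hypothetical app under test.
        page.goto("http://localhost:3000/login")
        page.fill("#email", "test@example.com")
        page.fill("#password", "correct-horse")
        page.click("button[type=submit]")
        # Fails with a timeout if the expected post-login text never appears.
        page.wait_for_selector("text=Welcome")
        browser.close()

if __name__ == "__main__":
    run_login_flow()
```

On later runs the agent only needs to execute `python check_login.py` and read the result, rather than driving the browser step by step.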
📝 Feedback?
I would love your feedback on this newsletter! Have thoughts, suggestions, criticism? Please reply directly to this email with any feedback you have. I’ll be reading all of them.
If you found value in this edition, by all means, please share it with your colleagues and on your socials.
Thanks for reading,
Benjamin Grosse
Editor
The Leveraged Engineer Weekly
