Introducing Claude Opus 4.5
ANTHROPIC.COM
It’s been an exciting few weeks for model releases, with the latest foundation models all sharing a strong focus on autonomous AI coding.
- 18th Nov, Google released Gemini 3.0, with leading results across almost every benchmark (SWE-Bench being the notable exception). They also released Antigravity, their AI-first IDE.
- 19th Nov, OpenAI released GPT-5.1-Codex-Max, trained on agentic tasks across software engineering, math and research, with a focus on speed and efficiency.
- 24th Nov, Anthropic released Claude Opus 4.5, achieving substantial improvements in complex code generation, autonomous agents, enterprise tasks, and long-running workflows
Earlier this year there was a lot of talk about models hitting the limits of the scaling laws, due to constraints on data and training time. While it is true that there are limits to what can be achieved through training alone, this year we’ve seen a lot of innovation in post-training: tool and computer use, improved reasoning, and efficiency.
As a result, benchmark scores continue to improve at an impressive rate. Notably, Opus 4.5 has now exceeded an 80% pass rate on SWE-Bench, and scores at this level have prompted the creation of a newer and harder SWE-Bench Pro.
Putting Spec Kit Through Its Paces: Radical Idea or Reinvented Waterfall?
SCOTTLOGIC.COM
I recently put Spec-Driven Development (SDD) to the test by rebuilding a feature in my hobby app using GitHub’s Spec Kit. What I found was surprising: despite the promise of clean specifications and structured AI workflows, the real-world experience was slow, heavy, and far less effective than the lightweight iterative approach I normally use with AI coding agents.

In this post, I break down the experiment, share the data, and explore where SDD shines, where it struggles, and what this might mean for how we build software in the age of AI. If you’re curious whether SDD is the future—or just a fascinating detour—you might find this an interesting read.
Building an AI-Native Engineering Team
OPENAI.COM
AI-augmented software development is about much more than writing code faster; it is about transforming the way we approach the craft of software development itself.
Unfortunately this topic of conversation often veers into hype-fuelled nonsense!
I’m pleased to see OpenAI publishing a guide that looks at the full software lifecycle (Plan, Design, Build, … Maintain), considering the impact agentic AI has and what Engineers now “do instead”. This is a very practical way of looking at the transformative effects of AI.
Considering that OpenAI are a product company that sells AI Agents, there is a bit of overreach in some of their statements around what these coding agents are truly capable of, but the overall framework makes a lot of sense.
Google Antigravity Exfiltrates Data
PROMPTARMOR.COM
As we put more and more trust in agentic AI coding tools, exposing them to our codebases, SDLCs and internal data, security is going to become a massive issue. Unfortunately “prompt injection”, which is the most common attack vector, isn’t a solved issue.
This blog post outlines a successful exfiltration attack on Antigravity (the newly released vibe coding platform from Google) which causes it to leak credentials.
The attack was really quite straightforward:
- The user points Gemini towards an online guide, in their example a guide for integrating Oracle ERP’s new AI Payer Agents feature - but it could be any online resource
- The guide has a prompt hidden in 1pt font, which instructs the agent to send code snippets to an external service. However, this service requires AWS credentials, so the agent must read the user’s .env file.
- Antigravity prevents the agent from reading sensitive files that are listed in .gitignore, so this attack should be blocked
- However, Antigravity simply writes a script to read those files directly
- The browser sub-agent accesses the external URL and sends the credentials via the querystring.
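The final step, credentials leaving via the querystring, is the kind of thing a simple outbound filter could catch. What follows is a minimal sketch of that idea, not anything Antigravity or the original post provides: the function name leaked_params and the constant AWS_KEY_ID are my own, and the pattern only covers AWS access key IDs (a prefix such as AKIA or ASIA followed by 16 uppercase alphanumerics), not secret access keys or other credential formats.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumption: we only look for AWS access key IDs, which are 20 characters long:
# a prefix such as AKIA (long-term) or ASIA (temporary) plus 16 uppercase alphanumerics.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def leaked_params(url: str) -> list[str]:
    """Return the names of query parameters whose values look like AWS key IDs."""
    query = urlsplit(url).query
    return [
        name
        for name, value in parse_qsl(query, keep_blank_values=True)
        if AWS_KEY_ID.search(value)
    ]
```

A check like this would sit between the agent and the network, refusing any request for which leaked_params returns a non-empty list. It is trivially defeated by encoding the value, so treat it as a tripwire rather than a defence.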
This attack is shockingly simple. I’d hesitate to call it a prompt injection, in that none of this is at all sophisticated.
Hiding a prompt in a 1pt font is as simple as it gets. And when it comes to bypassing the protection around reading sensitive files? The prompt just asked for the data, and the agent ‘creatively’ circumvented its own protection.
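To give a sense of how little it takes to spot this particular trick, here is a rough sketch that scans a page for suspiciously small inline font sizes before handing it to an agent. The function name tiny_font_sizes and the 2-unit threshold are assumptions of mine, and it only looks at font-size declarations written in pt or px; text hidden via external stylesheets, colour, or positioning would pass straight through.

```python
import re

# Matches declarations such as "font-size:1pt" or "font-size: 0.5px".
FONT_SIZE = re.compile(r"font-size\s*:\s*(\d*\.?\d+)\s*(pt|px)", re.IGNORECASE)


def tiny_font_sizes(html: str, max_size: float = 2.0) -> list[str]:
    """Return every font-size declaration in the markup at or below max_size."""
    return [
        f"{size}{unit}"
        for size, unit in FONT_SIZE.findall(html)
        if float(size) <= max_size
    ]
```

For example, tiny_font_sizes('<p style="font-size:1pt">hidden</p>') reports the 1pt declaration, which would be a reasonable prompt to look at the page source before pointing an agent at it.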
I fear we are going to see a lot more attacks like this.
For now, I’d recommend being very careful about what you let your coding agent do. Follow the Principle of Least Privilege (both in terms of the services / tools you give the agent access to, and the environment you execute it within) and review scripts before they are executed.
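As an aid to that last point, reviewing scripts before they run, here is a minimal sketch that flags any agent-generated script mentioning a sensitive-looking file. The pattern list, SENSITIVE_PATTERNS, and the function name sensitive_references are illustrative assumptions rather than part of any tool; in practice you might derive the patterns from your own .gitignore.

```python
import fnmatch
import re
from pathlib import PurePosixPath

# Hypothetical list of file-name patterns that should never be read by an agent's script.
SENSITIVE_PATTERNS = [".env", ".env.*", "*.pem", "id_rsa*"]

# Roughly splits a script into path-like tokens (letters, digits, '_', '.', '/', '~', '-').
TOKEN = re.compile(r"[\w./~-]+")


def sensitive_references(script: str, patterns=SENSITIVE_PATTERNS) -> set[str]:
    """Return the tokens in a script whose file name matches a sensitive pattern."""
    hits = set()
    for token in TOKEN.findall(script):
        name = PurePosixPath(token).name  # compare only the final path component
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            hits.add(token)
    return hits
```

This is a prompt for human review, not a security boundary: a script that builds the file name at runtime would not be flagged, which is why running the agent in an environment that cannot see those files in the first place matters more.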