Introducing Claude Sonnet 5

ANTHROPIC.COM

Another week, another model release - and this one is a major version ‘bump’. Surely this heralds another step change in performance, speed and capability? Well …

This release post is rather brief in comparison to other major model announcements. Confusingly, despite the ‘5’ version number, this model is actually pitched as a lower-cost alternative to Opus 4.8 that gives similar performance.

However, given that these models all allow you to specify the reasoning effort, it isn’t entirely clear why you would use Sonnet 5 rather than Opus 4.8 at a lower reasoning level. Furthermore, looking at the Artificial Analysis pricing benchmarks, it is still a very expensive model.

In summary, it is not cheap enough to be the obvious everyday model, nor is it strong enough to replace Opus. So what exactly is it for?!

Claude Code Is Steganographically Marking Requests

THEREALLO.DEV

Claude Code is written in TypeScript, which is compiled to an obfuscated JavaScript bundle. This means that anyone can look at the code, but it isn’t that easy to understand. However, you might recall that the sourcemap files were leaked a couple of months ago, leading to someone opening a pull request on GitHub to ‘contribute’ the source code back to Anthropic!

Anyhow, the point here is that the code for Claude Code is relatively easy to inspect and scrutinise. Which is why it was very surprising to find that it uses steganographic techniques (i.e. concealed messages) to signal whether the application is being run on a machine that has a specific timezone, has a specific hostname, or has a hostname containing certain keywords. If these conditions match, it makes very subtle changes to the messages it sends to Anthropic, using various apostrophe encodings for example.
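To make the apostrophe trick concrete, here is a minimal sketch of how bits can be hidden in the choice of apostrophe character. This is an illustration only, not Claude Code’s actual implementation: the function names `embed_bits` and `extract_bits`, the choice of these two code points, and the bit layout are all assumptions made for the example.

```python
# Hypothetical illustration of apostrophe-based steganography.
# Nothing here is taken from Claude Code; names and encoding are assumed.

ASCII_APOSTROPHE = "'"       # U+0027, treated as bit 0
CURLY_APOSTROPHE = "\u2019"  # U+2019, treated as bit 1
APOSTROPHES = (ASCII_APOSTROPHE, CURLY_APOSTROPHE)


def embed_bits(text: str, bits: str) -> str:
    """Rewrite each apostrophe in `text` so that it carries one bit of `bits`.

    Apostrophes beyond the end of `bits` are normalised to the ASCII form.
    """
    remaining = iter(bits)
    out = []
    for ch in text:
        if ch in APOSTROPHES:
            bit = next(remaining, "0")
            out.append(CURLY_APOSTROPHE if bit == "1" else ASCII_APOSTROPHE)
        else:
            out.append(ch)
    return "".join(out)


def extract_bits(text: str) -> str:
    """Read back one bit per apostrophe, in order of appearance."""
    return "".join(
        "1" if ch == CURLY_APOSTROPHE else "0"
        for ch in text
        if ch in APOSTROPHES
    )


message = "I don't think it's done, but that's fine"
marked = embed_bits(message, "101")
recovered = extract_bits(marked)
```

The marked message looks almost identical to a human reader, yet a server that knows the scheme can recover the hidden bits. It also shows why this is easy to spot once you have the client code in front of you.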

Why on earth is it doing that, you may ask?

Well, the keywords, timezones and hostnames all point towards a rudimentary check for whether this user is working for a competing Chinese model lab!

We know that model distillation (i.e. using a frontier model to train your own) is a common technique that allows competing model labs to rapidly play catch-up, and we can assume that Anthropic are exploring ways to prevent this. But doing this via simple keyword matches, in a client application where we can all see the code? Honestly, I cannot understand why they thought this was a good idea.

Worse still, I don’t expect enterprise consumers of Anthropic’s products to be that impressed that they are employing client-side checks like this. It doesn’t look terribly professional.

What it means to be a Mathematician when AI does the maths

IEEE.ORG

It was just three years ago that Goldman Sachs published The Potentially Large Effects of Artificial Intelligence on Economic Growth, which predicted that:

“our estimates suggest that generative AI could expose the equivalent of 300mn full-time jobs to automation”

They also predicted the industry-specific exposure, with more manual jobs (e.g. building and construction) having very little exposure, while legal and office admin were the most exposed - with close to 50% of the tasks associated with these roles being AI automated in the future.

But what of our industry? Software engineering was bundled up with mathematics, and sat roughly in the middle (29% of tasks facing automation, vs. the average of 24%).

And here we are, three years later, and this feels rather conservative. And that feels like an understatement!

This is why I find this article so interesting. Mathematicians, as they watch LLMs continue to gain capability at an astonishing pace and tackle more and more complex problems, are asking themselves the question “what is the future role of humans in mathematics?”.


At a recent symposium, Yang-Hui He of the London Institute for Mathematical Sciences declared that human mathematicians could become “priests to oracles”. In other words, AI becomes the all-powerful mathematical Oracle, while humans facilitate and spread the teachings and knowledge.

“Frowning, fidgeting, and exchanging furtive glances—the crowd’s unease was palpable”

I’m not surprised!

Redeploying Fable 5

ANTHROPIC.COM

And we’re back on Anthropic again … Fable 5 is back!

By way of a brief recap, Anthropic announced their Mythos model back in April, claiming that its cyber security capabilities made it too dangerous to release, and rolling it out to a limited number of companies via Project Glasswing. A month ago they released Fable 5, which was Mythos ‘under the hood’, but with sophisticated safeguards that detected whether the model was being used for cyber attacks (among other things). Just a few days after release, the US government effectively shut down Fable and we all lost access. But now … a few weeks later, it is back.

This blog post tells Anthropic’s side of the story, validating the rumours that an AWS team found a prompting attack that defeated Fable 5’s safeguards. However, they note that:

“the reported technique did not expose any unique Mythos-level cyber capabilities. The behavior reflected a borderline case for Fable 5’s safeguard”

It is clear that Anthropic considered this an overreaction. The rest of the post goes on to describe their safeguarding approach, which they have strengthened, and quite rightly calls for a more collaborative approach (across industry partners and government) on effective frameworks for evaluating model safety.

Anthropic have dialled up their safeguards for the Fable 5 re-release. I’m already seeing people report that the model’s capabilities have been ‘nerfed’ as a result.