Getting AI to Work in Complex Codebases

GITHUB.COM

It is no great secret that AI tools excel at tasks that are in some way a reflection of their training dataset. However, applying these tools to complex codebases (those that use proprietary or unusual libraries / APIs, or less popular languages) can be a challenge. Often, in these cases, the complexity of the tasks you can delegate to your AI tool is modest. Put simply, it needs a lot of hand-holding.

In order to execute more sizeable tasks (e.g. one-shot feature development, or a large-scale refactor), you must supply the AI tool with a lot of instructions. However, there is a limit to the amount of information you can provide to a model (due to context window token limits), especially when you consider that models are stateless (i.e. you have to provide the complete set of instructions on each and every invocation).

Finding a workable approach that balances the need for detailed instructions against the limited context window is something of an art form.

In this blog post, the author outlines a structured approach to this challenge: a process of Research → Plan → Implement, combined with intentional compaction of context, with which they report some impressive results.
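To make the idea concrete, here is a minimal Python sketch of how such a Research → Plan → Implement pipeline might be wired up, with each phase running as a fresh, stateless invocation and only a compacted summary carried forward. The function names, prompts, and the compact() helper are my own illustrative assumptions, not the author's actual implementation.

    def run_agent(prompt: str) -> str:
        # Placeholder for a single, stateless LLM/agent invocation.
        raise NotImplementedError

    def compact(notes: str, max_chars: int = 4000) -> str:
        # Intentional compaction: ask the model to distil its own output so only
        # the essentials are carried into the next (stateless) phase.
        return run_agent(
            f"Summarise the following in at most {max_chars} characters, "
            f"keeping file paths, key functions and decisions:\n\n{notes}"
        )

    def build_feature(task: str) -> str:
        # Phase 1: Research - explore the codebase and record what matters.
        research = compact(run_agent(f"Research the codebase for this task:\n{task}"))

        # Phase 2: Plan - turn the compacted research into concrete steps.
        plan = compact(run_agent(
            f"Task:\n{task}\n\nResearch notes:\n{research}\n\nWrite a step-by-step plan."
        ))

        # Phase 3: Implement - only the compacted plan, not the full history,
        # goes into the final invocation, keeping it within the context window.
        return run_agent(f"Task:\n{task}\n\nPlan:\n{plan}\n\nImplement the plan.")

The point of the compaction step is that no phase's full transcript ever needs to fit inside the next phase's context window.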

We’re at the very early stages of working out how to make best use of the incredible power that AI tools can deliver. I think we’re going to see a lot of innovation here before a consolidated and optimised approach begins to emerge.

CompileBench: Can AI Compile 22-year-old Code?

QUESMA.COM

And once again on the topic of applying AI to complex, messy, real-world tasks …

CompileBench is a new benchmark suite for evaluating agentic AI models (i.e. models that iteratively tackle complex tasks), by challenging them to perform complex and messy tasks, for example “reviving 2003-era code, cross-compiling to Windows, or cross-compiling for ARM64 architecture”.

What I like about this approach is that there will likely be limited information relating to these specific tasks in the models' training data. As a result, they will have to employ genuine problem solving to complete them successfully.


You can review the results to see which model currently performs the best across these gnarly problems.

Winners and losers aside, I think it is amazing that an AI agent can actually complete these tasks. It shows genuine problem solving ability. However, these tasks are rather narrow in focus, i.e. get something to build. They are tasks that are easy to describe and easy to evaluate.
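To illustrate the "easy to evaluate" point: success in this kind of benchmark can often be reduced to a mechanical check on the build artefact. The Python sketch below is a hypothetical example of such a check; it is not CompileBench's actual scoring code, and the --version flag is just an assumed smoke test.

    import subprocess
    from pathlib import Path

    def build_succeeded(binary: Path) -> bool:
        # Hypothetical pass/fail check: the artefact exists, and it runs.
        if not binary.is_file():
            return False
        result = subprocess.run(
            [str(binary), "--version"], capture_output=True, timeout=30
        )
        return result.returncode == 0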

Regardless, this is a fascinating piece of work.

How are developers using AI? Inside our 2025 DORA report

BLOG.GOOGLE

Opinions about the impact of AI on software development vary wildly, from the vibe-coding boosters claiming a 100x productivity boost, to the recent METR report (which I covered in issue #1), whose research indicated these tools make engineers 19% slower.

Over time, we’ll start to see more reliable research and hopefully reach a consensus. This report, from Google, is a big step in the right direction. Their study of 5,000 developers contains many interesting findings.

I’ll not go into all the details, but interesting themes include:

  • the majority report that AI increases productivity and code quality
  • a significant share of developers still express only partial confidence in AI outputs, underscoring the ongoing need for validation and oversight
  • AI delivers the greatest impact at the organizational level when paired with strong systems and practices
  • usage patterns show that most developers still depend on chat interfaces, but IDE-native tools are growing in importance

The above just scratches the surface of the 140 page report.

For me, the most important take-home message is that while AI is boosting the productivity of individuals, we need to look at the “system” as a whole (people, process, technology) to fully realise these benefits.