The Magenta Book just got a Test and Learn annex

Something fundamental just shifted in how UK government works

On 15 May 2026, HM Treasury quietly updated the Magenta Book.

If that sentence didn't get you excited, stay with me. The Magenta Book is the document that guides how the UK government develops and evaluates billions of pounds' worth of policies, programmes and services. And buried inside the update is the most significant change to how the government does that work that I've seen in years.

A new annex called Test and Learn.

Alongside it, a refreshed supplementary guide on Handling Complexity in Policy Evaluation. Together, they describe a way of working that puts learning before scaling, builds policy with the people who use and deliver services, and tells civil servants – formally, in writing – that it's okay to find out something isn't working and change course.

This is the thing service designers, user researchers, agile teams and behavioural insights specialists have been making the case for, in scattered corners of government, for a long while. Now it's written down in the document that shapes how the government works.

First, a quick word on the Magenta Book

If you don't work in central government, you may not have come across it. And even if you do, it's entirely possible you've heard of the Magenta Book but never taken a look.

Banner showing the Magenta Book from HM Treasury: Central Government guidance on evaluation from the Evaluation Task Force

The Magenta Book is HM Treasury's guidance on evaluation – it sits alongside the Green Book, which covers appraisal and value-for-money. Together they shape how government decides what to spend money on, and how to judge whether it worked.

It's been around since 2011. It was last meaningfully updated in 2020. The version published on 15 May 2026 is the biggest revision in years.

The closer you are to the Treasury, Finance or the analytical professions, the more likely you are to 'go by the book'. But out in mainstream policy-making land, a lot of people haven't heard of the Green or Magenta Book.


What's just changed

The gov.uk announcement describes the change as moving "from 'measuring what happened' to 'learning how to improve in real-time'." (Source: gov.uk news and Civil Service blog, 15 May 2026.)

For me, that's the headline. The government is officially telling us: don't wait until the end to find out whether or not it worked.


What Test and Learn means

The annex defines it as:

"Test and Learn is a way of working that centres around the use of feedback loops to check whether assumptions – that are key to policy success – are met in practice."

The shift isn't a new technique. It's a new norm, built on tried-and-tested methods.

The annex identifies seven key features:

  • Multidisciplinary teams. Policy, delivery, design, analysis, behavioural insights, frontline staff and people with lived experience, in the room together. Not one-directional hand-offs from policy to delivery to evaluators.

  • A shared, measurable outcome. Agreed at the start, not decided retrospectively – or never agreed at all!

  • Grounded in evidence. From multiple sources. So you're working on the actual drivers of the problem, not the assumed ones.

  • Test the riskiest assumptions first. The ones with the weakest evidence base and the biggest consequences if you're wrong.

  • Iterate. Mixed methods, in parallel, with findings reviewed as they emerge and used immediately.

  • Supports both scale and local adaptation. Two distinct things: scaling tests whether an approach works consistently across contexts; adaptation makes sure it fits this particular place. Both stay iterative.

  • Be transparent. Make decisions, processes and progress visible to stakeholders and the public in real time, not after the fact.

And it lays out a four-stage framework: Explore, Co-design, Test, Grow.
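For readers who find it easier to think in code, here is a minimal sketch of the "test the riskiest assumptions first" feature. The annex does not prescribe a formula: the 1-to-5 scales, the `risk_score` rule and the example assumptions below are all hypothetical, chosen only to show how weak evidence and big consequences can combine into a simple ordering.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    """One assumption a policy depends on (all fields are illustrative)."""
    description: str
    evidence_strength: int  # 1 = weak evidence, 5 = strong evidence
    consequence: int        # 1 = minor if wrong, 5 = severe if wrong


def risk_score(a: Assumption) -> int:
    # Hypothetical scoring rule, not taken from the annex:
    # weaker evidence and bigger consequences both raise the score.
    return (6 - a.evidence_strength) * a.consequence


def riskiest_first(assumptions: list[Assumption]) -> list[Assumption]:
    # Highest score first, so the riskiest assumption is tested first.
    return sorted(assumptions, key=risk_score, reverse=True)


# Made-up example assumptions for an imaginary local service.
assumptions = [
    Assumption("Residents will hear about the new service", 4, 2),
    Assumption("Frontline staff have time to make referrals", 2, 5),
    Assumption("The online form works on older phones", 3, 3),
]

for a in riskiest_first(assumptions):
    print(risk_score(a), a.description)
```

In this made-up example the staff-time assumption comes out on top, because it combines thin evidence with a severe consequence if it proves wrong. In practice a team would agree the scores together and revisit them as findings emerge.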

Why this is a real shift

Section 6.4 on Co-design excited me the most. It calls for teams to maintain an iterative mindset, "treating each prototype as an opportunity to learn rather than as a final product."

That's the heart of user-centred service design and agile delivery. Build the cheapest thing that lets you learn the most. Bring in the people who will use and deliver it. Be willing to throw it away when it tells you something you didn't expect.

What's new isn't the practice. It's that it's now official guidance, sitting in the document used to shape evaluation across all of the civil service.

It also legitimises bringing the right mix of people into the room early. Multidisciplinary teams are described as

"working in partnership with appropriate stakeholders – including frontline staff, service users, delivery partners and others with relevant experience".

For anyone who has ever tried to argue their way into a policy team as a service designer, a researcher or a third sector partner – you now have published, HM Treasury-endorsed backing!


It pairs with the Handling Complexity guide

The Test and Learn annex doesn't sit on its own. It's designed to work alongside the supplementary guide on Handling Complexity in Policy Evaluation, which was updated on the same day.

This pairing matters.

Many of the hard policy problems the government is wrestling with – child poverty, climate, NHS pressures, regional inequality, housing – are complex. They don't behave linearly. They're shaped by context. The same intervention might land very differently in different places.

Test and Learn is about how you cope with that uncertainty as you design your solution. Handling Complexity is how you evaluate it honestly. They are close cousins.

The guidance covers the UK Civil Service.

What this means if you work in the Welsh Public Sector

A note for my Welsh audience, because this lands particularly well in Wales – and a word of context: most of my work is in Wales and England, so that's where this post is focused. 

Yes, this applies to you – even if you've never heard of it

If you work in Welsh Government, this is fairly direct. Welsh Government uses the Magenta Book.

If you work for an arm's-length body – e.g., Natural Resources Wales, Sport Wales, Social Care Wales, Public Health Wales, or any other Welsh Government-sponsored body – the chain is longer but it still ends at the same point.

Your Accounting Officer follows Managing Welsh Public Money, which points to the Green Book, which points to the Magenta Book. A mostly invisible chain! But when the Magenta Book changes, what's expected of you changes with it. 

Three documents, one chain: how the Magenta Book applies to Welsh public bodies. "Managing Welsh Public Money (Welsh public sector rulebook)" points to "Green Book (Appraisal guidance)", which points to "Magenta Book (Evaluation guidance)".

If you've never heard of any of this, you're in good company. I have friends working at sponsored bodies in Wales who'd never seen the Magenta Book referenced in their work. And if you have heard of it but tuned it out – assuming the Magenta Book is for analysts and others involved in evaluation – it's definitely time for a rethink.

Yes, it contains some technical content. But the opening chapters are written for everyone involved in shaping policies, programmes and projects – whether that's policymaking, evaluation or delivery.

The new annex pushes that even further: Test and Learn isn't just an analyst's tool. It's a way of working that needs policy, delivery and evaluators to work together.

And it fits with the way Wales already says it wants to work

The way Test and Learn is framed – long-term, collaborative, involving people who use and deliver services, focused on prevention – sits remarkably comfortably with the Well-being of Future Generations Act's five ways of working: long-term thinking, integration, involvement, collaboration, and prevention.

If you're shaping policy in Welsh Government, this annex gives you a sharper, more practical vocabulary for the kind of work the WFG Act already asks of you. The Future Generations Commissioner, the Auditor General for Wales and Senedd committees have all pointed to an implementation gap between the Act's aspirations and delivery on the ground. 

The Test and Learn annex offers a concrete framework for closing some of it.

The timing is striking

The new Welsh Government, which took office in mid-May, has explicitly committed in its First 100-Day plan to "adopt a test and learn approach to delivering on our key priorities, enabling action at pace and with flexibility". The same plan names a digitally-enabled Welsh "Cabinet Office" of policy, delivery and data specialists, measurable outcomes shared widely, and a National School of Government to grow skills across the Welsh public sector.

In a match made in governance nerd heaven, two days later, HM Treasury published the Test and Learn annex.

A minority government can't guarantee delivery on every commitment in a 100-Day plan, but it has named the approach. And there's an opportunity here to be ahead on putting it into practice, whatever part of the Welsh public sector you sit in.


A note for the third sector and delivery partners

If you work in the third sector, in social housing, in social enterprise, or any other organisation that ends up delivering or shaping public services – read this annex.

It repeatedly names "delivery partners" and "others with relevant experience" as people who should be in the room at the policy development stage. 

That's an open invitation.

The more fluent your team can be in this language – Theory of Change, the four phases (Explore, Co-design, Test, Grow), multidisciplinary working, testing assumptions – the more likely you are to be invited in earlier, rather than handed a finished policy to deliver.

Where to go next

If you want to dig into the detail:

This really is big news. I've spent over ten years making the case for this approach in rooms where it occasionally felt like a "nice to have". This official guidance ups the game.

Now we just need to make it "the way things are done around here".

 

