In the last 15 years there’s been a welcome push for empiricism in public policy. It’s now more common for policy decisions — a new regulation, a big infrastructure project, a new support scheme — to be appraised and evaluated. We see a growing list of evidence-based policy successes, from sugar taxes, to 5p charges for carrier bags, to the UK’s increasingly powerful minimum wage. And there are now 1,000s of examples of policy delivery being improved through experiments.
There’s one aspect of evidence-based policy-making, however, that has proven trickier. This is the way we use evidence to solve complex problems at scale when we can’t regulate a solution. I’m thinking of sticky problems like drug addiction, poor childhood outcomes, or long-term worklessness.
For problems like these, empiricism often plays out in what I’d call the medical paradigm of policy-making. We try to develop a policy response the way we develop a medicine:
We identify the policy problem (the illness and its pathology)
We develop an ‘intervention’ to solve the problem (the medicine)
We evaluate the intervention (run a trial, ideally an RCT)
If the intervention works, we ‘roll it out’, meaning we scale it (mass produce the medicine) and allocate it (mass-prescribe the medicine)
In the gold standard version, we go further by trying to identify the ‘active ingredients’ of the policy intervention. We then try to make the intervention more affordable by scaling only the active ingredients. e.g. by codifying those ingredients in strict guidance, or in training for frontline workers.
The medical paradigm has been dominant ever since I got into policy-making in the early 2000s, and for a while before that. And for sticky problems I’d say it’s still the default way of thinking about policy development. Certainly it’s implicit in our language — think of how often you hear words like ‘pilot’, ‘roll out’, and ‘active ingredient’ in the context of policy.
All of which raises a question: when it comes to sticky social problems, do policy interventions act like medicines?
A truer way of thinking
For a while now there have been growing doubts about the medical approach to policy-making. We’ve seen lots of examples of interventions that were promising at first trial, like the Family Nurse Partnership, but that disappointed when scaled. And, as time has gone on, the absence of interventions that have followed that path — develop, trial, scale across the system — has become increasingly conspicuous.
The other day, a chilling thought occurred to me: what if there aren’t really any proper examples of policies that have worked this way? It felt like that moment in a film when you realise one of the main characters was a ghost all along. What if the idea of a ‘roll out’ exists only in our imagination?
Much as I like the drama of that image, I think it’s overstated. There are lots of examples of policy interventions that proved themselves in an initial trial and then scaled to multiple locations, albeit rarely (if ever?) reaching what we might think of as full scale across a system.
A lot of the best examples come from development, with the caveat that a lot of these are literally medicines. e.g. deworming. There are also examples from education, e.g. effective teaching techniques. And there are well-evidenced parenting programmes, and programmes to help people back into work, that now reach 1,000s of people in multiple locations.
Still, what has proven largely imaginary is the simple version of ‘pilot, test, scale’— and certainly those three little words: ‘roll it out’.
On the basis of the last 15–20 years, it seems to be extremely rare for policy interventions to work this way. Much too rare, at least, to justify the dominance of the medical paradigm.[1]
So, if ‘roll it out’ doesn’t really work, what’s a better way to think about evidence and scaling for sticky problems? A truer account goes something like this:
For any sticky social problem — drug addiction, poor child outcomes, long-term worklessness — there are now lots of approaches out there, functioning a bit like a marketplace, or maybe better an ecosystem.
Some programmes are well-evidenced, others less so, and they’re each used to differing degrees for a range of reasons (familiarity, inertia, persuasive evidence, incumbency, etc.)
When the government, or a charity, develops and trials a new intervention, it gets added to this ecosystem, and it’s a tiny part of the system. We don’t ‘roll out’ the new intervention — it’s rare in policy that we have the capacity to do so.[2] Instead, people who commission services, typically locally — in councils, hospitals, schools— cotton on to the new approach and use it, or not. So new approaches ‘diffuse’ and ‘spread’ rather than ‘scaling’.
People who work in and around these systems swap ideas, as in any profession — they go to conferences, switch employers, and read articles. New ways of working therefore spread between interventions — from one drug treatment programme to another — and they also seep from formal interventions into the underlying base of professional practice, e.g. the way drug treatment workers go about things. So we shouldn’t just think in terms of ‘programmes’, we should think in terms of skills, techniques, and professional norms. In fact most activity in any policy system is better described as professional practice than as a ‘policy intervention’.
The speed with which effective practices spread depends on a host of factors. In essence, professionals are doing their best but they’re busy and subject to all the normal limits of human beings — bounded rationality, habits, dogma, etc. So diffusion is a slow, social process.
In a word, it’s messy. And the mess is both inevitable — it’s how societies work — and a good thing. A top-down bureaucracy imposing an answer might spread new ideas more quickly, but it would also be less adaptable, less capable of learning from frontline practitioners, and more prone to very harmful errors and crises (think of Horizon).
Still, in policy-making it’s not good enough to say ‘it’s complicated’. Of course it’s complicated. The question is: how should we go about things, and improve outcomes, given that it’s complicated?
I’m going to describe three approaches that improve on the idea of ‘roll it out’. The first is the most well-developed and is a more nuanced version of the medical paradigm. The second is a bit more of a stretch — an iterative way of working, adopted from the technology sector. And the third implies a more radical re-framing of what policy-making is all about.
1. Improve on the medical paradigm
First, we can think about evidence and scaling in a more nuanced way, based on what we’ve learned about the medical paradigm. Really this is already how experts in institutions like the UK’s What Works Centres think about evidence. So what we’re talking about here is bringing the wider policy-making profession up to speed with this way of thinking.
The shift I have in mind replaces a linear model, in which policy is seen as ‘developing solutions’, with a model in which we work to improve the health of the policy-making system. This means we replace those three sequential stages — ‘design, test, scale’ — with five complementary strands of work.
Innovate. This is how I’d describe adding new approaches to the system; it’s what we often call ‘pilots’. However, in this new model, we stop thinking of pilots as attempts to ‘solve the problem’— after all, thousands of smart people are already working hard to do that. Pilots are more about filling gaps. e.g. maybe current approaches are expensive, so we might take the best of existing programmes and adopt a new delivery model to make them cheaper. Or maybe we notice that some groups of people are underserved by existing programmes, so we might tailor an existing programme to better serve them. Hence thinking of this as innovation.
Evaluate. This is where we help people evaluate existing initiatives, e.g. with funding or evaluation expertise. We do this partly to understand efficacy but also to understand scalability or transferability. (NB: The latter has historically been under-emphasised, and recently more attention has been paid to the science of scalability — sometimes thought of as a shift in emphasis from internal validity to external validity.)
Improve. We also need to work hard to improve the delivery of current programmes. Senior decision-makers, especially politicians, often underplay improvement, I suspect because it’s less appealing than announcing a new solution. This helps to explain public policy’s failure to learn over time, especially between administrations. It’s also wasteful, because improving existing initiatives can be very cost-effective. Partly because the fine-grained details of delivery can matter hugely for outcomes. But also because it’s often much cheaper to tweak an existing programme than it is to establish a new one. So the lesson is that we should invest more in improvement. The ultimate goal is to run experiments continually at the edges of all major policy programmes, testing variations that can then be incorporated.[3]
Diffuse. This, to my mind, is the most overlooked aspect of evidence-based policy-making. Maybe because the route to impact is oblique. This is about helping good ideas spread across the system and it means doing things that sound micro or fluffy but that are really important. Communicating well about evidence —e.g. with well-written newsletters, compelling events, and content that busy practitioners remember (e.g. videos and animations) — rather than hiding evidence in long PDFs. It also means training local procurement teams to buy better services; helping professionals learn from each other, e.g. by supporting communities of practice; and encouraging a culture of innovation, e.g. by celebrating innovators with prizes.
Shape the system. This final strand of work, which is often missing entirely, is about closing the loop from practice back to the system. The basic model is: ‘try to improve things, notice why it’s difficult, and remove the barriers that made it difficult’, for example by changing regulation, or creating new institutions (or killing old ones). This is a key feature of Nesta’s work — alternating between hands-on innovation and system-level change. System-shaping is also about listening hard to reformers, who know best what stops good things from happening.
So that’s a more systematic way to think about evidence and scaling for sticky problems. As I said above, this is basically how experts in evidence-based policy already think. The issue is that it’s quite far from being the default mental model in government more widely, where policy-making is often still thought of as coming up with solutions to problems.
A note of hope, though, to end this section. The UK government is organising its long-term policy agenda around five missions and this is a big opportunity to take a more systematic approach to evidence and scaling.
We see a new generation of political leaders who think differently about what it takes to improve outcomes across a complex system. Most notably Georgia Gould in the Cabinet Office, who brings deep experience applying a mission-driven approach from her time leading Camden council.
More widely, the new government is setting up a delivery architecture for missions that could readily accommodate the approach to evidence and scaling I’ve described above.
A good mission can give clarity on the outcome we’re aiming for, alongside a clear theory of change, rooted in an understanding of the policy system in question — the key actors, and their key relationships. The work can start by synthesising the current state of evidence on what works, and being honest about critical uncertainties. It can map the best practices that are already happening, including scouring for positive deviants. And it can then work in the kind of systematic way I’ve described above, supporting a programme of experiments to drive innovation and spread good practice.
All of which is to say, it feels hopeful that we can move beyond ‘roll it out’ in a new phase in the push for empiricism. (We can argue another day about whether that would still constitute a medical paradigm.)[4]
But let me wrap up with two other approaches that push further, both of which are a bigger stretch, but which I think could offer bigger rewards.
2. Iterate like a tech company
Even as policy-making has upped its game on evidence, governments have fallen further behind the way data is used at the frontier of the private sector, especially in technology companies.
Google doesn’t ‘evaluate’ Gmail in the way we evaluate policy interventions. Like all good software, Gmail is improved constantly, using data generated as people use it, complemented by rolling user research and testing. The very idea of running a big one-off evaluation — ‘did it work?’ — is bananas in the world of software development.
This points to an alternative paradigm for policy-making that is sometimes referred to as ‘test and learn’ (although this term is used in various ways). In the purest model, you build the smallest version of the thing — the Minimum Viable Product; try it out with a few people; improve it; and gradually expand it while continuing to improve it. Rather than running an evaluation, you use live operational data and user research to keep improving how the product/service works, including by comparing your current approach to alternatives in rolling A/B tests. (NB: Lots of other techniques for iterative policy-making are available, from prototyping to policy blueprinting to adaptive management practices.)
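To make the rolling A/B test idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the ‘success rates’ of the two service variants, the weekly intake of 200 users, and the 12-week window are hypothetical numbers, and the significance check is a standard two-proportion z-test rather than anything a real delivery team necessarily uses.

```python
import math
import random

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test: how surprising is the gap between
    variant B's success rate and variant A's, if they were equal?"""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (successes_b / n_b - successes_a / n_a) / se

# Rolling A/B test: each week, new service users are randomly assigned
# to the current service (A) or a variant (B), and outcomes accumulate.
random.seed(42)
rate = {"A": 0.30, "B": 0.36}   # hypothetical true success rates
succ = {"A": 0, "B": 0}
n = {"A": 0, "B": 0}

for week in range(12):
    for _ in range(200):        # 200 new users per week (invented)
        arm = random.choice(["A", "B"])
        succ[arm] += random.random() < rate[arm]
        n[arm] += 1

z = two_proportion_z(succ["A"], n["A"], succ["B"], n["B"])
print(f"A: {succ['A']}/{n['A']}, B: {succ['B']}/{n['B']}, z = {z:.2f}")
# A |z| above roughly 1.96 suggests a real difference at the 5% level
```

The point of the sketch is the shape of the loop, not the statistics: assignment is continuous, the evidence accumulates week by week, and the comparison is always live, rather than waiting for a single ‘did it work?’ verdict at the end.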
This might sound like an approach that only works with software and certainly there are challenges to iterative development in public policy, such as when outcomes have a long time lag (e.g. as they do in child development). But these problems are surmountable.
I won’t unpack this here because these approaches are well-covered in The Radical How, a report from Public Digital and Nesta that shares case studies of people using these approaches in government, outside digital teams. I also touched on the challenges in an earlier post, Move Fast and Fix Things.
My TL;DR view is that these techniques are grossly underused in the public sector, to the tune of billions of pounds wasted and millions of people let down — illnesses not spotted, worklessness prolonged unnecessarily, crimes committed, etc. And since these approaches are now mature and well-codified — if anything they’re the norm in contemporary parts of the private sector — there is really no excuse for them not being adopted more widely. The main thing standing in the way is institutional inertia.
3. Spread it like fire
I’ll end by describing a bolder shift away from the medical paradigm, and even away from the basic idea that policy-making is about coming up with solutions to problems.
It’s important, I think, to consider bolder options, because there are good reasons to think that the medical paradigm struggles precisely because policies are fundamentally nothing like medicines.
Why are policy interventions different from medicines?
For one thing, when we’re tackling a sticky social problem, we know that agency is critical in a way it’s not, or not always, for medicine. In fact sometimes agency is the single most important ingredient in addressing a sticky social problem. Think, for example, of the stories people often tell about overcoming drug addiction — they get to a place in their lives where they know they need to quit, and that ‘knowing’ makes all the difference.
Because medicines are prescribed to people, the medical paradigm has an inbuilt blindspot for agency, and indeed it’s often actively disempowering. I often think, for example, of the way we talk about ‘parenting interventions’, a phrase surely designed to make any parent incandescent.
Beyond agency, another reason policy programmes are not like medicines is that the context in which they’re developed is really important.
When you develop a medicine, it doesn’t really matter where the medicine is made, so long as it has the right ingredients. But my reading of why so many promising policy interventions don’t scale is that it often turns out that the context in which the programme was developed is the active ingredient.
An example is the Harlem Children’s Zone, which was effective at least partly because it was developed in Harlem. It drew energy from the place, and the building, and the historical resonance of an inspiring black leader like Geoffrey Canada running an aspirational programme for children in Harlem. Scaling an approach like this by replication is like seeing a flower you like and cutting off its head to take home in your pocket.
So if agency and context — place, history, personality — are so important, how should we think about evidence and scaling for sticky social problems? Can we even talk about evidence and scaling?
I think we can, but only if we switch metaphors, and change where we’re putting our attention. Rather than scaling policies like medicines, we can think about spreading policies like fire, if fire was a good thing.
With fire, the conditions are everything. We don’t talk about ‘scaling’ or ‘replicating’ a fire, we talk about the conditions — heat, dryness, kindling. We ask: why did the fire break out? And of course we have, by now, a pretty sophisticated understanding of this — a science of the conditions.
How could we apply this to public policy? There are now at least three fairly well-developed approaches to spreading good social outcomes more like fire than like mass-produced medicines.
a. Fostering agency
We now have a range of ways to help communities come up with solutions to problems. From the 100 Day Challenge, to the Citizen Incubator model developed by Public Life, to initiatives like Our Future in Grimsby, among a wider menu of techniques. It turns out that the ideas communities come up with are often more creative and effective than ideas imported from outside — the ROI is often many multiples of the ROI of faded Xeroxed programmes. This makes sense when you think about it — as the Prime Minister said recently on the topic of community agency, ‘people with skin in the game tend to come up with better answers’. And when people come up with great answers, our response doesn’t have to be: ‘let’s replicate that answer elsewhere’. Our response can be: ‘let’s recreate those conditions’.[5]
b. Investing in civic infrastructure
A community’s capacity to solve problems depends on certain enablers we can call civic infrastructure. Civic infrastructure can be physical: a building to host free events in. It can be relational/social: friendships formed in a community garden. It can be technological: a platform that makes it easy to ask neighbours for favours, or to give away unwanted food. And it can come down to people simply having time and headspace to contribute, which is why income security and time policies, such as volunteering leave, are important. Civic infrastructure is to a dynamic civil society what economic infrastructure — roads, railways, contract law — is to a dynamic economy. It enables people to use their ingenuity to solve problems. We can therefore enhance community power by investing more in civic infrastructure, from civic technology platforms to community buildings, events, and associations.
c. Creating institutions that bring out the best in people
This last one is the biggest stretch, and is a more nascent field, but might have the biggest upside potential.
Social outcomes emerge ultimately from the decisions we make every day, from big formal decisions — like how we allocate public money — to small informal decisions about how we treat each other. These decisions are guided by the institutions we live in, from formal institutions like organisations and laws, to informal ones like social norms. And these institutions overlap and nest within each other, creating what we can think of as action arenas; an example is the action arena of social media. And we all know how differently we behave in different action arenas — posting online, for example, or sitting in a car, versus chatting at the school gate.
The Nobel laureate Elinor Ostrom had a useful way of thinking about institutions as rule-based games that get repeated. We move through life in institutions, each trying our best (within constraints like bounded rationality) before repeating, and trying to get better. Ostrom spent decades working to understand institutions, so that we can improve them. Her guiding vision — which feels to me more resonant with every passing day — was that the ultimate goal of government, and of public policy, should be to build institutions ‘that bring out the best in people’.
This might, I know, sound like an unattainable ideal. But institution-building is a craft like any other aspect of public policy — or at least it should be. And I do think we’ve seen many good recent examples — think of the power of formal institutions like the Low Pay Commission that helped build consensus around the UK minimum wage, or informal institutions, like Pride festivals.
This is why it’s great to see organisations like The Institutional Architecture Lab starting to codify the craft of institution-building. This seems to me an important skill we need to strengthen in policy-making.
Where does this final, more enabling set of approaches take us? And how does it relate to the wider push for empiricism in policy-making?
We might think it means rejecting much of the push for empiricism in policy-making, and adopting quite a different philosophy. We would think of policy-making not as ‘developing solutions to problems’, but as ‘creating the conditions in which good things happen’.
My argument, though, would be different to this. I don’t think we should replace the medical paradigm (or at least the good version I described above) with an approach in which we only work to enable local solutions. The argument I’d make instead is that the medical paradigm is only a small part of the possibility space of policy-development. It’s only one way to discover solutions to social problems.
In a sense, then, the argument I’m making is the same as the one made by Kanjun Qiu and Michael Nielsen, when they advocate for a discipline of metascience — a more systematic attempt to test different models for scientific discovery (different approaches to funding, and different institutions, and different modes of decision-making). Except that I’m making this argument for policy-making, rather than for science.
What I’m saying is that we’ve gotten stuck in a narrow sub-space of the possible approaches to policy-making, as pictured in the diagram, which I’ve adapted from Qiu and Nielsen. The point is not that other approaches, beyond our current ones, are better. Indeed we would expect other approaches — like fostering community agency — to be less mature, since we’ve spent less time developing them. The point is that those other approaches are underexplored, so we don’t know what we’re missing.
This is why I think these bolder approaches — such as focusing much more squarely on agency — are not a departure from empiricism in policy-making, they are the next chapter in the push for empiricism.
If we truly believe in empiricism in policy-making, we should be testing other ways of discovering good solutions to social problems — just as the UK government is now starting to test other ways of supporting discovery in science, by funding experiments in metascience. We should develop — and I wince as I say this — a discipline of meta-policy-making.
Right, that must surely be time to wrap up. Shared, as ever, as food for thought. I’d be interested in critiques and takes on whether I’ve been too harsh, or too soft, on the medical paradigm.
For more on the iterative approach to policy-making, see The Radical How, a report from Public Digital and Nesta. And a post putting The Radical How in the context of missions; See also my earlier post, Move Fast and Fix Things. To stay in touch, you can follow me on Blue Sky, Medium, or Substack.
Footnotes
Remember I’m focusing on a subset of policy-making I describe as ‘sticky problems’, i.e. complex social/behavioural issues like drug addiction, or poor childhood outcomes, where we can’t regulate a solution. Beyond this, there are lots of examples of scaling through regulation, such as the 5p charge for carrier bags. And there are even more examples of improved delivery, not least thanks to the 100s of trials run over a decade and more by BIT.
I appreciate that ‘roll out-ability’ varies a lot by sector, e.g. in education we have levers like the National Curriculum to incorporate proven approaches; in healthcare we have institutions like NICE; in fairly top down systems, like Job Centres, we have operational delivery and policy levers, etc. My point is just that in most cases — childcare, social care, addiction services, domestic violence, reoffending— decision-making is very decentralised, and even where it’s not, we’re nowhere near the simplicity implied by those three little words: ‘roll it out’.
In order to run experiments, we also need to make sure our legal frameworks allow variation, which is often not currently the case, e.g. in tax and benefit law. We need either to rewrite these laws to allow experimentation, or to create legal sandboxes — spaces in which we can safely experiment.
We could argue over whether this is still a medical paradigm. On the one hand it’s very different to how we develop, test, and scale most basic medicines. On the other hand, it’s arguably similar to more contemporary approaches to drug development, e.g. developing a portfolio of tailored drugs and treatment regimes, and the kind of iterative drug development we’ve seen with mRNA vaccines.
One reason it’s hard to use enabling approaches like these in government is that they suffer from the problem of obliquity. Policy-making is bad at taking oblique lines of attack; when we see a problem, we design a solution to that problem — e.g. we address loneliness by designing a loneliness intervention. The trouble is, no-one wants to come to your loneliness programme. They do want to come to the Christmas party with the bawdy comedian, or to grow tomatoes in the community. But the comedian and the tomatoes are harder to fund, despite being more effective than the loneliness programme.