Management writer Simon Caulkin highlights some of the flaws and misconceptions associated with performance-related pay
When Tony Blair remonstrated some years ago, ‘No company would consider managing without targets,’ he was nearly right. Whether organisations using them call it that or not, results-based frameworks for performance management have come to dominate management thinking in many parts of the developed world, in both public and private sectors – particularly in the UK. Broadly speaking, under outcomes-based thinking managers first define the desired results, then cascade the requirements down through the organisation and hold people at different levels accountable for their part in making them happen. The logical extension of this kind of performance management is payment by results (PBR), which has been eagerly adopted in many areas of social policy (NHS, employment, foreign aid). Like so much else in public sector management, its origins are to be found in the private sector, where in the form of bonuses and incentives it is ubiquitous in top management and throughout the financial sector.
Paying people for the results they achieve – it sounds rational and plausible; indeed it sounds like management’s Holy Grail. But if that is the case, why are two of the most enthusiastic proponents of results-based management, the banks and the NHS, conspicuous for producing outcomes that are the opposite of those they were set up to deliver – impoverishing the world rather than enriching it in the case of the banks and killing patients instead of curing them in NHS hospitals? And is there an alternative?
The answers that emerged from a fascinating and disturbing event organised by consultancy Vanguard in Manchester in March were, respectively: because people persist, and have a strong vested interest, in believing that making it work is a technical problem that we can solve if we’re clever enough; and yes, there is an alternative, and not surprisingly it’s the opposite of outcomes-based approaches – starting not from the back (the result) but the front (what’s happening now).
The event was entitled ‘Kittens are evil’, to make the point that, like questioning the ‘aaaah-quotient’ of everyone’s favourite pet, casting doubt on outcomes-based management is heresy. Being a heretic can be uncomfortable – see Galileo and Luther – but sometimes there’s no option but to speak out. The Manchester event was the second in a series designed as a rallying call for campaigners to take the initial steps down that route, the first having taken place in Newcastle in October 2012, organised by the North of England Transformation Network, Newcastle CVS and the KITE Centre at Newcastle University Business School.
The first step in turning today’s heresy into tomorrow’s new paradigm is to unpick the assumptions underpinning the current one. These turn out to be fairly heroic.
Number one, says keynote speaker Toby Lowe, research fellow at Newcastle University Business School’s KITE Centre and chief executive of Helix Arts, is that outcomes are unproblematic to measure (see his articles in The Guardian and Public Money and Management). In fact, they are so context-dependent that in practice accurate measurement for timely management is impossible: ‘Our desire for outcome information outstrips our ability to provide it. Information about outcomes can either be simple, comparable and efficient to collect, or it can be a meaningful picture of how outcomes are experienced… It cannot be both.’
The second assumption is that effect can be reliably attributed to cause. The conceptual flaw here, says Lowe, ‘is that it is based on the idea that outcomes are the result of a linear process from problem through intervention to positive outcome’. But a moment’s thought indicates that attribution can only be done at the price of massive simplification in which a myriad external and contextual factors are weighted away or simply ignored. In combination, these two flaws yield what Lowe calls social policy’s ‘uncertainty principle’: the more we know about the outcome the more complex it becomes and the less we are able to attribute it to a particular cause. Yes, it’s a paradox: the more we measure, the less we understand.
Meanwhile, the side effects, in terms of the distortion of practice and priorities, are reflected in almost every day’s news headlines. When managers are tasked with delivering ‘outcomes’ that are beyond their control, notes Lowe, they ‘learn ways to create the required outcomes data by altering the things that are within their capacity to control’ through creaming and other means of making the numbers without improving the actual outcomes. A good example: when a Vanguard team looked at A&E casualties at three NHS hospitals, it discovered that the nearer people got to the four-hour wait limit, the more likely they were to be admitted to hospital, until at 3 hours 59 minutes everyone was admitted, irrespective of need.
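The shape of that admissions curve is easy to see in data. The sketch below is purely illustrative – the admission rule and all the numbers are invented, not taken from the Vanguard study – but it reproduces the reported pattern: admission probability flat at the level clinical need would dictate, then surging to 100 per cent as the target deadline approaches.

```python
# Illustrative sketch only: synthetic data showing the reported A&E pattern,
# where admissions spike as the four-hour wait target nears.
# All numbers here are invented for illustration.
import random

TARGET_MINUTES = 240  # the four-hour A&E wait limit

def admission_probability(wait_minutes: float) -> float:
    """Toy model: clinical need alone would admit ~30% of patients,
    but near the deadline admission becomes a way of 'stopping the
    clock' regardless of need."""
    clinical_baseline = 0.30
    if wait_minutes >= TARGET_MINUTES - 1:    # 3h59m and beyond
        return 1.0                            # everyone admitted
    if wait_minutes >= TARGET_MINUTES - 30:   # final half hour
        return 0.80                           # admissions surge
    return clinical_baseline

random.seed(1)
waits = [random.uniform(0, 245) for _ in range(10_000)]
bins = {}  # 30-minute bucket -> (admitted, total)
for w in waits:
    bucket = int(w // 30) * 30
    admitted = random.random() < admission_probability(w)
    a, t = bins.get(bucket, (0, 0))
    bins[bucket] = (a + admitted, t + 1)

for bucket in sorted(bins):
    a, t = bins[bucket]
    print(f"{bucket:3d} min bucket: {a / t:5.1%} admitted")
```

Run against real admissions data rather than this toy model, the same binning would make the step at the target boundary immediately visible.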
More subtly, management by results can corrupt behaviour at every step in the chain. One view of targets is that they are a ‘Nelson’s eye’ (‘I see no ships’) game in which governments in effect collude with the gamers by taking reported performance improvements at face value, or alternatively by insisting that gaming is only carried out by ‘a few bad apples’, in both cases preserving the evidence base. A similar thing can happen to front-line workers, with even more worrying results. As Lowe notes, the relationship between worker and client is subtly reversed. The worker no longer asks the client ‘How can I help you achieve your goals?’ Instead, they ask ‘How can you help me achieve my targets?’ ‘Evidence-based policy is sought by government, but mostly the result is policy-based evidence’, is how economist John Kay sums up this corrupting process.
If, as even proponents admit, the real evidence base in favour of results-based management is so thin, and the casebook of distorting behaviour, unintended consequences, and outcomes the opposite of those expected, so thick, why does its hold remain so strong that it is still the default discourse? The answer, says Lowe, is that the acknowledged problems ‘have been treated as practical obstacles which can be overcome when, in fact, they cannot be “solved” because they are intrinsic to the theory itself.’ To denial is added formidable vested interest in the shape of the IT-based performance management systems that govern the way call centres and customer-service organisations operate throughout the UK public and private sectors. A final factor may be the pervasive short-termism afflicting those who report on such matters as well as carry them out, with the result that the ideological underpinnings of such management are never challenged or indeed examined.
As systems guru Russell Ackoff explained, if you are doing the wrong thing, then doing it better makes you wronger, not righter. So the ‘efficiency’ measures and large-scale IT-driven change efforts undertaken as remedies demonstrably make things worse. On the other hand, even if you start off doing the right thing wrong, every small improvement is a step in the right direction. If, as the evidence strongly suggests, outcomes-based approaches are the wrong thing, what is the right one?
The right thing – and the next step to establishing a better, more productive paradigm – is, logically, to reverse the wrong thing and start at the other end. If results, as Lowe puts it, ‘are emergent properties of complex adaptive systems’, so that we can’t measure performance against them, what do we use as measures instead? That, says Andy Brogan, the second keynote presenter, depends on the answer to an anterior question: why do we measure?
Organisations can use measures in two ways: to learn and improve; or to create accountability. As with Lowe’s information dichotomy (information about outcomes is either complete or collectable, but not both), accountability and learning are mutually exclusive: ‘the minute you use measures to create accountability, you can’t rely on them for learning,’ says Brogan, ‘because their validity is destroyed’ – a perfect example of Goodhart’s Law in action (‘Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes’, popularly restated as ‘When a measure becomes a target it ceases to be a good measure’, and named after Professor Charles Goodhart, economist and adviser to the Bank of England).
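Goodhart’s Law can be demonstrated with a toy simulation. In the minimal sketch below (all figures invented), a proxy measure tracks underlying quality reasonably well while nobody is judged on it; once it becomes a target and reported numbers cluster just above the threshold, its correlation with real quality collapses – which is exactly why a measure used for accountability can no longer be relied on for learning.

```python
# A minimal sketch of Goodhart's Law, with invented numbers: a proxy measure
# is informative until it becomes a target, after which gaming destroys it.
import random
import statistics

random.seed(42)

def correlation(xs, ys):
    """Pearson correlation, computed from first principles."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Before targeting: the measure is a noisy but honest reflection of quality.
quality = [random.gauss(50, 10) for _ in range(1000)]
measure_before = [q + random.gauss(0, 5) for q in quality]

# After targeting: reported numbers cluster just above the threshold
# (creaming, reclassifying), independent of underlying quality.
TARGET = 60
measure_after = [TARGET + random.uniform(0, 2) for _ in quality]

print(f"correlation before targeting: {correlation(quality, measure_before):.2f}")
print(f"correlation after targeting:  {correlation(quality, measure_after):.2f}")
```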
To illustrate the fundamental difference between the two kinds of measure, take the example of a local authority child protection department. It uses two standard measures for assessing children at risk. For those judged to be in imminent danger, it must carry out an initial assessment within seven days in 80 per cent of cases. For a fuller core assessment once the risk level is understood, the standard is 80 per cent within 35 days. Note that the measures are both arbitrary and illogical – seven days could be far too long for those in most danger, and the 80 per cent quota will be little comfort for the 20 per cent not covered. Be that as it may, the unit meets both standards, so under the red-amber-green (RAG) signalling system used to guide management priorities, it rates solid green: no management attention is required. The conversation among managers and workers is about making the numbers within the guidelines laid down in a compendious ‘yellow pages’, ie about demonstrating accountability. The result of success – meeting the standard – in this system is: relax, do nothing.
Now take the same department viewed according to a different measure: the end-to-end time from first referral to assessment completion – which of course is how it is experienced by the child or, say, the primary school teacher who has referred her to social services. The picture that emerges is very different. The ‘urgent’ assessment takes on average 16 days (‘far too long by any standard’) but can predictably take up to seven weeks, while the core assessment averages 55 days, not 35, but can equally take up to 161, or more than five months. Worse, because unbeknown to management the clock for the core assessment only starts when the case is formally opened and not when the initial assessment is completed, the true end-to-end time for the 35-day assessment can be anything up to nine months. ‘Now tell me Baby P and Victoria Climbié were one-offs’, says Andy Brogan grimly; he collected the data. ‘They weren’t: they were designed in.’
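The two-clock problem can be made concrete in a few lines of code. The dates below are invented, but the mechanism is the one described above: the official 35-day clock starts only when the case is formally opened, so a case can comfortably ‘meet’ the standard while the child has been waiting since the first referral months earlier.

```python
# Illustrative sketch of the two clocks in the child-protection example.
# Dates are invented; the point is that the official 35-day clock starts at
# formal case opening, while the child experiences the full span from first
# referral to completed core assessment.
from datetime import date

referral_received = date(2013, 1, 7)   # teacher first refers the child
initial_completed = date(2013, 1, 23)  # initial assessment done (16 days)
case_opened       = date(2013, 5, 20)  # core-assessment clock starts here
core_completed    = date(2013, 6, 21)  # 32 days on the official clock

official_days   = (core_completed - case_opened).days
end_to_end_days = (core_completed - referral_received).days

print(f"official clock:  {official_days} days "
      f"({'meets' if official_days <= 35 else 'misses'} the 35-day standard)")
print(f"end-to-end time: {end_to_end_days} days as experienced by the child")
```

On these invented dates the department records a compliant 32 days while the child has waited 165 – and nothing in the official measure distinguishes the two.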
So how could the department say it met the standards? Because the de facto purpose of its workers has become to avoid attracting managers’ attention by making the numbers, which, like customer service organisations everywhere, they have learned to do by recategorising cases and starting and stopping the clock in ways that are legitimate according to the official bible. ‘We’ve dialled down the value base or purpose and replaced it with making the measure,’ says Brogan. ‘It’s “for God’s sake hit 80 per cent because if not we’ll get hauled over the coals by management”. This is what happens when the measure is put there for accountability, not learning.’
To generate learning, a measure must be subordinate, and related, to purpose as defined from the customer or client’s point of view – in this case ensuring the child is safe in the shortest possible time. Then, the office conversation is about the available methods to do that, ie about improvement (recall that under the previous measure, the conversation is, ultimately, about how to do nothing). In all organisations, whether they are aware of it or not, there is a systemic relationship between purpose, measures and method. Where purpose comes first and measures are related to it, the job of workers and managers is to find methods that will meet the purpose better, as calibrated by the measures: a learning system. Where the measure comes first (as standards and specifications mandated by government and inspectors, for instance), it becomes the purpose and methods are correspondingly geared to meeting it, ie demonstrating accountability. Hence the paradox of public service organisations gaining three-star (or whatever the ranking system is) ratings and utterly failing their customers (Haringey social services), or bankers claiming their bonuses while pulling down the world; which in turn explains the more general one that in accountability systems no one is ever accountable (Mid Staffs, the banks), since they have always met the numbers.
Given how much attention is paid to them (‘what gets measured gets managed’ is certainly true), it’s astonishing how much of the measurement that goes on in most organisations is useless. It is actually worse than that. As Deming rhetorically asked about targets, ‘What do “targets” accomplish? Nothing. Wrong: their accomplishment is negative’. As in child protection, the wrong measures drive the wrong actions, which actually make matters worse, often by generating huge amounts of failure demand. In isolation, static data points, averages, percentages and RAG systems say nothing about context, variation and predictability. Measuring to arbitrary targets and standards, as in the child protection example, keeps managers blind to what is really going on. Measures of functional performance count activity, not attainment of purpose. These are all measures owned by the boardroom, of no help to anyone on the front line, where they should be used: why should 80 per cent of assessments be done in seven days? Which 80 per cent? Perhaps most damaging of all are measures used as carrots and sticks to ‘motivate’ people – ‘Complete nonsense – I can’t overemphasise how flawed this idea is,’ says Brogan. These are accountability measures on steroids, making absolutely certain that the recipients will concentrate on the numbers, not the purpose. That is why they are called incentives. ‘Either people are motivated by purpose or they are motivated by the wrong thing,’ says Brogan. ‘Incentives aren’t the solution: they’re the problem’.
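A small worked example shows how little a percent-met or RAG summary actually says. In the sketch below (figures invented), two teams both hit the 80-per-cent-within-seven-days standard and rate green, yet one of them leaves individual children waiting many weeks – exactly the context, variation and predictability that static percentages conceal.

```python
# A sketch of why percent-met and RAG summaries hide variation: two teams can
# both show 'green' against an 80%-within-7-days standard while one is far
# less predictable. All figures are invented for illustration.
import statistics

standard_days, quota = 7, 0.80

# days taken to complete each initial assessment
team_a = [3, 4, 4, 5, 5, 6, 6, 7, 8, 9]    # consistent performance
team_b = [1, 1, 2, 2, 3, 3, 4, 7, 35, 49]  # same 'green', wildly variable

for name, days in (("A", team_a), ("B", team_b)):
    met = sum(d <= standard_days for d in days) / len(days)
    rag = "GREEN" if met >= quota else "RED"
    print(f"team {name}: {met:.0%} within {standard_days} days -> {rag}; "
          f"mean {statistics.mean(days):.1f} days, worst case {max(days)} days")
```

Both teams print 80 per cent and GREEN; only the distribution reveals that team B’s worst case is a seven-week wait.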
Organisations are too complex, human rationality too limited and contexts too infinitely variable for management ever to be scientific in the sense that numbers can substitute for judgment. Establishing purpose from which to derive appropriate measures sometimes requires difficult judgment calls. But this does not preclude scientific scrutiny of organisations and what they do as systems, albeit complex adaptive ones, so as to ‘understand and act on causes of performance variation in such a way that we can connect actions with the consequences they are having’, as Brogan puts it. This gives us back the lost idea of management as progress. By relating measures to purpose – what really matters to people? – and testing method against that – what is the current method achieving against purpose and why? Is it a good theory or a bad one? – managers grow the confidence to reject the ‘dangerous idiocies’ and haphazard, ideologically inspired changes of management based on predetermined results and advance down a learning path in which the best we can do today is a certain step to doing it better tomorrow. ‘Centuries of science tell us that happens,’ says Brogan. It’s heresy today: but so once was the notion that the earth goes round the sun.
(This article originally appeared on Simon Caulkin’s website)