Measures of Success

Many businesses have a need for metrics, ways of measuring what is being done. 

You can't manage what you can't measure. This is treated as true, when it is perhaps one of the great lies of business.

Despite that, there is a lot to be said for having metrics, provided attention is paid to what they reveal. Or, far more significantly, what they fail to reveal.

What all too often happens is that this metric tail ends up wagging the business dog. The metric (and there may be more than one, but very often there is a single KPI, a key performance indicator) makes the life of the managerial staff simple, and they fall into the trap of believing that this indicator tells the truth, tells all of the truth and tells all that they need to know. By which time the metric is in charge.

When the metric is sufficiently important in a local environment, behaviour adapts to perform in accord with the metric. So work that will not be reflected in the metric is skimped or not done; work that will clearly reduce the metric score is dumped on targeted people or dumped altogether; and conflict arises as people fight over the obvious cherries that will add well to the metric. Throughout this, all too often no one dares question whether the metric is as all-seeing and wonderful as management evidently thinks it is.


In the context of the school environment, where the metric is resulting grades, there are fights to persuade the academically able to join your course and to push the disadvantaged and noticeably less able away from it, so that some subjects become dumping grounds, especially those perceived as requiring different skills. So all of those with difficulty in English as a second language are pushed towards courses such as art, graphics, design and technology, occasionally maths: places where the perception is that language is less important. Whether this deals with the underlying problem is irrelevant; it moves the problem student (this student is a problem for me and my subject) away from your departmental protected space (Chapter reference), and that is enough to protect the measure of success. Someone else's problem.

So this exemplifies several flaws: conflict, gaming the system, masking problems and, often, unintended consequences.

In the same scenario and later in the course, there are efforts to persuade candidates to withdraw so as to protect the subject score; there are delicate nudges to scores around the grade boundaries; there are grade estimates that are played (gamed) in line with departmental policies. These moves demonstrate a different sort of conflict, between protecting an interest (the grade score for the department) and the education of the student (including the possible benefit of a failure as a prompt into different behaviour), which generally brings into question the ethos of the school and the integrity of the staff, and falls into what I call the 'political' class of decision.

No part of such behaviour measures whether the students have learned enough to cope with the changed style of learning required for the next phase of their education. The metric doesn't measure the ability to adapt, to work independently or in a team, to have ideas. Instead it measures the ability to pass exams and, often, the ability to conform and to follow instruction. Which in turn raises the question of whether this is what we want from education.

I worked in a quantity surveyor's office for several years, but the relevance here is that this was an office job. At the end of every week, though in practice every month, we each filled in a timesheet that allocated work to jobs. It took a very short time to learn that having time left over was considered very bad, and only a little longer to discover that a job on which very few people worked would be scrutinised more closely; so one soon learned to identify certain large contracts as the place where you dumped all the lost time, including the time spent filling in the timesheet. If I worked only on one job that month then obviously I must have spent all of those hours on that job. But this is not true and, chasing that idea and logging myself as off-task whenever I genuinely was, I soon discovered that 90% on-task was hard work, and rare. Time would disappear into coffee breaks, what Americans call water-cooler conversations, and the genuinely approvable business-related conversations such as sharing experience and learning what counted as better performance (training, in effect). So my whole month on Job A was, at best, 90% accurate and probably nearer 75% of a month, after subtracting the time I spent helping others with calculations (as the only mathematician on the staff, I was used as a substitute calculator) or being sent out for sandwiches, off to do the photocopying and so on. Things improved when the boss was persuaded that perhaps there was some general office administration that we ought to be booking time to, but he visibly viewed this as an admission of failure in some way.
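
As a rough illustration of that arithmetic, here is a minimal sketch in Python; every figure in it is an invented assumption, not a number from the real timesheets.

    # Rough sketch of the effective-time arithmetic above.
    # All figures are illustrative assumptions, not real timesheet data.
    hours_booked = 160          # a nominal working month, all booked to Job A
    on_task_fraction = 0.90     # a good month: 90% genuinely on-task
    helping_colleagues = 12     # ad-hoc calculation help for others
    errands = 6                 # sandwiches, photocopying and the like

    effective = hours_booked * on_task_fraction - helping_colleagues - errands
    print(f"Booked to Job A: {hours_booked} h")
    print(f"Plausibly spent on Job A: {effective:.0f} h "
          f"({effective / hours_booked:.0%} of the booked figure)")

With these guesses the booked month shrinks to about 79% of itself, which is exactly the gap between the timesheet and the truth described above.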

It was later, working on a civil engineering site, that the phrase 'hospital job' cropped up, describing a task which, on the critical path analysis, had so much slack in it that the resources allocated to it would cause no problems if they broke. So, for example, the JCB that needed a lot of maintenance would be used to do the support work on the land compulsorily purchased (from farmers, in this case); it didn't matter how long it took. These were known as 'hospital jobs', as if the fault-prone JCB were in hospital. Which had me confused, because the jobs into which I'd learned to throw all the leftover hours, the ones big enough to absorb them, were, you guessed it, hospitals.


With all such metrics, we should ask, frequently, whether this is correctly measuring what we think it does. If we are clear what it is that is being measured, we are then able to resist attributing to the resulting measure properties it doesn't reflect. For example, in education there is a natural correlation between the results gained and the standard of the incoming student. If a 'good' school produces superb results, is that a direct result of the teaching, or do the entrance criteria factor into this?

I worked at a school where a (very small) part of our entrance requirement was an IQ test. We had learned that an IQ below 88 pointed at someone we could not successfully teach. I'm not saying that such people cannot be taught; I'm saying that at this particular school the way we worked failed these students. Of course, measures of IQ only form one part of the picture, and so one learned to look at the students in school with IQs at the bottom of the spectrum and to wonder what we could do to better their learning (without, ideally, spending more than the matching income). As a result we showed that we were successful with IQs of 90 and upwards, and that IQs of 88-90 consistently gave us problems both in teaching and in learning. At no time did this simple test become the sole factor, but over the years I was on the staff the yardstick was tested often enough to show that a below-88 score meant that this child would not succeed. Which, this being the place it was, was couched as 'we will fail as educators', not that the child would fail. Not suitable for here, and here not suitable for you, if you like.

At another site and in another country our significant test lay in a correlation between scores in English and mathematics. We were quite careful to couch the mathematics to test the necessary skills and to evaluate, as best we could, the ability to think without depending on fine meanings of language, while in the English we explored the adequacy of language and, again, we were up front about the vocabulary size we needed so as to be able to teach in English. Some people took several tests and showed that they were progressing towards the standard we required, so we repeated the test at entry. Yes, we wrote many equivalent tests and we worked hard to keep them secure. We then produced a correlation chart for each year-group and for course applicants, on which we could see quite clearly the elliptical region that encompassed those with whom we could succeed (in reaching the desired standards, in this case achieving good enough grades to enter university in the US or UK). For a given success in maths there was a minimum of English, and vice versa. We had so many students on the edge of the envelope that it became relatively easy to show a fairly fine distinction between probable success and failure, an ability for our processes to cause enough learning to occur for the targeted grades and result. No, it was not the only criterion, but it gave us, like the IQ test above, a bottom edge. Unsurprisingly, the marketing department found it very easy to come up with candidates who fitted their requirements (able to pay the fees) but not ours. It mattered not how much we complained; every year we would be overridden and more students below the success line would join us, and time after time they would fall away. The very few who changed their behaviour and became successes were less than 1%. But just one candidate doing so was enough for marketing (really, sales) to produce an extra ten one-percenters the following year. A sketch of how such a chart might be produced follows.
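
Since the original data is not reproduced here, this is only a minimal sketch of the kind of chart described; the scores, the correlation and the 'success envelope' rule are all invented for illustration, with simple floors and a trade-off line standing in for the elliptical region.

    # Illustrative English-vs-maths entry chart; all data invented.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    n = 200
    maths = rng.normal(60, 15, n).clip(0, 100)                 # invented entry scores
    english = (0.6 * maths + rng.normal(20, 10, n)).clip(0, 100)

    # Assumed success rule: a floor in each subject plus a combined
    # minimum, standing in for the elliptical envelope in the text.
    likely_success = (maths > 45) & (english > 40) & (maths + english > 105)

    plt.scatter(maths[likely_success], english[likely_success],
                marker='o', label='inside the envelope')
    plt.scatter(maths[~likely_success], english[~likely_success],
                marker='x', label='below the success line')
    plt.xlabel('Maths entry score')
    plt.ylabel('English entry score')
    plt.title('Entry scores and the success envelope (illustrative)')
    plt.legend()
    plt.show()

    print('correlation:', round(float(np.corrcoef(maths, english)[0, 1]), 2))

On a real cohort the interesting students are, as the text says, the ones sitting on the edge of the envelope; the invented rule here merely shows how such a boundary makes 'probable success' and 'probable failure' visible at a glance.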

I should point out that this correlation chart was produced when trying to identify whether there was any correlation between intake standard and result (if you like, exit standard) and in attempting to discover whether we could identify early the third of students who quite clearly were not learning in lessons delivered in English. The whole point of the course on offer was to prepare for university study in English, so there was, in my view, no point at all in allowing use of the local language (I'm resisting saying which language) except for the occasional shortcut to understanding, equivalent to the use of a dictionary. I was not prepared to have lessons delivered in 'local', since the objective was to learn overseas and we were advertising that we did not do so. That's integrity and honesty, from a different chapter. We learned that we could do much more about English failure (lists of necessary vocabulary) than we could about the maths, which was our route to measuring thinking skills. Later, we spent effort on a third aspect, being able to ask questions, which turned out to have cultural implications.

After I'd left, the advertised objectives remained much the same, but some 30-50% of lessons were in 'local' and 90% of the staff were local (as opposed to 100% and 50% while I was there); the objective was grades and only grades, to the point where I accidentally caught a previous boss creating certificates. Of course, I reported this behaviour and discussed the cultural lean towards this sort of behaviour at some length with the next visit of (UK exam board) inspectors.


Suppose your office has a timesheet whose objective is to record what projects (jobs, chargeable accounts, label-able tasks) you have worked on and how much time you spent on these tasks. What checks occur? If no checks are possible, are there jobs that you are not allocated to, to which you may nevertheless have contributed? Are there jobs we might call 'sensitive' that, if you record time against them, will cause questions from on high? Are there jobs in the opposite direction ('hospital jobs') to which it is understood that loads of otherwise useless or unproductive time will be allocated? Is it permissible to record that you spent ANY time being unproductive, or working on something personal, or looking at a task that we might call speculative?

So if you're not allowed to be idle, you are indirectly required to find a labelled task to which you can attribute useless time. Let's further assume that you're not allowed to quit for the day even when you have nothing to do. Of course you can find things to do, but is it acceptable to find something against which to charge this time?
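
One way out of that bind is to give the awkward categories legitimate codes of their own, so that such time has an honest home. A minimal sketch follows; every job code and category in it is invented for illustration, not taken from any real system.

    # A minimal sketch of a timesheet line that gives 'unproductive' time a
    # legitimate home, so it need not be laundered into a hospital job.
    # All job codes and categories are invented for illustration.
    from dataclasses import dataclass
    from enum import Enum

    class Category(Enum):
        CHARGEABLE = "chargeable"      # real project work
        ADMIN = "admin"                # office administration, timesheets
        TRAINING = "training"          # shared experience, learning
        SPECULATIVE = "speculative"    # looking at possible future work
        IDLE = "idle"                  # honestly recorded slack

    @dataclass
    class TimesheetLine:
        job_code: str                  # e.g. "GEN-ADMIN" rather than a hospital job
        category: Category
        hours: float

    week = [
        TimesheetLine("JOB-A", Category.CHARGEABLE, 28.0),
        TimesheetLine("GEN-ADMIN", Category.ADMIN, 3.5),
        TimesheetLine("GEN-ADMIN", Category.IDLE, 2.0),
        TimesheetLine("TRAIN", Category.TRAINING, 1.5),
    ]
    print(sum(line.hours for line in week))   # 35.0: hours onsite, not output

The point of the design is simply that 'idle' and 'speculative' are recordable without triggering questions from on high; whether management can bear to see those entries is the cultural problem described above.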


Examples and case studies to find

Repetition; can it be turned into useful repetition? There are several issues attached to metrics:

(i) they are susceptible to gaming;

(ii) they don't measure what management thinks they measure;

(iii) there is sufficient imprecision in what is recorded that the information produced is not actually useful. In such cases, action taken on the basis of what is recorded is going to be flawed.

Yes, Fred is here for 35 hours every week. That tells you how long he is onsite, not what he did. If he worked only on Project Alpha, then that doesn't recognise the time spent doing other stuff, such as internal administration, personal administration and office habits like coffee and chatter. Management might well need to recognise that the effective time onsite is considerably lower. This supports the argument for reducing the working week, or for changing the methods of working more radically. If Fred's job is to work on Project Alpha until that project is complete, with incentives to produce quality and timeliness (without censure, etc.), then perhaps Fred (and the team to which he contributes) is capable of causing the project to be done more effectively, or with fewer hours (or, more widely, fewer resources).

Schools are often measured on exam grades. There is little dispute about what the grades are. But these things are capable of manipulation; students can be and are encouraged to 'drop' subjects at which they will not 'succeed', where, very often, this 'success' is something that reflects on the school or department or teacher, such that these individuals have interests at odds with those of the student. This has multiple effects and causes (or should cause) recognition of these conflicts. So the grades can be 'improved' by entering only those students who will produce whatever is judged as 'success'. Some subjects are obligatory, such as maths and English, but there may be wriggles the school can offer, such as English as a foreign language, or commercial arithmetic, so that the possibility of a 'better' grade occurs and at the same time some vaunted metric ('90% top three grades at Key Stage 4', '98% pass rate', 'no failures') is protected.

If it can be measured, it must be measured. Another of the great lies of business.

_____________

Some things cannot be measured. That does not make them valueless. In Britain, 5-19 education is free; just because it is priced at zero does not make it without value. Some of the things that make a job worth doing, and some of the things that a business purports to do, can quite easily be difficult to measure. Customer satisfaction, for example: people are 'happy' when performance exceeds expectation. If this is taken as true, then the quickest route to happiness is to lower your expectations. If my business claims that 90% of customers rate us as 'excellent', that will fairly soon result in expectations rising above what is deliverable. You don't promise what you cannot deliver.

For those who agree that they 'can't manage what they can't measure' it is very tempting to reverse the logic and say that only what can be measured can be managed. In a post-Covid world many people have rediscovered all sorts of well-being, itself very difficult to provide measures for. But if well-being is among those things that are consistently ignored by managers (because they cannot measure it), then that gives great strength to all those who want to continue working from home as much as possible. I agree with everyone who wants to enjoy their work; I do not demand that all work be enjoyable, nor do I deny that there are bits of many jobs that are difficult to like, but that does not mean that there cannot be satisfaction in a difficult job done well, which can sometimes be a joy and sufficient reward in itself.


Short version:

Metrics mask problems

Metrics create conflict, they lack credibility and they lead to unintended consequences

People focus on the metric; staff game the metric. Performance goes out the window because it has become irrelevant. It is not just managers who focus on the metric.

From [2]: less is more; know thyself.


[1] Mitchell Osak, 'Lies, damn lies and metrics'.

[2] Here: essay 84, but also 82, 85, 211, 232, 233, 238.

Other sources to look at: academic-style papers 1 (paywall) and 2.

