Borderline Statistics | Scoins.net | DJS

## Borderline Statistics

If you have a test, such as a medical test, the result is supposed to be positive (you have the thing tested for, which is usually not good) or negative (you haven't got it, which is generally a good thing). That situation is bad enough as it is: a positive result is bad news and a negative one is good.

But...

But the test can fail, in two more ways: the result can be positive but untrue (a false positive), or it can be negative and untrue (a false negative). So there are four values that you might well be interested in.

Let's take an example. Whatever example I pick, someone is going to claim offence, so (i) hard luck and (ii) this is the real world and we have to learn about it. The reason for looking at this is that it is very easily turned into a mess. The underlying issue is that this sort of thing is not definite; it is uncertain.

Let's suppose I have a test for 'stupidity', whatever that is. Treat this as a permanent binary state: you either have it or you don't, and you don't recover from it. Not all illnesses behave this way, but it makes the example a little more straightforward. My test works well 90% of the time, by which I mean that it somehow gives a correct answer 90 times out of 100. Suppose we have lots of evidence that the proportion of the population that fits the label 'stupid' (having caught stupidity) is 2%. This is enough information to create a sort of truth table. To make the numbers a little easier to work with, let's imagine we've done an awful lot of tests on samples of 1000 people, so that our numbers work out exactly. For each 1000 people tested, we'd expect 20 (2% of 1000) to be stupid, but for the test to find only 18 of them. Similarly, we expect 980 to be not stupid and the test to work 90% of the time, correctly identifying 882 of them as not stupid. So we could draw up a table like this:

|            | Test positive | Test negative | Total |
|------------|---------------|---------------|-------|
| Stupid     | 18            | 2             | 20    |
| Not stupid | 98            | 882           | 980   |
| Total      | 116           | 884           | 1000  |
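The arithmetic above can be sketched in a few lines. The function and its names are mine, but the figures (90% accuracy, 2% prevalence, 1000 people) are those of the worked example.

```python
# A sketch of the truth-table arithmetic for a test that is simply
# 'right 90% of the time' on a population with 2% prevalence.

def truth_table(n, prevalence, accuracy):
    """Expected (true +, false -, false +, true -) counts, rounded."""
    sick = n * prevalence                    # really have the condition
    well = n - sick                          # really don't
    return (round(sick * accuracy),          # correctly flagged
            round(sick * (1 - accuracy)),    # missed by the test
            round(well * (1 - accuracy)),    # wrongly flagged
            round(well * accuracy))          # correctly cleared

print(truth_table(1000, 0.02, 0.90))   # (18, 2, 98, 882)
```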

The table supports the given information, that the test gives a correct result 90% of the time. The false negatives are easiest to identify (test negative, result untrue): that's just two people, tested as not stupid but actually so. The false positives, where the test says people are stupid but this is not so, number 98. This is something of a disaster, because we have now labelled 100 people incorrectly; in a sense we already knew this, because the test 'works well 90% of the time'. Doing the test again will help, but the likelihood of these second-cycle results also being wrong will leave us with 10 people still misidentified, on average. Really not very good.
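The retest arithmetic is short enough to check directly. This assumes, as the text does, that the second pass is independent of the first and again wrong 10% of the time.

```python
# Of the 100 people mislabelled on the first pass (98 false positives
# plus 2 false negatives), an independent second test is again wrong
# 10% of the time, leaving about 10 still misidentified on average.
mislabelled = 98 + 2
still_wrong = mislabelled * (1 - 0.90)
print(round(still_wrong))   # 10
```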

Perhaps we need the test to be very much better. We can't change the fact that 2% of people have caught stupidity (whatever the label means), but we might improve the test to give a right answer 98% of the time, producing 20 faulty results per 1000 while finding nearly all of the stupid people. Even so, in every 1000 people this 'better' test labels about 40 as having caught stupidity, when half of these will subsequently be shown to be clear of the infection. You might correctly guess that the reason it is 'half' (it isn't exactly) is that the 2% failure rate of the test is the same as the infection rate.
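The 'half' claim can be checked the same way. The variable names are mine; the figures (98% accuracy, 2% prevalence) follow the paragraph above.

```python
# The 'better' test: 98% accurate, same 2% prevalence. Because the
# test's 2% error rate equals the prevalence, the false positives
# (~19.6) almost exactly match the true positives (~19.6).
n, prevalence, accuracy = 1000, 0.02, 0.98
sick = n * prevalence                      # 20 truly have it
true_pos = sick * accuracy                 # ~19.6 found
false_pos = (n - sick) * (1 - accuracy)    # ~19.6 wrongly flagged
positives = true_pos + false_pos           # ~39.2, roughly 40 per 1000
print(round(positives, 1), round(false_pos / positives, 2))   # 39.2 0.5
```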

All of which is very much a 'so what' issue, because stupidity is a safe example, in the sense that anyone reading this is, by their own lights and mine, not at all stupid.

Now suppose this is a screening test for cancer. Suddenly this is very serious, because one's reaction to a result is so much more extreme. Common cancers for men and women are prostate cancer and breast cancer, respectively. Many people will agree with the suggestion that screening saves lives. Is this true? All screening really does is apply a test to people who otherwise might well not 'present' themselves to a medical professional until they have themselves noticed that something is wrong, and noticed enough to go and ask about their symptoms. So screening might well catch early cases. But cancers can be benign or aggressive, and the available data says that around ⅔ of these cancers are benign. That generally means that the medics would rather take no action but perhaps repeat tests at intervals, to check that the benign lump remains inactive. But it also means that screening produces an awful lot of 'positive' results, so how everyone deals with being told they have a positive test for cancer is obviously a big deal. That position occurs, usually, without anyone having discussed whether the test itself has issues. I found (quite easily) that the absolute risk of dying from breast cancer is about 1 in 1000 women. With similar ease I found that prostate cancer has an absolute risk very much higher, at a bit over 4%. Looking only at white men (ethnicity makes a dramatic difference), the absolute risk of dying from prostate cancer is about 1 in 25. ¹

Let's look at the male problem first, because it is so large. Part of the problem, particularly when comparing sources from different countries, is that there is little assurance that what looks like the same measure actually is the same. The common screening test looks for an enzyme, literally a 'prostate specific antigen' (now there's calling a spade a spade), contracted to PSA. This is a simple blood test, but a higher level of PSA (a positive test) does not necessarily mean there is a cancer, since several other conditions might elevate PSA. These cancers generally grow very slowly, so there are issues with over-diagnosis and over-treatment (both of which are a waste of resources, and both of which carry their own attached health risks). What we need to know is that only about 25% of those whose PSA result is high enough to send them for a biopsy turn out to actually have prostate cancer. A large-scale trial in the US showed that the group of men given frequent tests had a higher recorded incidence of prostate cancer, but exactly the same death rate. So the screening did not change the end result, though quite a few men were given treatment (which in the US is expensive). What it changed was the level of information about who has a cancer. In Britain the question has to be whether the diagnosis helped in any measurable way. What is useful information is a change in the measured level of PSA. ²

So our test here has a success rate of 25%: 25% of those that test positive are true positives. Can we assume that if we test 1000 likely men, 1 in 8 tests positive? And that of these 125, only 31 have the cancer? Meanwhile, if we think 1 in 25 will have the cancer (which is a fudge, I know, not least because it depends on how we choose the sample of 1000 men to test), then we'd have a table like this, which, if nothing else, makes us start to ask the right sort of questions:

|               | Have cancer | No cancer | Total |
|---------------|-------------|-----------|-------|
| Test positive | 31          | 94        | 125   |
| Test negative | 9           | 866       | 875   |
| Total         | 40          | 960       | 1000  |

On these figures, faulty though they are, around 10% of the men get a wrong result: that's the 94 in 1000 likely to have a serious scare (also not good for their health), plus the group whom the test didn't identify as having a problem.
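The screening arithmetic can be sketched the same way, using the article's rough figures: 1 in 8 of 1000 men test positive, 25% of positives are true, and 1 in 25 actually have the cancer. The variable names are mine.

```python
# A sketch of the prostate-screening table described in the text.
n = 1000
positives = n // 8                 # 125 positive PSA tests
true_pos = positives * 25 // 100   # 31 of those actually have cancer
false_pos = positives - true_pos   # 94 men get a serious scare
with_cancer = n // 25              # 40 men with the cancer
missed = with_cancer - true_pos    # 9 men the screen did not find
print(true_pos, false_pos, missed, false_pos + missed)   # 31 94 9 103
```

The last figure, 103 of 1000, is the 'around 10% with a wrong result' of the text: false positives plus false negatives.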

This goes quite some way to explaining why, in the USA, screening occurs while in the UK men are encouraged to be aware and to present themselves to medics if they show symptoms. The end result is very much the same in terms of deaths from this cause. The UK approach produces far fewer intrusive operations that prove in some sense unnecessary. Would the position be different if you could function without a prostate and with no great inconvenience?

If you can find me better figures for the very real case of prostate cancer, particularly what I identify as 'not good enough', the number of men in our selected thousand who would have cancer at the time of testing, I would appreciate it.

One way to relate this to your own concerns is to see how sensitive the model is to change. Suppose the false negatives are very low, so that the bottom line of the table reads 35 965 1000; this is better, because only 4 men who (probably) need treatment have not been found. But if the bottom line moved the same amount the other way, to 45 955 1000, then we have 14 missed, a dramatic increase in the proportion of people we think ought to be having treatment.

1  Figures from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520076/. In 2011 there were 22.2M men in England, and 10,800 died with this as the recorded cause in 2012. That is 1 in 2056, but it covers just one year, in which you'd expect something like 370,000 men to die (1/60th of the total), so it is more like 1 in 34 of deaths. Between 2008 and 2010 (table 3 of the linked paper) 639,000 white men died, 25,400 of them from prostate cancer, which is 1 in 25, in close agreement with the lifetime risk of 4.3%. Quite different from this is being diagnosed as having this cancer, which turns out to be 1 in 8 as a lifetime risk [best estimate 13.3%]. In the period 2008-10, 96,500 men were diagnosed as having prostate cancer, which is 1 in 230, while the 1 in 8 figure refers to diagnosis as a lifetime risk. Does that mean that around 75% of those diagnosed survive having the cancer?