Testing 101: Sensitivity & Specificity

Medical tests are primarily judged by their accuracy, and this comprises two factors: sensitivity and specificity. The perfect test scores 100% on both.

The sensitivity of a test tells us how well it detects the presence of the pathogen in an infected sample, so naturally, this is where our analysis should begin. If the sensitivity is 90%, then the test will report 90 positive results and 10 negative results out of every 100 infected samples. Those 10 negative results out of 100 give the test a false negative rate of 10%.

Sensitivity = 1 – Prob [false negative]

The ‘specificity’ of a test tells us how effectively the test can detect the absence of the disease in an uninfected sample. If samples were taken from 100 uninfected people, and the test reported 99 negative results, then it would have a specificity of 99% and a corresponding rate of false positives of 1%. Specificity is often higher than sensitivity (and frequently close to 1) because it is a lot easier to miss something that exists than it is to find something that doesn’t exist.

Specificity = 1 – Prob [false positive]

Sensitivity tells us how good the test is – the rate of true positives. Specificity tells us how bad it isn’t – the rate of true negatives. We want our tests to be good, but we also want them to be not bad, so to speak.

The reason that we have two criteria for judging a test is that it will perform differently depending on whether the underlying sample is infected or uninfected. There are two possible states of the sample, and the sensitivity and specificity tell us how well the test performs at identifying each.
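The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: the function and parameter names are our own, and the counts are the ones used in the examples above (90 of 100 infected samples reported positive, 99 of 100 uninfected samples reported negative).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of infected samples that the test reports as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of uninfected samples that the test reports as negative."""
    return true_neg / (true_neg + false_pos)


# Figures from the text: 100 infected samples, 90 reported positive.
sens = sensitivity(true_pos=90, false_neg=10)
# Figures from the text: 100 uninfected samples, 99 reported negative.
spec = specificity(true_neg=99, false_pos=1)

# The error rates are simply the complements of the two measures.
print(f"sensitivity = {sens:.2f}, false negative rate = {1 - sens:.2f}")
print(f"specificity = {spec:.2f}, false positive rate = {1 - spec:.2f}")
```

Note that each measure uses only one kind of sample: sensitivity never looks at uninfected samples, and specificity never looks at infected ones.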


Figure 1: Medical Tests


Lying With Statistics

It is essential to consider the sensitivity and specificity of a test together.

If a test has a sensitivity of 100%, then it is successfully identifying every infected sample. Based on that information, you might conclude that this was an excellent test. However, you could just as easily come to the opposite conclusion: that the test is about as useful as a stopped clock. Absent any other information, you cannot tell which conclusion is right.

How so?

The test could be 100% accurate, as described above. Or, it could simply be reporting a positive result for every sample, regardless of whether it is infected or not, like a stopped clock. Both scenarios will lead to a sensitivity of 100%, but we’ll need to see their specificities to determine whether they are worth anything.

If the specificity of the test was also 100%, then the test would indeed be perfect, as 100% of the infected samples would report positive results and 100% of the uninfected samples would report negative results. There would be no false positives and no false negatives.

If the test was useless, however, then its specificity would be 0%, as none of the uninfected samples would produce a negative result. None of the samples would produce a negative at all: every result would be positive, and of those positives some would be true positives (sensitivity) and the rest would be false positives (1 – specificity = 1 or 100%).
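A short sketch makes the stopped-clock case concrete. Everything here is hypothetical: the batch of 20 infected and 80 uninfected samples is made up, and the "tests" are toy functions that receive the true state of the sample, which a real test obviously cannot do.

```python
def always_positive(sample_is_infected: bool) -> bool:
    """A 'stopped clock' test: reports positive no matter what."""
    return True


def perfect(sample_is_infected: bool) -> bool:
    """An idealised test that always reports the true state."""
    return sample_is_infected


def evaluate(test, samples):
    """Return (sensitivity, specificity) of `test` over labelled samples."""
    infected = [s for s in samples if s]
    uninfected = [s for s in samples if not s]
    true_pos = sum(1 for s in infected if test(s))
    true_neg = sum(1 for s in uninfected if not test(s))
    return true_pos / len(infected), true_neg / len(uninfected)


# Hypothetical batch: 20 infected samples (True) and 80 uninfected (False).
batch = [True] * 20 + [False] * 80

stopped_sens, stopped_spec = evaluate(always_positive, batch)
perfect_sens, perfect_spec = evaluate(perfect, batch)

print(f"stopped clock: sensitivity {stopped_sens:.0%}, specificity {stopped_spec:.0%}")
print(f"perfect test:  sensitivity {perfect_sens:.0%}, specificity {perfect_spec:.0%}")
```

Both tests score 100% on sensitivity. Only the specificity column separates them.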

This is one of the ways that people can lie with statistics, so keep an eye out for it. If someone is bragging about the sensitivity of their test, don’t come to any conclusions until you have seen its specificity too.


Figure 2: Criminal Trial


Not All Errors Are Created Equal

The previous example shows us that there are relationships that balance these outcomes. A test that produces more positives will have a higher sensitivity and a higher rate of false positives, purely by virtue of producing more positive results. It will also have a lower rate of false negatives, and that might sound like a good thing until you remember that this reflects an overall lack of negative results, not a higher specificity.

Perhaps the easiest way to understand the interaction between these outcomes is by thinking of a court case, where an individual stands accused of a crime. A criminal trial has the same profile as a medical test. The individual is either guilty or not, and the court will either find him guilty or not. There are four outcomes, and two are good and two are bad. The good outcomes are that the innocent are set free and the guilty go to jail. The bad outcomes are that either the guilty go free or the innocent go to jail. The key point is that those two bad outcomes are related.

We can guarantee that no innocent person ever goes to jail. In a liberal society, that would be a very desirable outcome. It’s quite simple too: send no one to jail. This policy ensures that 0% of innocent people will ever have a minute of liberty taken away from them. Unfortunately, it also guarantees that 0% of guilty people will ever be convicted. (Autocracies do the opposite: they maximise the likelihood that guilty people go to jail by convicting everyone.)

By minimising one error, we maximised the other. This trade-off is inescapable. It’s like a see-saw: if we decrease the likelihood of one error, we will inevitably increase the likelihood of the other.
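The see-saw can be illustrated with a toy test that gives each sample a score and reports a positive when the score reaches a cut-off. This is only a sketch under assumptions: the scores below are invented for illustration, and the idea that the test works by thresholding a score is ours, not something stated above.

```python
# Hypothetical test scores (higher = more likely infected); not real data.
infected_scores = [0.35, 0.55, 0.60, 0.75, 0.90]
uninfected_scores = [0.05, 0.15, 0.30, 0.45, 0.65]


def rates_at(threshold: float):
    """Return (false negative rate, false positive rate) at a cut-off.

    A sample is reported positive when its score is at or above threshold.
    """
    false_neg = sum(1 for s in infected_scores if s < threshold)
    false_pos = sum(1 for s in uninfected_scores if s >= threshold)
    return false_neg / len(infected_scores), false_pos / len(uninfected_scores)


# A cut-off of 0.0 convicts everyone; a cut-off above 1.0 convicts no one.
for threshold in (0.0, 0.25, 0.5, 0.75, 1.01):
    fn_rate, fp_rate = rates_at(threshold)
    print(f"cut-off {threshold:.2f}: "
          f"false negatives {fn_rate:.0%}, false positives {fp_rate:.0%}")
```

As the cut-off rises, the false positive rate falls and the false negative rate climbs. The two extremes correspond to the "convict everyone" and "send no one to jail" policies described above.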

So how do we resolve the matter?


Not All False Positives Are Created Equal

The answer will depend on the context. In the criminal trial, we are trading off between sending an innocent person to jail and letting a guilty person go free. We live in a liberal society, so the former (the false positive) is clearly the worse error. We therefore bias in favour of negative results, and that means we accept more false negatives.

In the context of a contagious outbreak, it’s the other way around.

False negatives are a bigger problem because they present a higher risk to the health of the individual and to the health of their society. The individual’s condition may deteriorate, and they may not get the medical treatment they need in time to prevent serious health damage. And as long as they remain contagious and undiagnosed, they risk spreading the virus further.

A false positive, on the other hand, may inconvenience the individual, their contacts, and those around them, but it is very unlikely to lead to negative health outcomes for anyone.
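One way to picture how context settles the matter is to put a price on each kind of error and pick the cut-off that costs least. The sketch below assumes a score-based test, uses invented scores, and assigns a purely illustrative 10-to-1 weighting to the two errors. None of these numbers come from real trials or real tests.

```python
# Hypothetical scores (higher = more likely positive); not real data.
positive_scores = [0.35, 0.55, 0.60, 0.75, 0.90]   # truly guilty / infected
negative_scores = [0.05, 0.15, 0.30, 0.45, 0.65]   # truly innocent / uninfected
thresholds = [0.0, 0.25, 0.5, 0.75, 1.01]          # candidate cut-offs


def total_cost(threshold, cost_false_neg, cost_false_pos):
    """Weighted count of errors when scores >= threshold are called positive."""
    false_neg = sum(1 for s in positive_scores if s < threshold)
    false_pos = sum(1 for s in negative_scores if s >= threshold)
    return cost_false_neg * false_neg + cost_false_pos * false_pos


def best_threshold(cost_false_neg, cost_false_pos):
    """Pick the candidate cut-off with the lowest weighted error cost."""
    return min(thresholds,
               key=lambda t: total_cost(t, cost_false_neg, cost_false_pos))


# Assumed weights: a court treats a false positive as 10x worse,
# while an outbreak response treats a false negative as 10x worse.
court_threshold = best_threshold(cost_false_neg=1, cost_false_pos=10)
outbreak_threshold = best_threshold(cost_false_neg=10, cost_false_pos=1)

print(f"court cut-off:    {court_threshold}")
print(f"outbreak cut-off: {outbreak_threshold}")
```

With these made-up numbers, the court settles on a stricter cut-off than the outbreak response does, which is the bias in favour of negatives and positives respectively that the two contexts call for.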


Conclusion

When we think about testing, we should start by assessing the overall accuracy of a test. While accuracy is the single most important factor, it is certainly not the only one that counts. The cost, turnaround time, and ease of use also contribute to the overall quality of a testing regimen and, as a result, they will also determine the quality of the health care we can offer our people. Accuracy is where the conversation should start, not where it should stop.

The PCR test has come to be regarded by policymakers as the best test because it has the highest accuracy. In coming to that conclusion, the policymakers have ignored every other characteristic of the test, and that has come at great cost to the health of the nation. We will discuss that topic in another post.