Winning against a Gold Standard

What is a diagnostic Gold Standard (sometimes also called a reference standard)? It is the “best available test under reasonable conditions”, meaning a test accepted by the medical community as the best way to identify a specific disease[1].


Now imagine your startup has found a way to test for a disease that is better or cheaper than the current gold standard. The medical community should be ecstatic and immediately switch over to this new and improved test, right? Well…

The first thing you need to do is convince everyone that your test is better than the gold standard. But the gold standard defines who has the disease and who doesn’t. There is no reliable data about its actual sensitivity and specificity, and even the information about prevalence in a population is based on the results of the gold standard. What does that mean for your new test?

Let us take a look at a fictional disease, Mensura Aurea Syndrome, or MAS for short. Since we are making it up, we can also define the relevant numbers. Prevalence is 10%, and the sensitivity and specificity of the diagnostic Gold Standard for MAS are 90% and 95% respectively[2]. If you apply the Gold Standard test to 1000 patients, you will get:

  • 90 patients correctly identified as having MAS (90% sensitivity, i.e. 90 out of the 100 in that population who have MAS)

  • 45 patients incorrectly identified as having MAS (95% specificity, i.e. 5% false positives among the 900 who do not have MAS)

According to the Gold Standard, there are 135 patients in the 1000 tested that have MAS.
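The arithmetic above can be sketched in a few lines of Python (a toy calculation of expected counts, not a clinical tool; the function name is made up for illustration):

```python
def confusion_counts(n, prevalence, sensitivity, specificity):
    """Expected test outcomes for n patients, rounded to whole patients."""
    diseased = round(n * prevalence)      # 100 patients actually have MAS
    healthy = n - diseased                # 900 do not
    tp = round(diseased * sensitivity)    # true positives
    fn = diseased - tp                    # false negatives (missed cases)
    tn = round(healthy * specificity)     # true negatives
    fp = healthy - tn                     # false positives
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts(1000, 0.10, 0.90, 0.95)
print(tp + fp)  # 135 patients test positive, matching the count above
```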

You believe your new test has the same performance as the Gold Standard. But it will cost half as much. And you are willing to go head to head with the Gold Standard. How is that going to play out?

You will test the same 1000 patients and compare your results to the Gold Standard results.

In the 135 patients that originally tested positive for MAS (90 true positives and 45 false positives – but nobody knows that), your test identifies[3]:

  • 90% of the 90 true positives = 81 with MAS

  • 5% of 45 false positives = 2 with MAS

  • 83 patients with MAS total

Compared to the Gold Standard, you “missed” 52 patients.

In the 865 patients that originally tested negative (10 false negatives, 855 true negatives), your test identifies:

  • 90% of the 10 false negatives = 9 with MAS

  • 5% of 855 true negatives = 43 with MAS

  • 52 patients with MAS total

You just “wrongly” identified 52 patients as diseased.
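The head-to-head comparison can be sketched the same way (a toy Python calculation; it assumes the two tests err independently within each subgroup, as the worked example does):

```python
# Hidden truth behind the gold standard's verdict on 1000 patients
# (prevalence 10%, sensitivity 90%, specificity 95%):
gs_positive = {"diseased": 90, "healthy": 45}   # the 135 GS positives
gs_negative = {"diseased": 10, "healthy": 855}  # the 865 GS negatives

def new_test_positives(group, sensitivity=0.90, specificity=0.95):
    """Expected number of patients the new test flags as positive."""
    return (round(group["diseased"] * sensitivity)
            + round(group["healthy"] * (1 - specificity)))

agree = new_test_positives(gs_positive)    # 81 + 2 = 83 shared positives
extra = new_test_positives(gs_negative)    # 9 + 43 = 52 "new" positives
print(135 - agree, extra)  # 52 apparently "missed", 52 apparently "wrong"
```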

The reaction to those results? You miss too many patients with MAS, and the added cost of treating all those "incorrectly" diagnosed patients is not worth the savings of your cheaper test.

But what if, instead of making the test cheaper, you make it more accurate? That should help, right? You believe your improved test’s sensitivity is 95% (instead of 90%) and its specificity is 99% (instead of 95%).

Back to the original 1000 patients with the Gold Standard test results. Here is how your test fares:

In the original 135 positive cases (90 true, 45 false), you now identify:

  • 95% of 90 true positives = 86 with MAS

  • 1% of the 45 false positives = 0 with MAS

  • 86 patients with MAS total

In the 865 negative cases (10 false, 855 true), you identify:

  • 95% of the 10 false negatives = 10 with MAS

  • 1% of the 855 true negatives = 9 with MAS

  • 19 patients with MAS total

You “missed” 49 patients with MAS, and falsely diagnosed 19 “healthy” patients with MAS. Slightly better, but definitely not convincing anyone.
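Re-running the same toy Python arithmetic with the improved numbers shows why even a genuinely better test still appears to disagree with the gold standard (the helper function is made up for illustration):

```python
def flagged_positive(diseased, healthy, sensitivity, specificity):
    """Expected positives the new test produces within a subgroup."""
    return round(diseased * sensitivity) + round(healthy * (1 - specificity))

# The 135 gold-standard positives hide 90 diseased and 45 healthy patients;
# the 865 gold-standard negatives hide 10 diseased and 855 healthy.
agree = flagged_positive(90, 45, 0.95, 0.99)    # 86 + 0 = 86
extra = flagged_positive(10, 855, 0.95, 0.99)   # 10 + 9 = 19
print(135 - agree, extra)  # 49 apparently "missed", 19 apparently "wrong"
```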

By now you probably realize the sad truth – you can’t “win” a comparison against a gold standard. Since, by definition, the gold standard is right, any deviating result is “wrong”, even if it is actually correct. Now what?

We need to take a step back and ask: what is the purpose of a diagnostic test? Its purpose is not to be right; its purpose is to help identify the appropriate treatment so that the patient gets better. The key to success here is not accuracy, it is outcomes.

Comparing a new diagnostic test to a gold standard does nothing for the patient (and, as you have just seen, is a losing proposition). But if you can show that patients treated based on your test results have better outcomes than those treated based on gold standard results, then your new test will get some well deserved attention. Additional cost savings, as fewer patients without the disease are unnecessarily treated, also won’t hurt. Unfortunately, this will mean a clinical trial, likely even a randomized one. But that is the price you will have to pay.

Please bear in mind that it will still not be a “slam dunk” as it takes time to change the thinking in the medical community. But now you actually stand a fighting chance.


[1] The gold standard might even vary between medical specialties, as Bachmann et al. describe: Lucas M Bachmann, Peter Jüni, Stephan Reichenbach, Hans-Rudolf Ziswiler, Alfons G Kessels, Esther Vögelin, Consequences of different diagnostic ‘gold standards’ in test accuracy research: Carpal Tunnel Syndrome as an example, International Journal of Epidemiology, Volume 34, Issue 4, August 2005, Pages 953–955, https://doi.org/10.1093/ije/dyi105.

[2] A reminder for those a bit rusty: Prevalence: the proportion of a population who have the disease, so 10 in 100 patients will have it; Sensitivity: the proportion of those with the disease correctly identified by the test; Specificity: the proportion of those without the disease correctly identified by the test.

[3] All results are rounded to whole patients.