Table A1

 Oxford Centre for Evidence-based Medicine: levels of evidence

Level 1a
  Therapy/prevention, aetiology/harm: SR (with homogeneity*) of RCTs
  Prognosis: SR (with homogeneity*) of inception cohort studies; CDR† validated in different populations
  Diagnosis: SR (with homogeneity*) of level 1 diagnostic studies; CDR† with 1b studies from different clinical centres
  Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of prospective cohort studies
  Economic and decision analyses: SR (with homogeneity*) of level 1 economic studies

Level 1b
  Therapy/prevention, aetiology/harm: Individual RCT (with narrow confidence interval)
  Prognosis: Individual inception cohort study with ⩾80% follow up; CDR† validated in a single population
  Diagnosis: Validating¶ cohort study with good§ reference standards; or CDR† tested within one clinical centre
  Differential diagnosis/symptom prevalence study: Prospective cohort study with good follow up§§
  Economic and decision analyses: Analysis based on clinically sensible costs or alternatives; systematic reviews of the evidence; and including multi-way sensitivity analyses

Level 1c
  Therapy/prevention, aetiology/harm: All or none‡
  Prognosis: All or none case series
  Diagnosis: Absolute SpPins and SnNouts‡‡
  Differential diagnosis/symptom prevalence study: All or none case series
  Economic and decision analyses: Absolute better value or worse value analyses***

Level 2a
  Therapy/prevention, aetiology/harm: SR (with homogeneity*) of cohort studies
  Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs
  Diagnosis: SR (with homogeneity*) of level >2 diagnostic studies
  Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of level 2b and better studies
  Economic and decision analyses: SR (with homogeneity*) of level >2 economic studies

Level 2b
  Therapy/prevention, aetiology/harm: Individual cohort study (including low quality RCT, for example <80% follow up)
  Prognosis: Retrospective cohort study or follow up of untreated controls in an RCT; derivation of CDR† or validation on split samples¶¶ only
  Diagnosis: Exploratory¶ cohort study with good§ reference standards; CDR† after derivation, or validated only on split samples¶¶ or databases
  Differential diagnosis/symptom prevalence study: Retrospective cohort study, or poor follow up
  Economic and decision analyses: Analysis based on clinically sensible costs or alternatives; limited reviews of the evidence, or single study; and including multi-way sensitivity analyses

Level 2c
  Therapy/prevention, aetiology/harm: “Outcomes” research; ecological studies
  Prognosis: “Outcomes” research
  Diagnosis: (no entry)
  Differential diagnosis/symptom prevalence study: Ecological studies
  Economic and decision analyses: Audit or “outcomes” research

Level 3a
  Therapy/prevention, aetiology/harm: SR (with homogeneity*) of case control studies
  Prognosis: (no entry)
  Diagnosis: SR (with homogeneity*) of 3b and better studies
  Differential diagnosis/symptom prevalence study: SR (with homogeneity*) of 3b and better studies
  Economic and decision analyses: SR (with homogeneity*) of 3b and better studies

Level 3b
  Therapy/prevention, aetiology/harm: Individual case control study
  Prognosis: (no entry)
  Diagnosis: Non-consecutive study, or without consistently applied reference standards
  Differential diagnosis/symptom prevalence study: Non-consecutive study, or very limited population
  Economic and decision analyses: Analysis based on limited alternatives or costs, poor quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations

Level 4
  Therapy/prevention, aetiology/harm: Case series (and poor quality cohort and case control studies**)
  Prognosis: Case series (and poor quality prognostic cohort studies††)
  Diagnosis: Case control study, poor or non-independent reference standards
  Differential diagnosis/symptom prevalence study: Case series or superseded reference standards
  Economic and decision analyses: Analysis with no sensitivity analysis

Level 5
  All five columns: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”

SR, systematic review; RCT, randomised controlled trial; CDR, clinical decision rule.
*A systematic review with homogeneity is one that is free from worrisome variations (heterogeneity) in the results between individual studies.
†Clinical decision rules are algorithms or scoring systems leading to a diagnostic category or prognostic estimation.
‡All or none is met when all patients died before the therapy became available but some now survive on it, or when some patients died before the therapy became available but none now die on it.
¶Validating studies test the quality of a specific diagnostic test, based on prior evidence. An exploratory study collects information and identifies (for example, using a regression analysis) which factors are significant.
§Good, better, bad, and worse refer to the comparison between treatments in terms of their clinical benefits and risks.
**A poor quality cohort study is one that failed to define comparison groups, and/or failed to measure exposures and outcomes in the same (preferably blinded) objective way in both exposed and non-exposed individuals, and/or failed to identify and control for confounders, and/or failed to complete a sufficiently long follow up. A poor quality case control study is one that failed to define comparison groups, and/or failed to measure exposures and outcomes in the same (preferably blinded) objective way in both cases and controls, and/or failed to identify and control for confounders.
††A poor quality prognostic cohort study is one with biased sampling in favour of patients who already had the target outcome, or in which outcomes were measured in <80% of study patients, or outcomes were determined in an unblinded, non-objective way, or there was no correction for confounders.
‡‡An “absolute SpPin” is a diagnostic finding whose specificity is so high that a positive result confirms the diagnosis. An “absolute SnNout” is a diagnostic finding whose sensitivity is so high that a negative result rules out the diagnosis.
¶¶Split sample validation is achieved by collecting all the information in a single tranche and then dividing it into “derivation” and “validation” samples.
§§Good follow up is >80%, with adequate time for alternative diagnoses to emerge (for example, 1–8 months acute, 1–5 years chronic).
***Better value treatments are clearly as good but cheaper, or better at the same or reduced cost. Worse value treatments are as good and more expensive, or worse and equally or more expensive.
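The all or none criterion in footnote ‡ reduces to a check on mortality counts before and after a therapy became available. The Python sketch below assumes that the number of patients and deaths in each period is known; the function name is_all_or_none and the example counts are hypothetical.

```python
def is_all_or_none(died_before: int, n_before: int, died_after: int, n_after: int) -> bool:
    """Check mortality counts against the 'all or none' pattern of footnote ‡."""
    if n_before <= 0 or n_after <= 0:
        raise ValueError("each period needs at least one patient")
    if not (0 <= died_before <= n_before and 0 <= died_after <= n_after):
        raise ValueError("deaths must lie between 0 and the number of patients")
    # Pattern 1: everyone died before the therapy, but some now survive on it.
    all_died_then_some_survive = died_before == n_before and died_after < n_after
    # Pattern 2: some died before the therapy, but none now die on it.
    some_died_then_none_die = died_before > 0 and died_after == 0
    return all_died_then_some_survive or some_died_then_none_die


print(is_all_or_none(20, 20, 12, 20))  # True: pattern 1
print(is_all_or_none(5, 20, 0, 20))    # True: pattern 2
print(is_all_or_none(5, 20, 3, 20))    # False
```

A fall in mortality that does not reach either extreme (the last example) is not an all or none result, however large the effect.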
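Footnote ‡‡ rests on sensitivity and specificity, which are calculated from a 2×2 table of index test results against the reference standard. The sketch below computes both. Because the table does not say how high is “so high”, the threshold parameter (defaulting to 1.0, that is no false positives or no false negatives in the sample) is an assumption, as are the function names and example counts.

```python
from typing import NamedTuple


class Accuracy(NamedTuple):
    sensitivity: float
    specificity: float


def accuracy(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Sensitivity and specificity from a 2x2 table (index test vs reference standard)."""
    sensitivity = tp / (tp + fn)  # diseased patients who test positive
    specificity = tn / (tn + fp)  # non-diseased patients who test negative
    return Accuracy(sensitivity, specificity)


def is_absolute_sppin(acc: Accuracy, threshold: float = 1.0) -> bool:
    # Assumed reading of "so high": specificity at or above threshold (default: no false positives).
    return acc.specificity >= threshold


def is_absolute_snnout(acc: Accuracy, threshold: float = 1.0) -> bool:
    # Assumed reading of "so high": sensitivity at or above threshold (default: no false negatives).
    return acc.sensitivity >= threshold


acc = accuracy(tp=45, fp=0, fn=15, tn=140)
print(acc)                      # Accuracy(sensitivity=0.75, specificity=1.0)
print(is_absolute_sppin(acc))   # True
print(is_absolute_snnout(acc))  # False
```

With no false positives every positive result in the sample was a true positive, which is why a perfectly specific finding rules the diagnosis in even though its sensitivity (here 0.75) is modest.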
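Split sample validation (footnote ¶¶) divides one tranche of data into a derivation sample and a validation sample. The footnote does not say how the division is made, so the random allocation, the default 50/50 split and the seed argument in this Python sketch are assumptions; split_sample is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sample(records: Sequence[T], derivation_fraction: float = 0.5,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Divide a single tranche of records into derivation and validation samples."""
    if not 0.0 < derivation_fraction < 1.0:
        raise ValueError("derivation_fraction must lie strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # random allocation, reproducible via seed
    cut = round(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]  # (derivation, validation)


patients = list(range(100))  # stand-in patient identifiers
derivation, validation = split_sample(patients, derivation_fraction=0.6, seed=42)
print(len(derivation), len(validation))  # 60 40
```

Because both samples come from the same tranche, a rule validated this way has not been tested in a different population, which is why the table ranks it at level 2b rather than 1a or 1b.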
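The >80% follow up threshold in footnote §§ (and the ⩾80% and <80% cut-offs at levels 1b and 2b and in footnote ††) is simple arithmetic on the proportion of patients whose outcome was ascertained. The sketch below, with hypothetical function names, compares integers so that a rate of exactly 80% is not misclassified by floating-point rounding. It encodes only the percentage, not the footnote's requirement for adequate time for alternative diagnoses to emerge.

```python
def follow_up_rate(followed_up: int, enrolled: int) -> float:
    """Proportion of enrolled patients whose outcome was ascertained."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    if not 0 <= followed_up <= enrolled:
        raise ValueError("followed_up must lie between 0 and enrolled")
    return followed_up / enrolled


def is_good_follow_up(followed_up: int, enrolled: int, strict: bool = True) -> bool:
    """Footnote §§ uses >80%; pass strict=False for the ⩾80% cut-off used at level 1b."""
    follow_up_rate(followed_up, enrolled)  # reuse the input checks
    # Integer arithmetic avoids floating-point error at exactly 80%.
    if strict:
        return 100 * followed_up > 80 * enrolled
    return 100 * followed_up >= 80 * enrolled


print(follow_up_rate(170, 200))                   # 0.85
print(is_good_follow_up(160, 200))                # False: exactly 80% is not >80%
print(is_good_follow_up(160, 200, strict=False))  # True under the ⩾80% reading
```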
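The better value and worse value definitions in footnote *** amount to a small decision rule over two comparisons with the alternative treatment: clinical effect and cost. The Python sketch below encodes them. The three-level categories, the label "not absolute" for combinations the footnote does not classify, and all names are assumptions for illustration.

```python
from enum import Enum


class Effect(Enum):
    BETTER = "better"
    AS_GOOD = "as good"
    WORSE = "worse"


class Cost(Enum):
    CHEAPER = "cheaper"
    SAME = "same cost"
    DEARER = "more expensive"


def value_verdict(effect: Effect, cost: Cost) -> str:
    """Classify a treatment against its comparator using the rules in footnote ***."""
    better_value = (
        (effect is Effect.AS_GOOD and cost is Cost.CHEAPER)
        or (effect is Effect.BETTER and cost in (Cost.SAME, Cost.CHEAPER))
    )
    worse_value = (
        (effect is Effect.AS_GOOD and cost is Cost.DEARER)
        or (effect is Effect.WORSE and cost in (Cost.SAME, Cost.DEARER))
    )
    if better_value:
        return "better value"
    if worse_value:
        return "worse value"
    # Remaining combinations (e.g. better but dearer) are not covered by the footnote.
    return "not absolute"


print(value_verdict(Effect.BETTER, Cost.SAME))    # better value
print(value_verdict(Effect.WORSE, Cost.DEARER))   # worse value
print(value_verdict(Effect.BETTER, Cost.DEARER))  # not absolute
```

Only the two absolute verdicts correspond to the level 1c entry for economic and decision analyses; the remaining combinations fall outside that entry.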