Level | Therapy/prevention, aetiology/harm | Prognosis | Diagnosis | Differential diagnosis/symptom prevalence study | Economic and decision analyses |
1a | SR (with homogeneity*) of RCTs | SR (with homogeneity*) of inception cohort studies; CDR† validated in different populations | SR (with homogeneity*) of level 1 diagnostic studies; CDR† with 1b studies from different clinical centres | SR (with homogeneity*) of prospective cohort studies | SR (with homogeneity*) of level 1 economic studies |
1b | Individual RCT (with narrow CI) | Individual inception cohort study with >80% follow-up; CDR† validated in a single population | Validating‡ cohort study with good§ reference standards; or CDR† tested within one clinical centre | Prospective cohort study with good follow-up¶ | Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multi-way sensitivity analyses |
1c | All or none** | All or none case series | Absolute SpPins and SnNouts†† | All or none case series | Absolute better-value or worse-value analyses |
2a | SR (with homogeneity*) of cohort studies | SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs | SR (with homogeneity*) of level >2 diagnostic studies | SR (with homogeneity*) of 2b and better studies | SR (with homogeneity*) of level >2 economic studies |
2b | Individual cohort study (including low-quality RCT; for example, <80% follow-up) | Retrospective cohort study or follow-up of untreated control patients in an RCT; derivation of CDR† or validated on split-sample‡‡ only | Exploratory‡ cohort study with good§ reference standards; CDR† after derivation, or validated only on split-sample‡‡ or databases | Retrospective cohort study, or poor follow-up | Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multi-way sensitivity analyses |
2c | ‘Outcomes’ research; ecological studies | ‘Outcomes’ research | | Ecological studies | Audit or outcomes research |
3a | SR (with homogeneity*) of case–control studies | | SR (with homogeneity*) of 3b and better studies | SR (with homogeneity*) of 3b and better studies | SR (with homogeneity*) of 3b and better studies |
3b | Individual case–control study | | Non-consecutive study; or without consistently applied reference standards | Non-consecutive cohort study, or very limited population | Analysis based on limited alternatives or costs, poor-quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations |
4 | Case series (and poor-quality cohort and case–control studies§§) | Case series (and poor-quality prognostic cohort studies¶¶) | Case–control study, poor or non-independent reference standard | Case series or superseded reference standards | Analysis with no sensitivity analysis |
5 | Expert opinion without explicit critical appraisal, or based on physiology, bench research or ‘first principles’ | Expert opinion without explicit critical appraisal, or based on physiology, bench research or ‘first principles’ | Expert opinion without explicit critical appraisal, or based on physiology, bench research or ‘first principles’ | Expert opinion without explicit critical appraisal, or based on physiology, bench research or ‘first principles’ | Expert opinion without explicit critical appraisal, or based on economic theory or ‘first principles’ |
*Homogeneity means a systematic review (SR) that is free from worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all SRs with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant.
†CDR, clinical decision rule (algorithms or scoring systems which lead to a prognostic estimation or a diagnostic category).
‡Validating studies test the quality of a specific diagnostic test based on prior evidence. An exploratory study collects information and trawls the data (eg, using a regression analysis) to find which factors are ‘significant’.
§Good reference standards are independent of the test, and applied blindly or objectively to all patients. Poor reference standards are haphazardly applied, but still independent of the test. Use of a non-independent reference standard (where the ‘test’ is included in the ‘reference’, or where the ‘testing’ affects the ‘reference’) implies a level 4 study.
¶Good follow-up in a differential diagnosis study is >80%, with adequate time for alternative diagnoses to emerge (eg, 1–6 months acute, 1–5 years chronic).
**Met when all patients died before the treatment became available but some now survive while receiving it; or when some patients died before the treatment became available but none now die while receiving it.
††An ‘absolute SpPin’: a diagnostic finding whose Specificity is so high that a Positive result rules in the diagnosis. An ‘absolute SnNout’: a diagnostic finding whose Sensitivity is so high that a Negative result rules out the diagnosis.
‡‡Split-sample validation is achieved by collecting all the information in a single tranche, then artificially dividing this into ‘derivation’ and ‘validation’ samples.
§§Poor-quality cohort study: one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded) objective way in both exposed and non-exposed individuals and/or failed to identify or appropriately control known confounders and/or failed to carry out a sufficiently long and complete follow-up of patients. Poor-quality case–control study: one that failed to clearly define comparison groups and/or failed to measure exposures and outcomes in the same (preferably blinded) objective way in both cases and controls and/or failed to identify or appropriately control known confounders.
¶¶Poor-quality prognostic cohort study: one in which sampling was biased in favour of patients who already had the target outcome, or the measurement of outcomes was accomplished in <80% of study patients, or outcomes were determined in an unblinded non-objective way, or there was no correction for confounding factors.
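As an illustration of footnote ††, the sketch below computes sensitivity and specificity from a 2×2 table of a diagnostic finding against a reference standard, then checks for an absolute SpPin or SnNout. The class name `TwoByTwo`, the helper functions and the `threshold` parameter are hypothetical and introduced only for this example. The table does not state a numerical cut-off for ‘so high’, so the default of 1.0 (no false positives, or no false negatives) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a diagnostic finding with a reference standard."""
    true_pos: int   # finding positive, disease present
    false_pos: int  # finding positive, disease absent
    false_neg: int  # finding negative, disease present
    true_neg: int   # finding negative, disease absent

    def sensitivity(self) -> float:
        # Proportion of patients with the disease who have a positive finding.
        diseased = self.true_pos + self.false_neg
        if diseased == 0:
            raise ValueError("no patients with the disease in the sample")
        return self.true_pos / diseased

    def specificity(self) -> float:
        # Proportion of patients without the disease who have a negative finding.
        healthy = self.true_neg + self.false_pos
        if healthy == 0:
            raise ValueError("no patients without the disease in the sample")
        return self.true_neg / healthy


def is_sppin(table: TwoByTwo, threshold: float = 1.0) -> bool:
    # SpPin: Specificity high enough that a Positive result rules in.
    # The threshold is an assumed cut-off, not one given in the table.
    return table.specificity() >= threshold


def is_snnout(table: TwoByTwo, threshold: float = 1.0) -> bool:
    # SnNout: Sensitivity high enough that a Negative result rules out.
    return table.sensitivity() >= threshold


# Made-up counts for illustration only (not data from any study).
example = TwoByTwo(true_pos=40, false_pos=0, false_neg=10, true_neg=50)
print(example.sensitivity(), example.specificity())
print(is_sppin(example), is_snnout(example))
```

With these made-up counts there are no false positives, so a positive finding rules in the diagnosis (SpPin), whereas the ten false negatives mean a negative finding cannot rule it out.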