Reliability and agreement studies: a guide for clinical investigators
Ruben Hernaez
Correspondence to Dr Ruben Hernaez, Division of Gastroenterology and Hepatology, Department of Internal Medicine, The Johns Hopkins School of Medicine, 600 N. Wolfe Street, Blalock 439, Baltimore, Maryland 21287, USA; rhernae1{at}

Setting the framework: the difference between reliability and agreement

On a daily basis, clinicians and researchers face the challenge of measuring multiple outcomes. Whether the outcome is response to therapy, disease activity, diagnostic certainty or the performance of a new diagnostic tool, every field needs outcome measurements that are valid, reproducible and reliable.1 At first glance, validity, reproducibility, reliability and agreement may seem similar. However, fundamental differences separate these concepts, and those differences matter for study design and execution as well as for methodology and statistical analysis. Alvan Feinstein identified this problem and introduced the term clinimetrics, or “the methodologic discipline focusing on measurement issues in clinical medicine”.2 The concept of clinimetrics is not new; indeed, it has been considered a subset of psychometrics.3 Terwee, de Vet, Mokkink and Knol, among others,4 developed tools to assess and evaluate health measurement instruments in clinical medicine. For this reason, the backbone of this paper is the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) initiative.

The COSMIN initiative is a multidisciplinary, international consensus effort that aimed to develop standards for evaluating the methodological quality of studies on measurement properties, covering both their design and their preferred statistical analyses.5 The initiative focused primarily on Health-Related Patient-Reported Outcomes (HR-PROs) because these outcomes are particularly complex to measure. However, its concepts also apply to other types of outcomes and are followed here.4 For clarity, Mokkink et al4 define an HR-PRO as “any aspect of a patient's health status that is directly assessed by the patient, that is, without the interpretation of the patient's responses by a physician or anyone else”. Examples include self-administered or computer-administered questionnaires.

The COSMIN taxonomy in the evaluation of a measurement instrument shows three main quality domains: reliability, validity …


  • Correction notice This article has been corrected since it was published Online First. The formula under the heading ‘Reliability parameters for continuous measurement variables’ has been corrected.

  • Acknowledgements The author thanks Sandeep Mahajan, MD and Navneet K Jaswal, JD for editing the current manuscript.

  • Competing interests None.

  • Provenance and peer review Commissioned; externally peer reviewed.