

Original article
Outcome measures in coeliac disease trials: the Tampere recommendations
  1. Jonas F Ludvigsson1,2,
  2. Carolina Ciacci3,
  3. Peter HR Green4,
  4. Katri Kaukinen5,6,
  5. Ilma R Korponay-Szabo7,8,
  6. Kalle Kurppa9,10,
  7. Joseph A Murray11,
  8. Knut Erik Aslaksen Lundin12,13,
  9. Markku J Maki14,15,
  10. Alina Popp16,17,
  11. Norelle R Reilly18,19,
  12. Alfonso Rodriguez-Herrera20,
  13. David S Sanders21,
  14. Detlef Schuppan22,23,
  15. Sarah Sleet24,
  16. Juha Taavela25,
  17. Kristin Voorhees26,
  18. Marjorie M Walker27,
  19. Daniel A Leffler28
  1. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  2. Department of Pediatrics, Örebro University Hospital, Örebro, Sweden
  3. Coeliac Center at Department of Medicine and Surgery, Scuola Medica Salernitana, University of Salerno, Salerno, Italy
  4. Celiac Disease Center at Columbia University, New York, USA
  5. Celiac Disease Research Center, Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
  6. Department of Internal Medicine, Tampere University Hospital, Tampere, Finland
  7. Coeliac Disease Centre, Heim Pál Children’s Hospital, Budapest, Hungary
  8. Department of Paediatrics, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
  9. Celiac Disease Research Center, Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
  10. Department of Paediatrics, Tampere University Hospital, Tampere, Finland
  11. The Mayo Clinic, Rochester, Minnesota, USA
  12. Institute of Clinical Medicine and K.G. Jebsen Coeliac Disease Research Centre, Faculty of Medicine, University of Oslo, Oslo, Norway
  13. Department of Gastroenterology, Oslo University Hospital, Oslo, Norway
  14. Science Center, Tampere University Hospital, Tampere, Finland
  15. Tampere Centre for Child Health Research, Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
  16. Institute for Mother and Child Health Bucharest, University of Medicine and Pharmacy ‘Carol Davila’, Bucharest, Romania
  17. Tampere Centre for Child Health Research, University of Tampere, Tampere University Hospital, Tampere, Finland
  18. Division of Pediatric Gastroenterology, Columbia University Medical Center, New York, USA
  19. Celiac Disease Center, Department of Medicine, Columbia University Medical Center, New York, USA
  20. Grupo IHP Pediatria, Sevilla, Spain
  21. Academic Unit of Gastroenterology, Royal Hallamshire Hospital, University of Sheffield, Sheffield, UK
  22. Celiac Center, University Medical Center, Johannes-Gutenberg University, Mainz, Germany
  23. Division of Gastroenterology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
  24. Coeliac UK, Buckinghamshire, UK
  25. Tampere Centre for Child Health Research, University of Tampere, Tampere University Hospital, Tampere, Finland
  26. Continuum Clinical, Northbrook, Illinois, USA
  27. Faculty of Health and Medicine, School of Medicine and Public Health, University of Newcastle, Newcastle, New South Wales, Australia
  28. Celiac Center, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
  1. Correspondence to Dr Jonas F Ludvigsson, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 171 77, Sweden; jonasludvigsson{at}


Objective A gluten-free diet is the only treatment option for coeliac disease, but an increasing number of trials have recently begun to explore alternative treatment strategies. We aimed to review the literature on coeliac disease therapeutic trials and to issue recommendations for outcome measures.

Design Based on a literature review of 10 062 references, we (17 researchers and 2 patient representatives from 10 countries) reviewed the use and suitability of both clinical and non-clinical outcome measures. We then made expert-based recommendations for use of these outcomes in coeliac disease trials and identified areas where research is needed.

Results We comment on the use of histology, serology, clinical outcome assessment (including patient-reported outcomes), quality of life and immunological tools including gluten immunogenic peptides for trials in coeliac disease.

Conclusion Careful evaluation and reporting of outcome measures will increase transparency and comparability of coeliac disease therapeutic trials, and will benefit patients, healthcare and the pharmaceutical industry.

  • celiac disease
  • gluten
  • gluten free diet
  • clinical trials

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Significance of this study

What is already known about this subject?

  • A gluten-free diet is the only treatment option for coeliac disease, but an increasing number of trials have recently begun to explore alternative treatment strategies.

  • A large number of trials of non-dietary treatments for coeliac disease are ongoing.

  • There is no consensus on outcome measures in coeliac disease trials.

What are the new findings?

  • After an extensive literature review, 17 researchers and 2 patient representatives from 10 countries reviewed the use and suitability of histology, serology, clinical outcome assessment (including patient-reported outcomes), quality of life and immunological tools such as gluten immunogenic peptides for trials in coeliac disease.

  • In this paper, we make expert-based recommendations for use of these outcomes in coeliac disease therapeutic trials.

How might it impact on clinical practice in the foreseeable future?

  • Following the recommendations outlined in this paper will increase the transparency and comparability of coeliac disease therapeutic trials, with benefit to patients, healthcare and the pharmaceutical industry.


Coeliac disease (CD) is an immune-mediated disease triggered by gluten exposure.1 Although it is characterised by small intestinal inflammation, its consequences are widespread and linked to diverse manifestations that include osteoporosis,2 lymphoma,3 4 pneumonia5 and increased mortality.6 Symptoms vary: some patients have diarrhoea and malabsorption (often termed ‘classical CD’), others suffer from constipation, fatigue and depression (non-classical CD), and some are asymptomatic (subclinical CD).7 The global prevalence of CD is about 1%–2%,8 9 and it seems to be increasing.10 11

Lifelong adherence to a gluten-free diet (GFD) is the only available treatment for CD.1 Patients find the GFD exceedingly burdensome for several reasons:12 it is socially restrictive13 and more expensive than ordinary food.14–16 Patients also differ in their ability to adapt psychologically to CD. Some people have little difficulty in adopting the GFD, whereas for others, living with CD is a daily struggle.13 In addition to the burden of treatment, patients with CD frequently have ongoing symptoms, and mucosal healing is slow and often incomplete. For these reasons, there is a need for alternative treatments for CD, as reflected by the intensive research efforts under way in different laboratories.17 Potential treatment approaches include glutenases, modified or pretreated gluten, gluten sequestrants, neutralising antibodies, inhibitors of intestinal permeability, lymphocyte blockers (including anti-interleukin-15), tissue transglutaminase (TG2) inhibitors, immune tolerance induction, exposure to hookworms and DQ2-blocking peptide analogues.17 18 Several of these drugs are now being tested in phase I or phase II trials, and a recent study found that novel therapies attract the interest of patients with CD more than any other disease-related topic.19

It is important that treatment effects are measured against robust standards. A recent systematic review20 identified six histological CD activity indices,21–26 five patient-reported outcomes (PROs)27–31 and four indices of endoscopic CD activity32–35 that have been used in coeliac trials.

In the present paper, we have explored clinical, serological, histological and immunological outcome measures for clinical trials in CD and, importantly, have included the patient perspective in our recommendations for their use.


Task force

Coauthors were invited by JFL and DAL with the aim of assembling a group whose knowledge, experience and interests reflect the heterogeneity of outcome measures used in trials of CD. Most of the participating researchers were adult gastroenterologists (PHRG, CC, DSS, KKa, DS, JAM, JT, KEAL, DAL), but our group also included six paediatricians (JFL, NRR, KKu, MJM, AP, IRK-S), one pathologist (MMW), one basic scientist (AR-H) and two representatives of patient organisations (SS and KV). Members of this diverse collaboration came from 10 countries. Most of the coauthors participated in the CD meeting organised by MJM in Tampere, Finland on 24–25 November 2016 (hence the subtitle of this paper).

Literature review

Coauthors were divided into seven teams of three to four individuals, who jointly reviewed five domains: serology, histology, immunology, PROs and other outcome measures. The Karolinska Institutet library carried out literature searches for relevant papers published up to 1 December 2016 (see online supplementary appendix). This search yielded 10 062 references. After the titles and abstracts of these references had been screened (online supplementary appendix), 941 publications remained that were considered potentially relevant for this review; these were subsequently read in detail.

Supplementary file 1

Supplementary file 2

In this paper, we issue a number of recommendations. Where appropriate, these were graded according to the method suggested by the Oxford Centre for Evidence-based Medicine,36 in which grade A represents the highest level of evidence and grade D the lowest (generally based on expert opinion, with no supporting randomised trials, cohort or case-control studies). The appendix contains a detailed description of grades A–D. All recommendations were subject to a post hoc vote on a five-level scale (strongly disagree, disagree, neither agree nor disagree, agree and strongly agree).

Manuscript draft

JFL wrote the first draft of the paper. The text was then extensively revised by the coauthors. JFL and DAL supervised these revisions, but all authors contributed and agreed on the conclusions and the final wording of the paper.

This paper is a series of expert-based recommendations and does not meet the requirements of a formal systematic review (eg, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement37). We aimed to highlight the state of the art in designing intervention trials in CD.



Clinicopathological correlation is key to the diagnosis of CD in adults and children. In adults, confirmation of the diagnosis by duodenal biopsy is the gold standard.7 38 39 In Europe, a ‘biopsy-sparing’ protocol (with defined limitations for use) has been adopted for symptomatic children with an anti-TG2 titre ≥10 times the upper limit of normal, positive endomysial antibody (EMA) on a second blood sample and positivity for the human leucocyte antigen haplotypes HLA-DQ2 and/or HLA-DQ8.40 When evaluating the efficacy of CD treatment in a trial setting, quantitative histological assessment (morphometry) outperforms qualitative histology (eg, the Marsh score).20 In clinical trials, optimised biopsy protocols should be followed for the assessment of mucosal damage or healing.

Well-known histological classifications include that described by Marsh and modified by Oberhuber7 38 and that of Corazza and Villanacci.41 Although grouped classifications are practical in clinical work, recent studies have shown that their reproducibility and reliability are imperfect.26 42

Recently published recommendations for the biopsy diagnosis of CD in adults specify the number and sites of biopsies,38 39 together with optimal laboratory processing and structured reports that include validated morphometric analysis.26 38

It is recommended to take at least five duodenal biopsies: one or two from the duodenal bulb (D1) and four from the second part of the duodenum (D2).38 39 These biopsies should be taken across circular folds to avoid crush artefact.43 Endoscopists should take one biopsy specimen per pass of the forceps, because a single-biopsy technique improves the yield of well-oriented duodenal biopsy specimens.44 Biopsies from D1 and D2 should be reviewed separately by a pathologist.45

In the laboratory, biopsies should be oriented correctly and sectioned at three levels. In trial settings, intraepithelial lymphocytes (IELs) should always be counted and reported as the number per 100 enterocytes (normal counts are ≤25/100 enterocytes).7 26 39 45 46 IELs can be counted in H&E-stained sections; however, some pathologists prefer CD3 immunohistochemistry.26 47 Frozen tissue specimens have been used to evaluate T-cell receptor gamma delta positive (γδ+) T cells,48 but new antibodies for use on paraffin-embedded specimens are now available.49 50 A high density of γδ+ T cells is relatively specific for CD and can be useful when the histological diagnosis remains equivocal.48 49

Immunohistochemistry showing deposits of immunoglobulin A targeting TG2 in the small bowel mucosa accurately detects CD in patients on a gluten-containing diet.51 These deposits have shown 100% sensitivity52 and a mean specificity of 94%53 for CD, and they have been used in several gluten challenge studies to measure gluten reactivity.52 The technique requires frozen tissue. Its use in clinical trials has produced variable results, with increases in deposits mirroring increases in serum TG2 antibodies.54

Morphometry, in which continuous variables such as the villous height:crypt depth ratio and IEL density are measured separately, overcomes certain problems encountered with grouped classifications.26 55 Of note, a change of 0.4 in the villous height:crypt depth ratio is the threshold for a measurable and likely clinically relevant difference. A ratio of <2 indicates villous atrophy, that is, active disease, whereas patients with treated CD have values above 3. Similarly, a change of ≥30% in T-cell IEL density is considered clinically significant.26
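As a rough illustration of how these morphometric thresholds could be applied, the sketch below computes the villous height:crypt depth ratio (Vh:CrD) from hypothetical per-biopsy measurements and flags changes at the quoted thresholds (0.4 for the ratio, 30% for IEL density). The field names and units are assumptions, and treating each threshold as inclusive is a choice made here rather than something the text specifies.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; treated here as adjustable constants.
ATROPHY_CUTOFF = 2.0        # Vh:CrD < 2 indicates villous atrophy (active disease)
TREATED_CUTOFF = 3.0        # treated CD: Vh:CrD above 3
RATIO_CHANGE = 0.4          # change in Vh:CrD regarded as clinically relevant
IEL_CHANGE_FRACTION = 0.30  # relative change in IEL density regarded as significant


@dataclass
class Morphometry:
    """Hypothetical per-biopsy summary (means over well-oriented villi and crypts)."""
    villous_height_um: float
    crypt_depth_um: float
    iel_per_100_enterocytes: float

    @property
    def vh_crd(self) -> float:
        return self.villous_height_um / self.crypt_depth_um


def classify_ratio(ratio: float) -> str:
    if ratio < ATROPHY_CUTOFF:
        return "atrophy"
    if ratio > TREATED_CUTOFF:
        return "treated range"
    return "intermediate"


def change_summary(before: Morphometry, after: Morphometry) -> dict:
    delta_ratio = after.vh_crd - before.vh_crd
    iel_rel = (after.iel_per_100_enterocytes - before.iel_per_100_enterocytes) / before.iel_per_100_enterocytes
    return {
        "delta_vh_crd": round(delta_ratio, 2),
        "ratio_change_relevant": abs(delta_ratio) >= RATIO_CHANGE,
        "iel_change_pct": round(100 * iel_rel, 1),
        "iel_change_relevant": abs(iel_rel) >= IEL_CHANGE_FRACTION,
    }


# Hypothetical biopsy pair: healed mucosa before, atrophic mucosa after a challenge.
before = Morphometry(villous_height_um=480, crypt_depth_um=150, iel_per_100_enterocytes=30)
after = Morphometry(villous_height_um=300, crypt_depth_um=200, iel_per_100_enterocytes=45)
print(classify_ratio(before.vh_crd), classify_ratio(after.vh_crd))
print(change_summary(before, after))
```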

Importantly, morphometry, which has shown excellent reproducibility and reliability,26 has a significant role in clinical trials, in which reliable and accurate measurements are a requirement.56 Whichever classification is used, two blinded observers should read the histology to ensure reliability in clinical trials.26 It should be noted that CD can be patchy, and there is some intrasubject and even intrabiopsy variability in villous architecture and lymphocyte numbers, which contributes to sampling error and difficulties in interpretation. Given this, and given that conventional biopsy review evaluates only a small proportion of the proximal small intestinal mucosa, new tools are needed for the assessment of mucosal health. The optimal timing of biopsy to evaluate healing should be a clinicopathological decision that depends on the treatment offered and that takes possible sampling error into account by following protocols for biopsy sites.
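Reading by two blinded observers implies that agreement between readers should be quantified. One common choice for continuous morphometric readings is the intraclass correlation coefficient; the sketch below implements the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation. The choice of ICC form and the example readings are assumptions for illustration, not part of the recommendations.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of biopsy i by reader j (no missing values).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 biopsies, >= 2 readers and a complete table")
    grand = sum(x for row in ratings for x in row) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between biopsies
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical villous height:crypt depth readings of five biopsies by two blinded readers.
readings = [[3.1, 3.0], [1.4, 1.6], [2.2, 2.5], [0.9, 0.8], [2.8, 2.9]]
print(round(icc_2_1(readings), 3))
```

Because ICC(2,1) measures absolute agreement rather than consistency, a systematic offset between the two readers lowers it even when they rank the biopsies identically.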

In CD, there may be concurrent upper GI pathologies (eg, Helicobacter pylori infection,57 58 lymphocytic gastritis59 and eosinophilic oesophagitis/oesophageal eosinophilia60) that should be assessed at initial endoscopy if clinical history is suggestive, and if present, reassessed post-treatment because these may contribute to ongoing symptoms not related to small intestinal damage.

Patients included in studies for CD therapy must have had an initial robust diagnosis. Occasionally, patients will have been diagnosed without histological confirmation.61 These patients should not be included in gluten challenge studies but instead included in trials of active CD treatment. Patients with a study entry biopsy confirming villous atrophy (VA) and a record of positive serology and permissive HLA status should be eligible for treatment studies in CD.

As per the current European Society for Paediatric Gastroenterology Hepatology and Nutrition guidelines,40 we are reluctant to suggest timelines for control biopsies for children, although a recent paper found that mucosal healing may not be as complete as previously assumed.62 For now, follow-up biopsy in children should be dictated by clinical needs.

Table 1 summarises changes in mucosal histopathology, serology and symptoms in clinical trials with gluten challenge.

Table 1

Coeliac disease: results of clinical trials with gluten challenge, changes in mucosal histopathology, serology and symptoms

Recommendations: Histology is an essential outcome measure in any trial of CD treatment (grade B). Histology should be performed both before and at the end of the trial when healing or histological relapse is the primary outcome measure (grade B). In a gluten challenge study, successful treatment may be characterised as no change; in a treatment study, it may be characterised as histological improvement, shown by a significant increase in the villous height:crypt depth ratio (>0.4 being considered relevant) and/or a ≥30% change in IEL density (grade D). Additionally, histology may be useful as a criterion for study inclusion: participants in gluten challenge studies should have a high villous height:crypt depth ratio of >2–3, and participants in treatment trials should have a decreased ratio of <2–3 (grade D). Histological evaluation should follow a priori histology protocols using quantitative measures.
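The grade D criteria above could be written down as explicit, pre-specified rules, for example in a statistical analysis plan. The sketch below is one hypothetical encoding: the single entry cut-off of 2.5 is an assumption picked from within the 2–3 range given in the recommendation, and reading "no change" as "no relevant deterioration" is an interpretation.

```python
from typing import Literal

TrialType = Literal["gluten_challenge", "treatment"]

RATIO_DELTA = 0.4   # relevant change in villous height:crypt depth ratio (grade D)
IEL_DELTA = 0.30    # relevant relative change in IEL density (grade D)
ENTRY_CUTOFF = 2.5  # hypothetical value picked from the ">2-3" / "<2-3" range


def eligible(trial: TrialType, baseline_ratio: float, cutoff: float = ENTRY_CUTOFF) -> bool:
    """Challenge studies enrol healed mucosa; treatment trials enrol atrophic mucosa."""
    if trial == "gluten_challenge":
        return baseline_ratio > cutoff
    return baseline_ratio < cutoff


def histological_success(trial: TrialType, ratio_0: float, ratio_1: float,
                         iel_0: float, iel_1: float) -> bool:
    d_ratio = ratio_1 - ratio_0
    d_iel = (iel_1 - iel_0) / iel_0  # relative change in IEL density
    if trial == "gluten_challenge":
        # "No change": neither a relevant fall in the ratio nor a relevant rise in IELs.
        return d_ratio >= -RATIO_DELTA and d_iel < IEL_DELTA
    # Treatment trial: relevant rise in the ratio and/or relevant fall in IEL density.
    return d_ratio > RATIO_DELTA or d_iel <= -IEL_DELTA


print(eligible("gluten_challenge", 3.1), eligible("treatment", 1.2))
print(histological_success("treatment", 1.2, 2.0, 50, 45))
```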

Vote: agree: 7; strongly agree: 12.


Serology is a cornerstone of the diagnostic workup for CD.38 39 63 64 IgA autoantibodies to TG2 and IgG antibodies to deamidated gliadin peptides (DGPs) are central diagnostic tests for active CD. In IgA-sufficient patients, IgA anti-TG2 is the most predictive and reproducible single test, although IgA EMA performs similarly well in some expert laboratories and is often used as a confirmatory test. IgG anti-DGP has sensitivity similar to that of IgA anti-TG2 but lower specificity. Selection of optimal serological tests is mandatory because not all commercial assays perform equally well.65 66 Importantly, the way results are calculated, and thus the numerical values reported for the same samples,59 may also differ, and only tests with a multipoint calibration curve give values proportional to serum antibody concentration.40 Differences between assays can make interpretation within, and comparison between, clinical trials difficult. These differences arise in part because different tests detect different epitopes, use different calibration or detect antibodies with lower avidity and specificity.67–69 An important shortcoming when using serology to evaluate the outcome of gluten challenge is therefore the wide range of responses. When the UK National External Quality Assessment consortium tested the same positive samples in 14 commercial anti-TG2 assays, large differences in antibody levels were found, consistent with substantial variability in the measurement of antibodies used in the diagnosis of CD.65
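To illustrate why a multipoint calibration curve matters, the sketch below converts an assay signal to antibody concentration in two ways: by piecewise-linear interpolation over several calibrators, and by scaling against a single calibrator, which implicitly assumes that signal is proportional to concentration. The calibrator values are invented and deliberately saturating; commercial assays define their own calibrators and curve fits (four-parameter logistic models are a common alternative to the linear interpolation used here).

```python
import bisect

# Hypothetical calibrators: (antibody concentration, arbitrary units; assay signal).
# The signal flattens at high concentration, as is typical of immunoassays.
CALIBRATORS = [(0.0, 0.05), (5.0, 0.40), (20.0, 1.20), (80.0, 2.40), (200.0, 3.00)]


def conc_multipoint(signal: float) -> float:
    """Piecewise-linear interpolation on the calibration curve, clamped at its ends."""
    signals = [s for _, s in CALIBRATORS]
    if signal <= signals[0]:
        return CALIBRATORS[0][0]
    if signal >= signals[-1]:
        return CALIBRATORS[-1][0]
    i = bisect.bisect_right(signals, signal)
    (c0, s0), (c1, s1) = CALIBRATORS[i - 1], CALIBRATORS[i]
    return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)


def conc_single_point(signal: float, cal_conc: float = 20.0, cal_signal: float = 1.20) -> float:
    """Single calibrator: assumes signal is proportional to concentration."""
    return signal * cal_conc / cal_signal


for signal in (0.8, 1.8, 2.7):
    print(signal, round(conc_multipoint(signal), 1), round(conc_single_point(signal), 1))
```

With a saturating curve, the single-point estimate increasingly underestimates high concentrations, which is one way in which values from differently calibrated assays can diverge for the same sample.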

Overall, IgA anti-TG2 antibody titres correlate with the severity of mucosal damage on histology, as well as with the histological outcome on a GFD.70–73 Yet a recent meta-analysis found that serum TG2 and EMA antibodies often underestimate the degree of VA.74 Antibody titres below diagnostic cut-offs thus do not predict normal or near-normal (Marsh I) histology. One caveat is that biopsies usually sample a short segment of the (descending) duodenum, whereas active CD can affect large portions of the small intestine, occasionally extending down to the ileum.75 Therefore, a patient may be in clinical and serological remission with residual inflammation in the proximal duodenum that is, however, quantitatively much less extensive than before.74 In one of the largest studies to date,76 IgA anti-TG2 failed to detect 44% of cases of persistent VA (Marsh III) in patients with CD on a GFD for >1 year.
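Failing to detect 44% of persistent VA corresponds to a sensitivity of 56%. The sketch below computes standard accuracy measures from a 2×2 table against biopsy as the reference standard. Apart from the 44-in-100 split, the counts are hypothetical, chosen to show that a highly specific test with modest sensitivity leaves many cases of VA undetected, so a negative result excludes VA poorly.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy of a serological test against biopsy-defined VA."""
    return {
        "sensitivity": tp / (tp + fn),  # VA cases with positive serology
        "specificity": tn / (tn + fp),  # non-VA cases with negative serology
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),          # depends on the prevalence of VA in the sample
    }


# Hypothetical cohort on a GFD: 100 with persistent VA (44 missed by serology)
# and 100 without VA (5 false positives).
print(accuracy(tp=56, fp=5, fn=44, tn=95))
```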

Normal serology is generally required for entry into a gluten challenge study to ensure that participants do not have severe VA before gluten challenge. Conversely, serology above diagnostic cut-offs has not been used as an inclusion criterion for treatment studies in non-responsive CD, because many people with normal serology still have VA and ongoing symptoms.74 Participants with elevated serological titres may respond better to some therapies. While this remains to be confirmed, stratification by CD antibody level at study entry should be considered.
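Stratification by antibody level at entry is usually implemented through stratified randomisation. A minimal sketch is shown below, assuming two strata (anti-TG2 above vs at or below a hypothetical upper limit of normal) and 1:1 allocation in permuted blocks of four; the cut-off, block size and seed are assumptions, and a real trial would use a validated randomisation system.

```python
import random


def stratum(tg2_value: float, upper_limit_normal: float) -> str:
    """Hypothetical two-level stratification on baseline IgA anti-TG2."""
    return "elevated" if tg2_value > upper_limit_normal else "normal"


def make_allocator(block_size: int = 4, seed: int = 2016):
    """Permuted-block 1:1 randomisation, run independently within each stratum."""
    if block_size % 2:
        raise ValueError("block size must be even for 1:1 allocation")
    rng = random.Random(seed)
    pending: dict[str, list[str]] = {}

    def allocate(stratum_name: str) -> str:
        queue = pending.setdefault(stratum_name, [])
        if not queue:  # start a new balanced block for this stratum
            block = ["drug", "placebo"] * (block_size // 2)
            rng.shuffle(block)
            queue.extend(block)
        return queue.pop(0)

    return allocate


allocate = make_allocator()
# Hypothetical baseline titres in assay units, with a hypothetical upper limit of normal of 7.
for tg2 in [2.0, 35.0, 1.5, 120.0, 0.8, 9.0]:
    s = stratum(tg2, upper_limit_normal=7.0)
    print(tg2, s, allocate(s))
```

Because blocks are filled separately for each stratum, both strata stay close to 1:1 even if participants with elevated titres turn out to be rare.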

In clinical trials, serology may be used in assessing change during gluten challenge or to monitor longer treatment studies. The antibody response to gluten challenge depends on four factors: duration of the previous GFD, daily amount of ingested gluten,54 duration of gluten intake and individual factors.

Patients with CD vary in the amount of gluten they tolerate. When low (1–3 g/day) or moderate (3–5 g/day) amounts of gluten were administered for 12 weeks to 25 Finnish patients with CD in remission, only 67% showed signs of mucosal inflammation and 43% developed positive IgA anti-TG2 antibodies.54 However, in a US study of 20 adult patients with CD in remission challenged with 3 or 7.5 g gluten/day for 2 weeks, Marsh III histology developed in 68% of patients, whereas anti-TG2 antibodies increased in only 25% and anti-DGP antibodies in only 30%. Remarkably, positivity for both antibodies increased to 55% and 45% 2 weeks after the end of gluten challenge. Combining histology at week 2 with serology at week 4 therefore showed a gluten response in nearly 90% of patients, with no difference between the two doses.77 A recent study from Norway showed an even lower proportion of patients responding serologically after 2 weeks.78 Notably, a few patients who have been on a GFD for years may develop a tolerance to gluten ingestion that may last for several years.79 Overall, high serological titres, or significant increases in titres, are predictive of VA, but substantial mucosal changes may occur without a significant change in serology.

It is important to note that serological tests were developed for the diagnosis of CD. Currently, the Food and Drug Administration (FDA) has cleared coeliac serologies only as an aid to the diagnosis of patients with suspected CD.80 This restriction limits how serologies can be used in regulatory trials, although they are routinely used for monitoring in clinical practice. To date, no manufacturer has submitted a claim for the use of serological tests for disease monitoring, and the FDA is only able ‘to review submitted claims’.80 Nonetheless, well-validated IgA anti-TG2 and IgG anti-DGP tests will be important assets in clinical studies, helping to monitor CD activity during short and especially longer gluten challenge trials. Their further validation in ongoing trials may lead to FDA approval as secondary end points, or as combined primary end points (with histology or symptoms), in phase II and III clinical studies (table 1).

Recommendations: Although serology is not approved by the FDA for use as a primary pivotal clinical trial outcome, IgA anti-TG2 and IgG anti-DGP should, at a minimum, be measured at study entry and at completion in trials of CD (grade B). For entry into a gluten challenge study, participants should have near-normal titres, whereas for treatment studies, titres may be either elevated or normal, with stratification by serological titre at study entry, possibly as an a priori analysis (grade B). The choice of assay should be made with care, and attention should be paid to dynamic range, especially around or below the cut-off for the normal range. Preferably, an assay with a calibration curve should be used. Although cut-offs for diagnosis may not be optimal for monitoring response or predicting VA, any significant increase during a trial suggests increasing CD activity and may be used as a key outcome in some studies.

Vote: agree: 2; strongly agree: 17.

Immunological tools to measure treatment outcomes

Known innate, and particularly adaptive, immune mechanisms in CD are prime candidates for measuring treatment outcomes with and without gluten challenge. Candidate measures include non-invasive blood tests, duodenal biopsies assessed by immunohistochemistry, gene expression signatures and in vitro culture.

With in vivo gluten challenge, there is rapid immune activation in the duodenum.81–83 One study found that interferon (IFN)-γ was increased both at baseline and with gluten challenge; for this reason, it does not appear to be a useful measure of disease activity.82 A whole-blood IFN-γ release assay is a much more promising measure of immune responsiveness to gluten.84 85

Differences in serum cytokine and chemokine levels between treated patients with CD and healthy individuals are highly variable.86 Gluten challenge, however, induces a wave of cytokine release.78

In the lamina propria of the duodenal mucosa, gluten peptides are taken up by dendritic (antigen-presenting) cells bearing the surface MHC molecules HLA-DQ2 or HLA-DQ8, which stimulate gluten peptide-reactive CD4+ T cells.87–90 On day 6 after the start of gluten challenge, an increase in gluten-responsive T cells reflecting active disease was seen in peripheral blood.91 92 These T cells can be demonstrated by Enzyme-Linked ImmunoSpot (ELISPOT) assays of cytokine release when rechallenged with gluten ex vivo. Overall, 80%–90% of treated patients with CD in remission show a positive response on challenge. The T cells can also be demonstrated by binding to HLA-DQ-gliadin peptide tetramers, constructs consisting of multimers of HLA-DQ2 or HLA-DQ8 molecules bound to a gluten peptide and a reporter molecule that gives a signal in flow cytometry.93 94 Here, too, 80%–90% of challenged patients with CD in remission have a positive test.95 However, both the ELISPOT and the HLA-DQ2-gluten tetramer tests are limited by large interindividual differences and the small numbers of patients studied. Recently, HLA-DQ2-gluten tetramer technology has demonstrated disease-specific T cells in peripheral blood even without gluten challenge.96 This, together with the demonstration of restricted T-cell receptors,97 may lead to new outcome measures.

Serum IgA anti-TG2 antibody levels seem to be a useful biomarker of disease activity (see the ‘Serology’ section). Peripheral blood B cells may also prove to be a source of future biomarkers.98 Local mucosal production of antibodies targeting extracellular TG2 in vivo has shown potential in diagnosis,51 but was not informative beyond standard histology in a clinical drug trial with gluten challenge.99

Recommendations: Although several immunological markers are under development as potential outcome measures, they have not been validated for therapeutic trials. At this point, they should only be used as exploratory outcomes in phase II and III clinical trials (grade D).

Vote: agree: 2; strongly agree: 17.

Gluten immunogenic peptides as a compliance measure

Symptom monitoring, serology and histology are at best indirect measures of GFD adherence with imperfect overall accuracy.100 Similarly, diet questionnaires are poor predictors of gluten exposure.100

Gluten immunogenic peptides (GIPs), including the 33-mer peptide from α2-gliadin, are resistant to GI digestion.101 102 Because of this resistance, GIPs can be detected in faeces or urine, providing direct evidence, and potentially quantification, of gluten intake.103 A clinical trial examined correlations between faecal GIPs and traditional methods of monitoring the GFD.104 The majority (85.7%) of children with CD under 3 years of age had faeces negative for GIPs. Among those aged ≥13 years, faecal positivity for GIPs rose to 39.2%. A higher proportion of males than females were positive for GIPs in faeces (60% vs 31.5%, P=0.034). Serum IgA anti-TG2 was negative in 40 of the 56 patients with GIP-positive stools. Current data suggest that GIP testing may be superior to questionnaires or anti-TG2 antibodies for monitoring GFD adherence.104 Furthermore, a strong correlation has been demonstrated between the absence of GIPs in urine and healing of the intestinal epithelium.105 The first therapeutic clinical trials using the technology are ongoing (NCT02637141, NCT02633020). Whereas coeliac trial investigators and sponsors previously had to guess whether patients were consuming gluten or adhering to the GFD, this technology reduces the guesswork by providing data with which to interpret results and outliers accurately. The assay could possibly be developed for quantitative detection of the rate of glutamine residue deamidation in trials aiming to interfere with transglutaminase activity.106
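In a trial, GIP results would typically be collected repeatedly for each participant. The sketch below summarises hypothetical binary results (positive meaning above the assay's detection limit) into a per-participant positive fraction and the study day of first positivity, which could help flag outliers for sensitivity analyses. The sampling schedule, the binary coding and the summary measures are assumptions, not a validated adherence metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GipResult:
    participant: str
    study_day: int
    positive: bool  # faecal or urine GIP above the assay's detection limit


def summarise(results: list[GipResult]) -> dict[str, dict]:
    by_participant: dict[str, list[GipResult]] = {}
    for r in results:
        by_participant.setdefault(r.participant, []).append(r)
    summary = {}
    for pid, rows in by_participant.items():
        positive_days = [r.study_day for r in rows if r.positive]
        first: Optional[int] = min(positive_days) if positive_days else None
        summary[pid] = {
            "samples": len(rows),
            "positive_fraction": len(positive_days) / len(rows),
            "first_positive_day": first,
        }
    return summary


results = [
    GipResult("P01", 7, False), GipResult("P01", 14, False), GipResult("P01", 21, False),
    GipResult("P02", 7, False), GipResult("P02", 14, True), GipResult("P02", 21, True),
]
for pid, s in summarise(results).items():
    print(pid, s)
```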

Recommendations: GIP testing is a promising tool for evaluating and selecting patients for clinical trials in CD aimed at reducing toxicity related to gluten exposure (grade D). Hence, it should be considered in future trials, especially for trials in non-responsive CD for therapies that are designed to prevent symptoms because of inadvertent gluten exposure (grade D).

Vote: agree: 4; strongly agree: 15.

Clinical trial end points

The ideal clinical trial end point should be clearly linked to an outcome important to patients; reliable; responsive to treatment; clinically or physiologically proximal to the outcome of interest; and efficient and scalable for use in diverse clinical trial settings. Although biomarkers may be used as primary outcomes in some areas, these rarely have sufficient data for regulatory acceptance.107 In contrast, clinical outcome assessments can be more easily linked to patient well-being. Clinical outcome assessments are grouped into PROs, clinician-reported outcomes, observer-reported outcomes and performance outcomes. Clinician-reported outcomes, such as physician global assessment, are of limited value because they do not directly assess patient status and generally do not correlate well with PROs. Observer-reported outcomes can be vital in specific populations, such as young children, in whom self-report is not possible. Symptom-focused PROs are the main clinical outcome assessment in use in CD, and in gastroenterology overall, and are the focus of this section, although other clinical outcomes will also be discussed.

From a clinical/practical perspective, PROs can be helpful for monitoring patient status and targeting quality improvement initiatives.108 A growing number of digital devices allow patients to track and transmit symptom data to their physician; however, to be useful for practising clinicians, PROs must be easy to administer and interpret and must point to feasible clinical interventions.108 In research and clinical trials, the key features of PROs to consider are high responsiveness to change and low participant burden.
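Responsiveness to change is often summarised with a distribution-based statistic such as the standardised response mean (SRM): the mean within-patient change divided by the standard deviation of that change. The sketch below assumes paired baseline and follow-up scores on a hypothetical symptom scale on which lower scores mean fewer symptoms, so a negative SRM indicates improvement.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline: list[float], follow_up: list[float]) -> float:
    """SRM = mean(change) / SD(change), using within-patient paired changes."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired observations")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical symptom scores for four patients before and after treatment.
print(standardised_response_mean([4, 5, 3, 6], [2, 3, 2, 3]))
```

Comparing SRMs of candidate PROs collected in the same patients is one simple way to compare how readily each instrument detects change relative to its noise.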

For patients, the ideal PRO must focus on the symptoms or disease attributes most meaningful to them while again minimising time and complexity of use. In CD, this is particularly important given the significant impact on emotional, mental and social well-being due to the constant vigilance required by the GFD. Finally, from a regulatory standpoint, the ability to use a PRO in a pivotal clinical trial to support a labelling claim depends on whether its characteristics (eg, concept being measured, content validity, conceptual framework, intended population, format, scoring) are satisfactory and clearly documented in a regulatory dossier, which is now available for only a few PROs.109

Presently available PROs frequently used in or developed for CD include the Gastrointestinal Symptom Rating Scale (GSRS),110 111 the Celiac Symptom Index (CSI),31 the Celiac Disease PRO (CeD-PRO),27 the Celiac Disease Symptom Diary (CDSD)112 and the Celiac Disease Assessment Questionnaire (CDAQ).113 Of these, the GSRS has been used most frequently, in studies ranging from natural history and the effects of the GFD to clinical trials of novel therapeutics.110 114–116 The GSRS was originally developed for peptic disease and irritable bowel syndrome,117 but because the symptoms of many GI disorders overlap, it has proven useful for a variety of GI disorders, including CD.118 However, it is not optimised for CD and would not be appropriate for use in pivotal trials. The CSI, by contrast, was developed specifically for CD and has been used in many cross-sectional and interventional studies.119 120 However, its development was completed before the 2009 FDA guidance,109 and the CSI therefore lacks much of the documentation necessary for regulatory clearance. Both the CDSD and the CeD-PRO, in contrast, were developed specifically for regulatory approval of CD therapeutics and are the preferred instruments for this purpose.

The CDAQ was recently developed and assesses several domains: symptoms, dietary burden, worry, stigma and social isolation.121 As such, it is a hybrid of a symptom PRO and the health-related quality of life (HRQoL) tools discussed below. It is unclear whether this instrument was developed and documented in line with regulatory guidance; for example, the CDAQ evaluates constipation and diarrhoea together in a single question, which may make changes in either of these important symptoms difficult to assess. Nevertheless, incorporating these or similar HRQoL domains related to disease burden is critical to ensuring that outcomes are relevant and meaningful to patients.

Across these instruments there is substantial overlap, which is expected given the limited number of GI symptoms in general (table 2): diarrhoea, abdominal pain, bloating and nausea are common to all the PROs. It should be acknowledged that although PROs may be developed and tested in specific diseases, they do not discriminate between diseases, and scores for patients with different GI disorders will therefore overlap.122 There is also poor correlation between symptoms, histology and serology,80 123 due in part to differences in time to response after gluten exposure and to coexisting symptoms from irritable bowel syndrome or undetected food allergy.124 Indeed, recent clinical trials show that many symptomatic patients have no histological or serological evidence of active CD, and many patients with significant enteropathy have few or no symptoms.78 80 112 116 125 Moreover, complete recovery of small intestinal lesions is very rare in adult patients with CD, even after symptoms have disappeared.126

Table 2

Symptoms assessed across PROs in CD

In addition, PRO use in CD can be challenging because of symptom heterogeneity (eg, asymptomatic or paucisymptomatic patients) and variable extraintestinal manifestations, for which no PROs are available. Unlike disorders such as chronic constipation or headache, in which a single symptom defines the condition, symptoms in CD can vary substantially between individuals, so careful attention to PRO use is mandatory. Moreover, the responsiveness to change of non-symptom dimensions in PROs can vary considerably and must be carefully assessed in relation to the intervention under investigation. For example, measures of quality of life (QoL) may be less amenable to change if the intervention does not reduce dietary burden, social isolation and stigma, which are closely linked to managing the GFD.

Comparing overall PRO mean scores at baseline and postintervention may dilute the treatment effect if not all domains change or if changes cancel each other out (eg, diarrhoea improves but the intervention induces constipation). Another option is to compare means only in prespecified domains (eg, include diarrhoea but not constipation); however, this approach may result in a highly selected population that is not representative. A more sophisticated approach is to limit the primary assessment of the intervention's effect to the symptoms most bothersome to each patient and then to include all individuals with bothersome symptoms in the final PRO assessment. Even with this approach, treatment trials can enrol only patients who have symptoms measured by the primary PRO outcome. Given the heterogeneity of CD, sequential trials with different outcomes will likely be needed to understand the utility of a particular therapy. For example, a therapy found to improve GI symptoms in CD could later be tested in a trial assessing itch in patients with dermatitis herpetiformis.
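To make the dilution argument concrete, the following minimal Python sketch compares three ways of summarising change in a multidomain PRO: the overall mean across all domains, the mean in a prespecified domain, and the change in each patient's most bothersome symptom. The domain names, scores (higher = worse) and three patients are entirely hypothetical, and defining "most bothersome" as the domain with the highest baseline score is an assumption made for illustration, not a validated scoring rule.

```python
from statistics import mean

# Hypothetical per-patient PRO domain scores (higher = worse); NOT real trial data.
# Each domain maps to a (baseline, postintervention) pair.
patients = [
    {"diarrhoea": (6, 2), "constipation": (1, 5), "abdominal_pain": (4, 4)},
    {"diarrhoea": (5, 1), "constipation": (2, 6), "abdominal_pain": (3, 3)},
    {"diarrhoea": (1, 1), "constipation": (1, 1), "abdominal_pain": (7, 3)},
]

def change(scores):
    """Postintervention minus baseline; negative values mean improvement."""
    baseline, post = scores
    return post - baseline

def overall_mean_change(pts):
    """Mean change averaged over every domain of every patient."""
    return mean(change(s) for p in pts for s in p.values())

def domain_mean_change(pts, domains):
    """Mean change restricted to prespecified domains."""
    return mean(change(p[d]) for p in pts for d in domains)

def most_bothersome_change(pts):
    """Mean change in each patient's worst domain at baseline (assumed definition)."""
    return mean(change(max(p.values(), key=lambda s: s[0])) for p in pts)

print(round(overall_mean_change(patients), 2))                # -0.44
print(round(domain_mean_change(patients, ["diarrhoea"]), 2))  # -2.67
print(most_bothersome_change(patients))                       # -4
```

In this toy example the overall mean change is small because the diarrhoea improvement in two patients is offset by new constipation, and patient 3's pain improvement is averaged with unchanged domains. Restricting the analysis to diarrhoea misses patient 3's benefit, whereas the patient-specific analysis captures a 4-point improvement in every patient.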

Notably, although great progress has been made in PROs for CD, there is limited experience with the more recent PROs in languages other than English, and none has been validated for use in paediatric populations. Although PROs developed for adults with CD are expected to be applicable to paediatric populations, this still requires validation. For example, young children and teenagers may define improved QoL differently because of unique challenges in school, social settings and peer relationships. Furthermore, improvement in extraintestinal symptoms, including behavioural changes, may be more relevant in paediatric patients. Formats suitable for caregivers of children who cannot complete the questionnaires independently must also be developed. Finally, responder definitions and the minimal clinically important change still need to be established to realise the potential of CD PROs in clinical research.

Because the relative SD of histology is substantially smaller than that of symptoms, a study that needs several hundred patients to detect differences in symptoms may need far fewer patients for the histology end point. In such circumstances, researchers may choose to perform histology on a random sample of study participants, provided that the subsample gives sufficient power for the histology end point.
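The sample size consequence can be sketched with the standard normal-approximation formula for comparing two means, n per arm = 2(z_{1-α/2} + z_{1-β})²(σ/δ)², where σ is the SD of the end point and δ the difference to be detected. The SD-to-difference ratios below (4 for a symptom score, 1 for a histological measure) are hypothetical values chosen only to illustrate the scaling, not estimates from CD trials; the function name and participant IDs are likewise illustrative, and a real design would also allow for dropout.

```python
import math
import random
from statistics import NormalDist

def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided two-sample comparison of means
    (normal approximation): 2 * (z_{1-alpha/2} + z_{1-beta})**2 * (sd / delta)**2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

# Hypothetical SD-to-difference ratios, chosen only to show the scaling
symptom_n = n_per_group(sd=4.0, delta=1.0)    # symptom score end point
histology_n = n_per_group(sd=1.0, delta=1.0)  # histological end point
print(symptom_n, histology_n)  # 252 16

# Randomly choose which participants in one arm undergo biopsy
rng = random.Random(2024)
arm = [f"P{i:03d}" for i in range(symptom_n)]  # hypothetical participant IDs
biopsy_subsample = sorted(rng.sample(arm, histology_n))
```

Because n scales with (σ/δ)², a fourfold smaller ratio reduces the required sample roughly sixteenfold, which is why biopsying a random subsample can be sufficient for the histology end point.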

Recommendations: Clinical end points must be included in trials of CD, and PROs should generally be a primary outcome in studies of treatment of active CD, typically late phase II and phase III trials (grade D). Thus far, there is insufficient evidence to recommend one specific scale, although the CDSD and the CeD-PRO appear most likely to meet regulatory requirements (grade D). Given the heterogeneity of symptoms in patients with CD, it is acceptable to limit analyses to certain domains, either for the study overall or through participant-specific symptom assessments. However, such decisions should be made at study entry and rigorously documented.

Vote: agree: 3; strongly agree: 16.

Health-related quality of life assessment

Clinical trials must also consider the ongoing psychological burden of CD to better understand the outcomes. Many studies in different settings suggest that CD has a considerable impact on HRQoL.12 127 In untreated CD, GI symptoms and extraintestinal issues, such as isolated anaemia, fatigue and malaise, may be responsible for reduced HRQoL.128 In general, the treatment of CD results in significant improvement in the HRQoL of symptomatic patients.129 Even in patients with silent or asymptomatic screening-detected CD, improvement in both symptoms and HRQoL has been shown in numerous studies.116 130–132

Impairment in HRQoL may also contribute to, and be affected by, psychological disorders (eg, anxiety and depression) seen in patients with CD.12 133 Whereas anxiety generally improves after diagnosis and treatment of CD, depression may exist before and persist after diagnosis.134 Moreover, anxiety and depression, alone or through their impact on HRQoL, can impair dietary compliance.12 135 The interaction between mood disorders, GFD adherence and HRQoL is incompletely understood and should be addressed in future trials. Additionally, the burden of a GFD may lead to ongoing HRQoL impairment despite symptom relief and improved physical well-being after dietary intervention.13 19

Because of the small number of studies and variations in study designs, populations and HRQoL measures, we lack a complete understanding of the degree and drivers of HRQoL impairment in individuals with untreated and treated CD. There are also few studies of the effect of CD treatment on depression and anxiety, and further studies are needed to clarify this aspect of CD.

Although HRQoL is generally not accepted by regulatory agencies as a primary outcome in pivotal trials, it is a key outcome for both patients and clinicians and may determine whether a therapy succeeds or fails. Arguably, a main goal of therapeutics in CD, in addition to histological and symptom improvement, is to improve HRQoL. Indeed, some patients in histological and symptomatic remission could probably gain a significant HRQoL benefit from pharmacological therapy owing to a reduced burden of the GFD and less anxiety about potential gluten exposure. Several HRQoL scales have been specifically developed for or used in CD (table 3).

Table 3

Quality of life instruments relevant to coeliac disease

Recommendations: HRQoL in CD is complex and multidimensional and may be a more relevant concern to patients than specific symptoms. In any trial aiming to improve CD control (as opposed to gluten challenge studies that aim to prevent worsening), measurement of HRQoL should be considered a critical end point that may help to determine the overall value of a therapy or intervention to both patients and payors (grade D).

Vote: agree: 4; strongly agree: 15.


Strengths and limitations

In this review, a large number of authors reviewed the literature on treatment outcomes in coeliac trials to issue recommendations for future trials. Our research team was multidisciplinary and the treatment outcomes we have evaluated in the paper reflect the expert views of the authors.

We performed an extensive literature review of more than 10 000 papers and based our deliberations on our personal experience and expertise in treating patients with CD, conducting clinical trials and researching CD. A number of guidelines for reporting treatment outcomes in CD already exist.136 Hence, our paper is not meant to give authoritative advice on study design, which is not yet possible because the field is still developing, but to complement the available literature with our expertise, focusing on how to measure response to non-GFD treatment.

CD is a lifelong disease for which the GFD is currently the only treatment option. However, we suspect that alternative treatments will soon become commonplace.

Regulatory agencies are responsible for evaluating new therapies on the basis of their risks and benefits to patients in terms of how they feel, function or survive.80 These aims are of intuitive value to patients and clinicians and sufficiently broad that they should form the foundation of any interventional clinical trial. Despite this, the precedent in many fields, including gastroenterology, has been to use poorly validated outcomes of limited applicability to clinical practice. Although CD adversely affects survival6 137 138 and function,139–143 these outcomes are generally not feasible to assess in clinical trials because of their low prevalence and long latency. This leaves two options: measuring how patients feel, mainly using PROs, and using histology and serology to assess changes in the risk of future adverse events. Ideally, a treatment should improve more than one outcome measure (PROs, histology and serology), which can be tested using coprimary end points. However, coprimary end points should be used with caution because they reduce study power, and the number of patients available is often limited.
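Why coprimary end points cost power can be illustrated with a short simulation. A trial with coprimary end points succeeds only if both are significant, so under independence the joint power is the product of the marginal powers (0.80 × 0.80 = 0.64), and positive correlation between the end points partly offsets this loss. The sketch assumes bivariate-normal test statistics, equal effect sizes and two-sided tests at α = 0.05; the function names and correlation values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def marginal_power(ncp, alpha=0.05):
    """Approximate power of a single two-sided z-test with noncentrality ncp
    (the negligible rejection region in the wrong direction is ignored)."""
    return STD_NORMAL.cdf(ncp - STD_NORMAL.inv_cdf(1 - alpha / 2))

def joint_power_sim(ncp1, ncp2, rho, alpha=0.05, n_sim=100_000, seed=1):
    """Monte Carlo probability that BOTH coprimary end points are significant,
    assuming bivariate-normal test statistics with correlation rho."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        z1 = ncp1 + e1
        # Construct z2 so that corr(z1, z2) = rho
        z2 = ncp2 + rho * e1 + math.sqrt(1 - rho ** 2) * e2
        if z1 > crit and z2 > crit:
            hits += 1
    return hits / n_sim

# Noncentrality giving 80% power for each end point on its own
ncp = STD_NORMAL.inv_cdf(0.975) + STD_NORMAL.inv_cdf(0.80)
print(round(marginal_power(ncp), 2))       # 0.8 for each end point alone
print(round(marginal_power(ncp) ** 2, 2))  # 0.64 if both must succeed and are independent
for rho in (0.0, 0.5):
    print(rho, joint_power_sim(ncp, ncp, rho))
```

Keeping joint power at 80% would therefore require more patients per arm, which is often not feasible in CD trials.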

There are currently several well-designed and partially validated PROs developed for CD that should be considered standards for future trials. Assessment of the extent (degree) of enteropathy (intestinal architecture and IEL assessment) should be considered a critical outcome in clinical trials of CD. However, it is recognised that the technical limitations of duodenal biopsy as a reflection of overall small intestinal mucosal disease limit the potential value of histology as an end point. Techniques that better evaluate enteropathy across the small intestine, and that are applicable in clinical practice and relevant to patients, need to be developed.



  • Contributors JFL and DL initiated the study. JFL coordinated the study and wrote the first draft; the text was then extensively revised by the coauthors. All authors contributed to the literature searches, contributed to the writing of the manuscript and approved the final version of the manuscript.

  • Funding JFL was supported by the Swedish Research Council (522-2A09-195) and the Swedish Society of Medicine while writing the draft of this paper. DS received research support from the German Research Foundation (DFG), the Ministry of Research and Development (BMBF) and the Leibniz Foundation. DSS received an educational grant from Biocard and Simtomax to undertake an investigator-led research study on point-of-care tests and an educational grant from Dr Schär (a gluten-free food manufacturer) to undertake an investigator-led research study on gluten sensitivity. KKa was supported by the Academy of Finland, the Competitive Research Funding of the Tampere University Hospital and The Sigrid Juselius Foundation. IRK-S was supported by the Hungarian Research Fund (Grant NKFI 120392). MJM was supported by the Competitive State Research Financing of the Tampere University Hospital (grant 9U038).

  • Competing interests PHRG: scientific advisory board of Alvine Pharmaceuticals, ImmunogenX and ImmusanT. JAM: serves on the advisory board of Celimmune, was a consultant to BioLineRx, GlaxoSmithKline (GSK), Genentech, UCB Biopharma and Glenmark Pharmaceuticals Ltd and is a consultant to ImunnosanT, Institute for Protein Design (PvP Biologics), Takeda Pharmaceutical Company, Ltd., Innovate Biopharmaceuticals, Inc., Intrexon, 2GPharma Inc., Boehringer Ingelheim and ImmusanT. KEAL: ImmusanT, Regeneron and Alvine Pharmaceuticals. DSS: holds a patent and receives royalties for the TG2-antibody assay, has received an educational grant from Coeliac UK, Biocard, Simtomax and Dr Schär to undertake an investigator-led research study on CD and/or gluten sensitivity. NRR: clinical advisory board for ImmusanT. DAL: Medical Director for Takeda Pharmaceuticals. AR-H: coauthor of a patent on detecting gluten peptides in human fluids (Patent No. US 20170160288 A1), consultant for Vircell. IRK-S: patent on rapid coeliac antibody detection licensed by the University of Tampere to Labsystems Oy, Finland. MJM: serves on the Advisory Board of Celimmune, USA, ImmusanT, USA and Innovate Pharmaceuticals Inc, USA; is consultant to FinnMedi Oy, Finland and Jilab Oy, Finland via his own company Maki HealthTech Oy, Finland; is an inventor in the patent Methods and Means for Detecting Gluten-Induced Diseases, USA (Patent No. 7,361,480—USA, European Patent No. 1390753). The patent resulted in a commercial product from FinnMedi at the Tampere University Hospital and the University of Tampere, a coeliac disease point-of-care test, Biocard Celiac Test, licensed to Labsystems Diagnostics Oy (formerly AniBiotech Oy), Finland.

  • Provenance and peer review Not commissioned; externally peer reviewed.