
Original research
Efficacy of biological therapies and small molecules in moderate to severe ulcerative colitis: systematic review and network meta-analysis
Nicholas E Burr (1, 2), David J Gracie (2), Christopher J Black (2), Alexander C Ford (2, 3)
1 Department of Gastroenterology, Mid Yorkshire Hospitals NHS Trust, Wakefield, UK
2 Leeds Gastroenterology Institute, Leeds Teaching Hospitals NHS Trust, Leeds, UK
3 Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
Correspondence to Professor Alexander C Ford, Leeds Gastroenterology Institute, Leeds Teaching Hospitals NHS Trust, Leeds, UK; alexf12399{at}yahoo.com

Abstract

Objective Biological therapies and small molecules continue to be evaluated in moderate to severely active ulcerative colitis, but they are often studied in placebo-controlled trials, so their relative efficacy and safety are unknown. We examined this in a network meta-analysis.

Design We searched the literature to October 2021 to identify eligible trials. We judged efficacy using clinical remission, endoscopic improvement, or clinical response, and according to previous exposure or non-exposure to antitumour necrosis factor (TNF)-α therapy. We also assessed safety. We used a random effects model and reported data as pooled relative risks (RRs) with 95% CIs. Interventions were ranked according to their P-score.

Results We identified 28 trials (12 504 patients). Based on failure to achieve clinical remission, upadacitinib 45 mg once daily ranked first versus placebo (RR 0.73; 95% CI 0.68 to 0.80, P-score 0.98), with infliximab 5 mg/kg and 10 mg/kg second and third, respectively. Upadacitinib ranked first for clinical remission both in patients naïve to anti-TNF-α drugs (RR 0.69; 95% CI 0.61 to 0.78, P-score 0.99) and in those previously exposed (RR 0.78; 95% CI 0.72 to 0.85, P-score 0.99). Upadacitinib was superior to almost all other drugs in these analyses. Based on failure to achieve endoscopic improvement, infliximab 10 mg/kg ranked first (RR 0.61; 95% CI 0.51 to 0.72, P-score 0.97), with upadacitinib 45 mg once daily second and infliximab 5 mg/kg third. Upadacitinib was more likely to lead to adverse events, but serious adverse events were no more frequent, and withdrawals due to adverse events were significantly less frequent than with placebo. Infections were significantly more likely with tofacitinib than with placebo (RR 1.41; 95% CI 1.03 to 1.91).

Conclusion In a network meta-analysis, upadacitinib 45 mg once daily ranked first for clinical remission in all patients, in patients naïve to anti-TNF-α drugs and in patients previously exposed to them. Infliximab 10 mg/kg ranked first for endoscopic improvement. Most drugs were safe and well tolerated.

  • ulcerative colitis
  • meta-analysis

Data availability statement

No data are available.



Significance of this study

What is already known on this subject?

  • Ulcerative colitis follows a relapsing and remitting course, with intermittent flares of disease activity, some of which may be moderate to severe.

  • These are usually treated with corticosteroids, which have potentially serious adverse effects, so biological therapies and small molecules have been developed and licensed for this indication.

  • Although previous network meta-analyses have compared their efficacy and safety, this is a rapidly moving field, and several newer drugs that have shown efficacy in phase III clinical trials were not considered in them.

What are the new findings?

  • In terms of clinical remission and clinical response, upadacitinib 45 mg once daily ranked first in all patients, in patients previously exposed to antitumour necrosis factor (TNF)-α therapies, and in patients naïve to these drugs.

  • In terms of endoscopic improvement, infliximab 10 mg/kg ranked first, followed by upadacitinib 45 mg once daily and infliximab 5 mg/kg.

  • However, for endoscopic improvement, upadacitinib 45 mg once daily ranked first both in patients previously exposed to anti-TNF-α therapies and in patients who were anti-TNF-α naïve.

  • None of the drugs studied were more likely to lead to serious adverse events than placebo.

  • Vedolizumab 300 mg was the least likely drug to lead to infections, which were significantly more likely with tofacitinib 10 mg two times per day than with either placebo or vedolizumab 300 mg.


How might it impact on clinical practice in the foreseeable future?

  • These data are useful for informing treatment decisions for patients with moderate to severely active ulcerative colitis and can be incorporated in future updates of evidence-based management guidelines.

  • The results of this network meta-analysis could also be used to inform a cost-effectiveness analysis to help guide future treatment selection.

  • It is important to point out that the trials of upadacitinib are yet to be published in full.

Introduction

Ulcerative colitis (UC) is a chronic inflammatory disorder of the bowel that causes continuous mucosal inflammation, commencing in the rectum and extending proximally to a variable extent.1 It is estimated that UC affects 2.5 million people in Europe,2 and the disease follows a relapsing and remitting course, with intermittent flares of disease activity, some of which may be moderate to severe. These flares are, for the most part, managed medically, with surgery reserved for patients with refractory disease. Although 5-aminosalicylates are efficacious for mild to moderate disease activity,3–6 more severe flares are usually treated with corticosteroids.7 However, these have potentially serious adverse effects, and a substantial proportion of patients may become either dependent on them to maintain remission,8 or refractory to them.9 As a result, over the last 20 years novel drugs with more precise modes of action, based on mechanisms of disease identified in genome-wide association studies,10 have been developed.

The first of these agents was infliximab, a drug targeting the pro-inflammatory cytokine tumour necrosis factor-α (TNF-α), which demonstrated efficacy in clinical trials in moderate to severe UC.11 Since then, other drugs against TNF-α have been tested, such as adalimumab and golimumab.12 13 In addition, newer biological therapies targeting α or β integrins, which are involved in migration of immune cells to inflamed intestinal mucosa, such as vedolizumab or etrolizumab,14 15 or acting against other proinflammatory cytokines implicated in the pathogenesis of UC, such as ustekinumab,16 have been tested. However, even these more selective drugs do not work in all patients, there may be risks associated with their use,17–19 and the fact that they are administered either intravenously or subcutaneously may be inconvenient for patients. The search for alternative agents for the treatment of UC has, therefore, continued.

In the last 10 years, small molecules, which can be administered orally and on a daily basis, have also been evaluated in moderate to severe UC. These include Janus kinase inhibitors, such as tofacitinib,20 and the sphingosine-1-phosphate receptor modulator ozanimod.21 The comparative efficacy and safety of all these drugs have been assessed in prior network meta-analyses.22 23 These demonstrated that infliximab was ranked highest overall for efficacy, and that ustekinumab and tofacitinib were ranked highest in patients with previous anti-TNF-α exposure. However, this is a rapidly moving field, and several newer drugs that have shown efficacy in phase III clinical trials were not considered in these network meta-analyses.24–26 We have, therefore, performed an updated network meta-analysis to evaluate the efficacy of all biological therapies and small molecules that have progressed to phase III trials, compared with each other or with placebo, in terms of induction of remission, endoscopic improvement and clinical response, as well as their safety, in patients with moderate to severely active UC.

Methods

Search strategy and selection criteria

We searched MEDLINE (1946 to 2 October 2021), EMBASE and EMBASE Classic (1947 to 2 October 2021), and the Cochrane Central Register of Controlled Trials. In addition, we searched ClinicalTrials.gov for recently completed, potentially eligible randomised controlled trials (RCTs) (online supplemental file 1). We handsearched conference proceedings (Digestive Diseases Week, American College of Gastroenterology, United European Gastroenterology Week and the Asian Pacific Digestive Week) between 2001 and 2021 to identify trials published only in abstract form. Finally, we used the bibliographies of all obtained articles to perform a recursive search.


Eligible RCTs examined the efficacy of biological therapies (anti-TNF-α antibodies (infliximab, adalimumab or golimumab), anti-integrin antibodies (vedolizumab or etrolizumab) or anti-interleukin-12/23 antibodies (ustekinumab)) or small molecules (Janus kinase inhibitors (tofacitinib, filgotinib or upadacitinib) or sphingosine-1-phosphate receptor modulators (ozanimod)) at the doses taken forward into phase III clinical trials. Studies had to recruit ambulatory adults (≥18 years) with moderate to severely active UC (online supplemental table 1) and compare biological therapies or small molecules with placebo or with each other. We required a minimum follow-up duration of 6 weeks.

Two investigators (NEB and ACF) conducted independent literature searches. We identified studies on UC with the terms: inflammatory bowel disease, colitis or UC (both as medical subject headings and free-text terms). We combined these using the set operator AND with studies identified with the following terms: infliximab, remicade, adalimumab, humira, golimumab, simponi, vedolizumab, entyvio, etrolizumab, ustekinumab, stelara, tofacitinib, xeljanz, filgotinib, upadacitinib or ozanimod, and applied a clinical trials filter. There were no language restrictions. Two investigators (NEB and ACF) evaluated all abstracts identified by the search independently. We obtained potentially relevant papers and evaluated them in more detail, using predesigned forms, to assess eligibility independently and according to the predefined criteria. We translated foreign language papers, where required. We resolved disagreements between investigators by discussion.

Outcome assessment

We assessed the efficacy of biological therapies or small molecules, compared with placebo or each other, in terms of failure to achieve clinical remission, failure to achieve endoscopic improvement or failure to achieve clinical response, at the last point of follow-up of the induction of remission phase of the trial. Other outcomes assessed included adverse events (total numbers of adverse events, as well as serious adverse events, infections and adverse events leading to study withdrawal), if reported.

Data extraction

Once agreement on eligibility was reached, two investigators (NEB or CB, and ACF) independently extracted data from all eligible studies onto a Microsoft Excel spreadsheet (XP professional edition; Microsoft, Redmond, Washington, USA) as dichotomous outcomes (clinical remission or no clinical remission, endoscopic improvement or no endoscopic improvement, and clinical response or no clinical response). We assessed efficacy according to the proportion of patients failing to achieve (1) clinical remission, (2) endoscopic improvement and (3) clinical response. We also extracted the following data for each trial, where available: country of origin, number of centres, disease extent, proportion of patients who were naïve to anti-TNF-α therapy, dose and treatment schedule of active therapy and placebo, and duration of follow-up. When judging efficacy, we extracted data as intention-to-treat analyses, with drop-outs assumed to be treatment failures (ie, no response to biological therapy, small molecule or placebo), wherever trial reporting allowed. If this was not clear from the original article, we performed an analysis on all patients with reported evaluable data. When judging safety, we used the number of patients receiving at least one dose of the study drug, wherever possible. We compared the two investigators’ extracted data, and all discrepancies were highlighted and resolved by discussion between the four investigators.
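The sketch below shows, in Python, the intention-to-treat convention described above, in which drop-outs are counted as treatment failures, alongside the evaluable-data fallback. It is a minimal illustration, not the authors' extraction procedure. The `Arm` record, its field names and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """One trial arm (hypothetical record for illustration)."""
    randomised: int  # all patients allocated to the arm
    achieved: int    # patients observed to reach the endpoint, e.g. clinical remission
    evaluable: int   # patients with endpoint data actually reported


def failures_itt(arm: Arm) -> tuple[int, int]:
    """Intention-to-treat: every randomised patient without a recorded
    success, including drop-outs, is counted as a failure."""
    return arm.randomised - arm.achieved, arm.randomised


def failures_evaluable(arm: Arm) -> tuple[int, int]:
    """Fallback when drop-outs cannot be identified: analyse only the
    patients with reported evaluable data."""
    return arm.evaluable - arm.achieved, arm.evaluable


# Hypothetical arm: 200 randomised, 180 evaluable, 50 in remission.
arm = Arm(randomised=200, achieved=50, evaluable=180)
for label, (fail, n) in (("ITT", failures_itt(arm)),
                         ("evaluable", failures_evaluable(arm))):
    print(f"{label}: {fail}/{n} failed ({fail / n:.1%})")
```

With this hypothetical arm, the intention-to-treat analysis gives 150/200 failures, whereas the evaluable-data analysis gives 130/180. The second figure is a lower failure proportion, because drop-outs leave the denominator instead of counting as failures.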

Quality assessment and risk of bias

We used the Cochrane risk of bias tool to assess this at the study level.27 Two investigators (NEB or CB, and ACF) performed this independently, resolving any disagreements by discussion. We recorded the method used to generate the randomisation schedule and conceal treatment allocation, as well as whether blinding was implemented for participants, personnel and outcomes assessment, whether there was evidence of incomplete outcomes data, and whether there was evidence of selective reporting of outcomes.

Data synthesis and statistical analysis

We performed a network meta-analysis using a frequentist model, with the statistical package ‘netmeta’ (V.0.9–0, https://cran.r-project.org/web/packages/netmeta/index.html) in R (V.4.0.2). We used this to explore direct and indirect treatment comparisons of the efficacy and safety of each intervention, and reported it according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension statement for network meta-analyses.28 Network meta-analysis can give more precise estimates than standard pairwise analyses,29 30 and can be used to rank interventions to inform clinical decisions.31
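Network meta-analysis combines direct evidence from head-to-head trials with indirect evidence routed through a common comparator such as placebo. The Python sketch below shows the basic building block: the Bucher adjusted indirect comparison on the log RR scale. It then combines a direct and an indirect estimate by inverse variance, which illustrates why network estimates can be more precise than pairwise ones.

This is a simplified illustration that assumes independent estimates. It is not the graph-theoretical model that ‘netmeta’ fits across the whole network, and all numbers are hypothetical.

```python
import math


def indirect(log_rr_ac, se_ac, log_rr_bc, se_bc):
    """Bucher adjusted indirect comparison of A vs B via common comparator C.

    On the log scale the effects subtract and, assuming the two estimates
    come from independent trials, their variances add."""
    return log_rr_ac - log_rr_bc, math.sqrt(se_ac ** 2 + se_bc ** 2)


def combine(est1, se1, est2, se2):
    """Inverse-variance (fixed-effect) pooling of two independent estimates
    of the same comparison, e.g. a direct and an indirect estimate."""
    w1, w2 = 1 / se1 ** 2, 1 / se2 ** 2
    pooled = (w1 * est1 + w2 * est2) / (w1 + w2)
    return pooled, math.sqrt(1 / (w1 + w2))


# Hypothetical RRs of failure versus placebo, with SEs of the log RRs.
ind_est, ind_se = indirect(math.log(0.75), 0.05, math.log(0.90), 0.05)
# Hypothetical head-to-head trial of drug A vs drug B.
dir_est, dir_se = math.log(0.80), 0.08
net_est, net_se = combine(dir_est, dir_se, ind_est, ind_se)

print(f"indirect RR {math.exp(ind_est):.3f} (SE {ind_se:.4f})")
print(f"direct   RR {math.exp(dir_est):.3f} (SE {dir_se:.4f})")
print(f"combined RR {math.exp(net_est):.3f} (SE {net_se:.4f})")
```

Here the combined SE (about 0.053) is smaller than either the indirect SE (about 0.071) or the direct SE (0.08). This is the gain in precision referred to above.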

We examined the symmetry and geometry of the evidence by producing a network plot, with node size corresponding to the number of study subjects and connection size corresponding to the number of studies. We also produced comparison-adjusted funnel plots to explore publication bias or other small study effects, for all available comparisons, using Stata V.16 (StataCorp). These are scatterplots of effect size versus precision, measured as the inverse of the SE. Symmetry around the effect estimate line suggests absence of publication bias or small study effects.32

We used a pooled relative risk (RR) with 95% CIs to judge the efficacy of each comparison tested, using a random effects model as a conservative estimate. We used an RR of failure to achieve each of the endpoints of interest (clinical remission, endoscopic improvement or clinical response). For some meta-analyses, this approach is more stable than using the RR of improvement or the OR.33 As there were direct comparisons between some active therapies, we were able to perform consistency modelling to check the correlation between direct and indirect evidence across the network.34 We displayed this in network heat plots, in which grey squares represent the size of the contribution of the direct estimate from one study design (in the columns) to each network estimate (in the rows).35 The coloured squares around these represent the change in inconsistency between direct and indirect evidence in a network estimate in the row after relaxing the consistency assumption for the effect of one design in the column. Red squares indicate ‘hotspots’ of inconsistency, whereas cooler blue colours indicate that the direct evidence of the design in the column supports the indirect evidence in the row.
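To make the effect measure concrete, the Python sketch below computes the RR of failure to achieve an endpoint for a single two-arm trial. It also gives the log-scale SE and a Wald 95% CI, which is the usual way such per-trial estimates enter an inverse-variance meta-analysis. The function name, the 0.5 zero-cell correction and the trial counts are illustrative assumptions, not code or data from the included trials.

```python
import math


def rr_of_failure(fail_t, n_t, fail_c, n_c, z=1.96):
    """RR of failing to reach an endpoint (active vs control), the SE of
    its log, and a Wald 95% CI computed on the log scale."""
    if 0 in (fail_t, fail_c, n_t - fail_t, n_c - fail_c):
        # Add 0.5 to every cell of the 2x2 table when any cell is zero
        # (a common convention; other corrections exist).
        fail_t, fail_c = fail_t + 0.5, fail_c + 0.5
        n_t, n_c = n_t + 1, n_c + 1
    rr = (fail_t / n_t) / (fail_c / n_c)
    se = math.sqrt(1 / fail_t - 1 / n_t + 1 / fail_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, se, lower, upper


# Hypothetical trial: 150/200 fail on active drug, 180/200 fail on placebo.
rr, se, lo, hi = rr_of_failure(150, 200, 180, 200)
print(f"RR of failure {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In this example the RR of failure is 0.75/0.90 ≈ 0.83, with an upper confidence limit below 1. Expressed as an RR of success, the same trial would give 0.25/0.10 = 2.5. An RR of failure below 1 favours the active drug, which matches how the results below are reported.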

Many meta-analyses use the I² statistic, which ranges between 0% and 100%, to measure heterogeneity.36 This statistic is easy to interpret and does not vary with the number of studies. However, the I² value can increase with the number of patients included in the meta-analysis.37 We, therefore, assessed global statistical heterogeneity across all comparisons using the τ² measure from the ‘netmeta’ statistical package. Estimates of τ² of approximately 0.04, 0.16 and 0.36 are considered to represent low, moderate and high levels of heterogeneity, respectively.38
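The point that I² depends on trial size, whereas τ² does not, can be illustrated with the pairwise DerSimonian-Laird estimator. This is a simpler relative of the network-wide τ² that ‘netmeta’ reports.

In the hypothetical Python sketch below, two sets of three trials have the same estimated between-trial variance (τ²=0.04, the ‘low’ benchmark). The second set has smaller within-trial variances, as larger trials would. The data are constructed for illustration and are not from the review.

```python
def heterogeneity(y, v):
    """Cochran's Q, I^2 and the DerSimonian-Laird tau^2 for study
    estimates y (e.g. log RRs) with within-study variances v."""
    w = [1 / vi for vi in v]                           # inverse-variance weights
    sw = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sw   # fixed-effect mean
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0      # truncated at 0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                      # truncated at 0
    return q, i2, tau2


# Hypothetical log RRs: smaller trials (larger within-trial variance)
# versus larger trials (smaller within-trial variance).
smaller_trials = heterogeneity([-0.5, -0.2, 0.1], [0.05] * 3)
larger_trials = heterogeneity([-0.45, -0.2, 0.05], [0.0225] * 3)
for label, (q, i2, tau2) in (("smaller", smaller_trials),
                             ("larger", larger_trials)):
    print(f"{label} trials: Q={q:.2f}, I2={i2:.0%}, tau2={tau2:.3f}")
```

In this constructed example, I² rises from about 44% to 64% while τ² stays at 0.04. This is consistent with the rationale given above for using τ² to judge heterogeneity.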

We ranked all biological therapies and small molecules, versus placebo or each other, according to their P-score, which is a value between 0 and 1. P-scores are based solely on point estimates and SEs from the network estimates, and measure the mean extent of certainty that one intervention is better than another, averaged over all competing interventions.39 Higher scores indicate a greater probability of the intervention being ranked as best,39 but the magnitude of the P-score should be considered, as well as the rank. The mean value of the P-score is always 0.5, so if individual interventions cluster around this value they are likely to be similarly efficacious. However, when interpreting the results, it is also important to take the RR and corresponding 95% CI for each comparison into account, rather than relying on rankings alone.40 In our primary analyses, we pooled data for all patients, but we also performed a priori subgroup analyses for each efficacy endpoint according to whether or not patients had been exposed to anti-TNF-α drugs previously.
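The P-score calculation described above can be sketched in Python for an outcome where a lower RR of failure is better. For each intervention, the sketch takes the one-sided normal probability that its log RR is lower than that of each competitor, then averages these probabilities over all competitors.

As a simplifying assumption, the estimates versus the reference are treated as independent. The SE of each pairwise difference is therefore taken as the square root of the sum of the two squared SEs. ‘netmeta’ instead uses the variance of each pairwise difference from the fitted network. The intervention names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def p_scores(log_rr, se):
    """P-scores for a 'smaller is better' outcome such as the RR of failure.

    log_rr[i], se[i]: network estimate of intervention i versus a common
    reference (the reference itself is entered as 0.0, 0.0).
    SIMPLIFYING ASSUMPTION: estimates are treated as independent, so the SE
    of the difference between i and j is sqrt(se[i]**2 + se[j]**2)."""
    phi = NormalDist().cdf
    n = len(log_rr)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j != i:
                se_ij = math.sqrt(se[i] ** 2 + se[j] ** 2)
                # certainty that i has a lower (better) log RR than j
                total += phi((log_rr[j] - log_rr[i]) / se_ij)
        scores.append(total / (n - 1))  # mean over all competitors
    return scores


names = ["placebo", "drug A", "drug B", "drug C"]  # hypothetical
scores = p_scores([0.0, math.log(0.75), math.log(0.85), math.log(0.95)],
                  [0.0, 0.04, 0.04, 0.05])
for name, score in sorted(zip(names, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.2f}")
print(f"mean P-score: {sum(scores) / len(scores):.2f}")
```

Each pair of interventions contributes two certainties that sum to 1. The mean P-score is therefore always 0.5, as stated above.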

Results

The search strategy generated 3371 citations, 81 of which appeared relevant and were retrieved for further assessment. Of these, we excluded 58 that did not fulfil eligibility criteria (reasons are provided in online supplemental figure 1), leaving 23 eligible articles reporting on 28 RCTs. Twenty-seven of these trials were published, in 22 separate articles,11–13 16 20 21 24–26 41–53 and the results of one RCT were posted on ClinicalTrials.gov (NCT01551290). These 28 trials recruited 12 504 patients, allocated to active therapy or placebo as described in online supplemental table 2. Agreement between investigators for trial eligibility was excellent (kappa statistic=0.83). Detailed characteristics of individual RCTs are provided in online supplemental table 3. Risk of bias for all included trials is reported in online supplemental table 4. Nine RCTs were at low risk of bias across all domains.20 21 42 44 45 48 50 51 Endpoints used in each trial are provided in online supplemental table 5.
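The kappa statistic quoted above corrects the raw proportion of agreement between two reviewers for the agreement expected by chance. A minimal Python sketch of Cohen's kappa follows. The agreement table in the example is hypothetical, is not the reviewers' actual eligibility data, and does not reproduce the reported value of 0.83.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table for two raters, where
    table[i][j] counts items that rater 1 placed in category i and rater 2
    placed in category j."""
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # chance agreement implied by each rater's marginal proportions
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions by two reviewers on 100 papers.
agreement = [[40, 5],
             [5, 50]]
print(f"kappa = {cohens_kappa(agreement):.2f}")
```

In this example the reviewers agree on 90% of papers, but 50.5% agreement would be expected by chance from their marginal totals. Kappa is therefore about 0.80, not 0.90.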

Clinical remission

All 28 trials reported data for this endpoint at between 6 and 14 weeks11–13 16 20 21 24–26 41–53 (NCT01551290). The network plot is provided in online supplemental figure 2. When data were pooled, there was low heterogeneity (τ²=0.0021), and the funnel plot appeared symmetrical (online supplemental figure 3). All drugs, other than adalimumab 160/80 mg, adalimumab 80/40 mg and filgotinib 100 mg once daily, were superior to placebo. Upadacitinib 45 mg once daily ranked first for efficacy (RR of failure to achieve clinical remission=0.73; 95% CI 0.68 to 0.80, P-score 0.98) (figure 1A), meaning that, averaged over all competing interventions, the mean certainty that upadacitinib 45 mg once daily was more efficacious than another intervention was 98%. Infliximab 5 mg/kg ranked second (RR 0.78; 95% CI 0.72 to 0.84, P-score 0.92), infliximab 10 mg/kg third (RR 0.80; 95% CI 0.72 to 0.89, P-score 0.84) and tofacitinib 10 mg two times per day fourth (RR 0.86; 95% CI 0.80 to 0.93, P-score 0.64). The network heat plot had no red ‘hotspots’ of inconsistency (online supplemental figure 4). After direct and indirect comparison, upadacitinib 45 mg once daily was superior to all other drugs, except infliximab 5 mg/kg and 10 mg/kg (table 1). Infliximab 5 mg/kg was superior to ozanimod 1 mg once daily, vedolizumab 300 mg, ustekinumab 130 mg and 6 mg/kg, etrolizumab 105 mg, filgotinib 200 mg and 100 mg once daily, and adalimumab 160/80 mg and 80/40 mg. Infliximab 10 mg/kg was superior to adalimumab 160/80 mg, adalimumab 80/40 mg and filgotinib 100 mg once daily.

Figure 1

(A) Forest plot for failure to achieve clinical remission: all patients. (B) Forest plot for failure to achieve clinical remission: patients naïve to anti-TNF-α therapies. (C) Forest plot for failure to achieve clinical remission: patients exposed to anti-TNF-α therapies previously. In each panel, the P-score reflects the mean certainty that each intervention is better than the other interventions in the network. RR, relative risk; TNF-α, tumour necrosis factor-α.

Table 1

League table for failure to achieve clinical remission: all patients

Eleven trials reported clinical remission in a subset of patients naïve to anti-TNF-α therapies,16 24 26 43 45–47 49 53 including 1 trial of adalimumab,43 and another 12 trials only recruited patients naïve to these drugs11–13 25 41 42 44 50 51 (NCT01551290). Therefore, in total, there were 23 separate RCTs, recruiting 7702 patients. When data were pooled, there was low heterogeneity (τ²=0.0030). In patients naïve to anti-TNF-α therapies, all drugs, other than ustekinumab 130 mg, golimumab 200/100 mg, ustekinumab 6 mg/kg, filgotinib 100 mg once daily and adalimumab 80/40 mg, were superior to placebo. Upadacitinib 45 mg once daily ranked first for clinical remission (RR of failure to achieve clinical remission=0.69; 95% CI 0.61 to 0.78, P-score 0.99) (figure 1B), with infliximab 5 mg/kg second (RR 0.78; 95% CI 0.72 to 0.84, P-score 0.87), infliximab 10 mg/kg third (RR 0.80; 95% CI 0.71 to 0.90, P-score 0.77) and vedolizumab 300 mg fourth (RR 0.84; 95% CI 0.76 to 0.92, P-score 0.65). On direct and indirect comparison, upadacitinib 45 mg once daily was again superior to all other drugs, except infliximab 5 mg/kg and infliximab 10 mg/kg (online supplemental table 6).

Eleven RCTs reported on clinical remission in a subset of patients exposed to anti-TNF-α therapies previously,16 24 26 43 45–47 49 53 and two trials recruited only patients with previous exposure to these drugs.25 48 There were 3690 patients included in these 13 trials, and low heterogeneity between them (τ²=0.0015). In this analysis, upadacitinib 45 mg once daily, ustekinumab 6 mg/kg, tofacitinib 10 mg two times per day, ustekinumab 130 mg and etrolizumab 105 mg were superior to placebo, with upadacitinib ranked first (RR of failure to achieve clinical remission=0.78; 95% CI 0.72 to 0.85, P-score 0.99) (figure 1C). On direct and indirect comparison, upadacitinib 45 mg once daily was superior to all other drugs except ustekinumab 6 mg/kg (online supplemental table 7).

Endoscopic improvement

In total, 27 RCTs reported data for this endpoint at 6–14 weeks,11–13 16 20 21 24–26 41–49 52 (NCT01551290) including 11 733 patients. There was low heterogeneity between studies (τ²=0), but the funnel plot appeared asymmetrical (online supplemental figure 5). This was driven by a small RCT of tofacitinib 10 mg two times per day,20 and the asymmetry disappeared when this trial was excluded from the analysis. All drugs, other than adalimumab 80/40 mg, were superior to placebo. Infliximab 10 mg/kg ranked first for efficacy (RR of failure to achieve endoscopic improvement=0.61; 95% CI 0.51 to 0.72, P-score 0.97) (figure 2A). Upadacitinib 45 mg once daily ranked second (RR 0.65; 95% CI 0.61 to 0.70, P-score 0.93) and infliximab 5 mg/kg third (RR 0.67; 95% CI 0.60 to 0.73, P-score 0.90). The network heat plot had no red ‘hotspots’ of inconsistency (online supplemental figure 6). After direct and indirect comparison, infliximab 10 mg/kg was superior to all other drugs, except upadacitinib 45 mg once daily and infliximab 5 mg/kg (table 2). Upadacitinib 45 mg once daily was superior to all drugs except infliximab 5 mg/kg, and infliximab 5 mg/kg was superior to all other drugs except golimumab 400/200 mg.

Figure 2

(A) Forest plot for failure to achieve endoscopic improvement: all patients. (B) Forest plot for failure to achieve endoscopic improvement: patients naïve to anti-TNF-α therapies. (C) Forest plot for failure to achieve endoscopic improvement: patients exposed to anti-TNF-α therapies previously. In each panel, the P-score reflects the mean certainty that each intervention is better than the other interventions in the network. RR, relative risk; TNF-α, tumour necrosis factor-α.

Table 2

League table for failure to achieve endoscopic improvement: all patients

Eight trials reported endoscopic improvement in a subset of patients naïve to anti-TNF-α therapies,16 26 43 45 46 49 including 1 trial of adalimumab,43 and 12 trials only recruited patients naïve to these drugs11–13 25 41 42 44 50 51 (NCT01551290). Therefore, data from 20 separate RCTs, recruiting 6610 patients, were pooled, with low heterogeneity between studies (τ²=0). All drugs, other than filgotinib 100 mg once daily and adalimumab 80/40 mg, were superior to placebo, and upadacitinib 45 mg once daily ranked first (RR 0.58; 95% CI 0.51 to 0.66, P-score 0.97), with infliximab 10 mg/kg (RR 0.61; 95% CI 0.51 to 0.72, P-score 0.93) and infliximab 5 mg/kg (RR 0.66; 95% CI 0.60 to 0.73, P-score 0.85) second and third, respectively (figure 2B). On direct and indirect comparison, both upadacitinib 45 mg once daily and infliximab 10 mg/kg were superior to all other drugs, except infliximab 5 mg/kg and vedolizumab 300 mg, and infliximab 5 mg/kg was superior to all other drugs, except vedolizumab 300 mg and golimumab 400/200 mg (online supplemental table 8).

Finally, eight RCTs reported on endoscopic improvement in a subset of patients exposed to anti-TNF-α therapy previously,16 26 43 45 46 49 and two trials recruited only patients previously exposed to these drugs.25 48 There were 3282 patients included in these 10 trials, and low heterogeneity between them (τ²=0.0009). Upadacitinib 45 mg once daily (RR 0.71; 95% CI 0.65 to 0.77, P-score 1.00), tofacitinib 10 mg two times per day (RR 0.82; 95% CI 0.76 to 0.89, P-score 0.78), ustekinumab 6 mg/kg (RR 0.84; 95% CI 0.76 to 0.94, P-score 0.69), ustekinumab 130 mg (RR 0.87; 95% CI 0.79 to 0.97, P-score 0.56), and filgotinib 200 mg once daily (RR 0.90; 95% CI 0.82 to 0.99, P-score 0.47) were superior to placebo, with upadacitinib 45 mg once daily ranked first (figure 2C). On direct and indirect comparison, upadacitinib was superior to all other drugs (online supplemental table 9).

Clinical response

Clinical response was reported by all 28 trials at 6–14 weeks11–13 16 20 21 24–26 41–49 52 53 (NCT01551290). There was low heterogeneity between studies (τ²=0.0088), and the funnel plot appeared symmetrical (online supplemental figure 7). All drugs, other than adalimumab 80/40 mg, were superior to placebo, but upadacitinib 45 mg once daily ranked first (RR of no clinical response=0.36; 95% CI 0.29 to 0.43, P-score 1.00), followed by infliximab 10 mg/kg (RR 0.55; 95% CI 0.43 to 0.69, P-score 0.84), ustekinumab 6 mg/kg (RR 0.56; 95% CI 0.44 to 0.71, P-score 0.81), and infliximab 5 mg/kg fourth (RR 0.57; 95% CI 0.49 to 0.66, P-score 0.81) (figure 3A). The network heat plot had red ‘hotspots’ of inconsistency related to study designs comparing vedolizumab 300 mg and adalimumab 160/80 mg directly and those comparing vedolizumab 300 mg and placebo directly (online supplemental figure 8). This reflects the disparity between the direct comparison of adalimumab 160/80 mg and vedolizumab 300 mg from the VARSITY trial,53 and the indirect estimate generated from trials comparing either adalimumab 160/80 mg or vedolizumab 300 mg with placebo.12 43–46 This is highlighted in table 3. Upadacitinib 45 mg once daily was superior to all other drugs (table 3). Infliximab 10 mg/kg, ustekinumab 6 mg/kg, infliximab 5 mg/kg and filgotinib 200 mg once daily were superior to filgotinib 100 mg once daily, etrolizumab 105 mg, and adalimumab 160/80 mg and 80/40 mg.

Figure 3

(A) Forest plot for failure to achieve clinical response: all patients. (B) Forest plot for failure to achieve clinical response: patients naïve to anti-TNF-α therapies. (C) Forest plot for failure to achieve clinical response: patients exposed to anti-TNF-α therapies previously. In each panel, the P-score reflects the mean certainty that each intervention is better than the other interventions in the network. RR, relative risk; TNF-α, tumour necrosis factor-α.

Table 3

League table for failure to achieve clinical response: all patients

Eight trials reported on clinical response in a subset of patients naïve to anti-TNF-α therapies,16 20 26 43 45 46 53 including 1 trial of adalimumab,43 and 12 trials only recruited patients naïve to these drugs11–13 25 41 42 44 50 51 (NCT01551290). Therefore, data from 20 separate RCTs, recruiting 6778 patients, were pooled. There was low heterogeneity between studies (τ²=0.0103), and overall upadacitinib 45 mg once daily ranked first (RR 0.30; 95% CI 0.23 to 0.40, P-score 1.00), followed by ustekinumab 6 mg/kg (RR 0.52; 95% CI 0.37 to 0.72, P-score 0.81) and infliximab 10 mg/kg (RR 0.54; 95% CI 0.43 to 0.69, P-score 0.78) (figure 3B). Upadacitinib 45 mg once daily was superior to all other drugs (online supplemental table 10), with both ustekinumab 6 mg/kg and infliximab 10 mg/kg superior to etrolizumab 105 mg, and adalimumab 160/80 mg and 80/40 mg.

Finally, eight RCTs reported on clinical response in a subset of patients exposed to anti-TNF-α therapy previously,16 20 26 43 45 46 53 and two trials recruited only patients previously exposed to these drugs.25 48 There were 2850 patients randomised in these 10 RCTs. Overall, there was low heterogeneity between studies (τ²=0.0224), and upadacitinib 45 mg once daily was again ranked first (RR 0.39; 95% CI 0.30 to 0.51, P-score 0.97), with filgotinib 200 mg once daily second (RR 0.57; 95% CI 0.41 to 0.79, P-score 0.76) and ustekinumab 6 mg/kg third (RR 0.58; 95% CI 0.41 to 0.83, P-score 0.74) (figure 3C). No other drugs were superior to placebo. The league table is provided in online supplemental table 11. Upadacitinib 45 mg once daily was superior to ustekinumab 130 mg, filgotinib 100 mg once daily, etrolizumab 105 mg, vedolizumab 300 mg and adalimumab 160/80 mg. Both filgotinib 200 mg once daily and ustekinumab 6 mg/kg were superior to adalimumab 160/80 mg.

Adverse events

In terms of total number of adverse events, 27 RCTs reported these data in 11 840 patients11–13 16 20 21 24–26 41–51 53 (NCT01551290). Heterogeneity was low between studies (τ2=0), with ustekinumab 130 mg the least likely drug to lead to adverse events (RR of adverse events=0.86; 95% CI 0.72 to 1.03, P-score 0.89) and upadacitinib 45 mg once daily. the most likely (RR 1.56; 95% CI 1.16 to 2.09, P-score 0.01) (online supplemental figure 9). Upadacitinib 45 mg once daily was more likely to lead to adverse events than all other drugs, except adalimumab 80/40 mg (online supplemental table 12). None of the drugs were more likely to lead to a serious adverse event than placebo in 27 trials11–13 16 20 21 24–26 41–51 53 (NCT01551290). The RR of serious adverse events was significantly lower with vedolizumab 300 mg and golimumab 200/100 mg, which was ranked first (RR 0.45; 95% CI 0.21 to 0.97, P-score 0.80), with etrolizumab 105 mg ranked last (RR 1.18 95% CI 0.79 to 1.76, P-score 0.10) (online supplemental figure 10). Serious adverse events were more likely with etrolizumab 105 mg than with golimumab 200/100 mg, ustekinumab 6 mg/kg, vedolizumab 300 mg, and infliximab 5 mg/kg (online supplemental table 13). In terms of infections, in 23 RCTs,11–13 16 20 24 25 41–46 48–51 53 (NCT01551290) tofacitinib 10 mg two times per day was ranked last, and infections were more likely than with placebo (RR of infection=1.41; 95% CI 1.03 to 1.91, P-score 0.11) (online supplemental figure 11), with vedolizumab 300 mg ranked first and significantly less likely to lead to infections than tofacitinib 10 mg two times per day (online supplemental table 14). There were no other significant differences between drugs, and no other drug was more likely than placebo to lead to infections. 
Finally, in 24 trials,11–13 20 21 24–26 41–44 46 48–51 53 (NCT01551290) upadacitinib 45 mg once daily, which was ranked first, was significantly less likely than placebo to lead to withdrawal due to an adverse event (RR of withdrawal due to an adverse event=0.26; 95% CI 0.12 to 0.57, P-score 0.92) (online supplemental figure 12), but there were no other significant differences between individual drugs and placebo. Etrolizumab 105 mg was ranked last (RR 1.12; 95% CI 0.56 to 2.23, P-score 0.23). Upadacitinib 45 mg once daily was less likely to lead to withdrawal due to an adverse event than infliximab 5 mg/kg and 10 mg/kg, vedolizumab 300 mg, adalimumab 160/80 mg or 80/40 mg, filgotinib 200 mg once daily, and etrolizumab 105 mg (online supplemental table 15).
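In the safety analyses above, an RR above 1 means more events than with placebo and an RR below 1 means fewer, and a difference is called significant when the 95% CI excludes 1. The minimal sketch below makes that link explicit for four of the reported estimates. It assumes each CI is symmetric on the log scale (normal approximation), so the SE, z statistic and two-sided p-value it recovers are approximate, partly because the published figures are rounded; they are not outputs of the authors' model. The names `summarise_rr` and `RRSummary` are hypothetical.

```python
import math
from typing import NamedTuple

Z95 = 1.959964  # two-sided 95% standard normal quantile


class RRSummary(NamedTuple):
    se_log_rr: float
    z: float
    p_two_sided: float
    ci_excludes_one: bool


def summarise_rr(rr: float, lower: float, upper: float) -> RRSummary:
    """Approximate SE, z statistic and two-sided p-value for a reported RR,
    assuming its 95% CI is symmetric on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z95)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return RRSummary(se, z, p, lower > 1.0 or upper < 1.0)


# Estimates versus placebo as reported above: (RR, lower 95% CI, upper 95% CI)
reported = {
    "ustekinumab 130 mg, adverse events": (0.86, 0.72, 1.03),
    "upadacitinib 45 mg, adverse events": (1.56, 1.16, 2.09),
    "tofacitinib 10 mg, infection": (1.41, 1.03, 1.91),
    "upadacitinib 45 mg, withdrawal due to AE": (0.26, 0.12, 0.57),
}

summaries = {name: summarise_rr(*ci) for name, ci in reported.items()}
for name, s in summaries.items():
    verdict = "CI excludes 1" if s.ci_excludes_one else "CI includes 1"
    print(f"{name:42s} z={s.z:+.2f} p~{s.p_two_sided:.3f} ({verdict})")
```

This is why ustekinumab 130 mg, although ranked first for adverse events, is not described as significantly different from placebo: its CI (0.72 to 1.03) includes 1, whereas the tofacitinib infection estimate (1.03 to 1.91) lies wholly above 1.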

Discussion

We conducted a contemporary systematic review and network meta-analysis of biological therapies and small molecules for moderate to severely active UC, incorporating data from 28 RCTs and over 12 500 patients. Overall, in terms of clinical remission and clinical response at 6 to 14 weeks, upadacitinib 45 mg once daily ranked first in all patients, in patients previously exposed to anti-TNF-α therapies, and in patients naïve to these drugs. In terms of endoscopic improvement, infliximab 10 mg/kg ranked first, followed by upadacitinib 45 mg once daily and infliximab 5 mg/kg. However, upadacitinib 45 mg once daily again ranked first in patients previously exposed to anti-TNF-α therapies and in patients who were anti-TNF-α naïve. In terms of safety, upadacitinib 45 mg once daily ranked last for total number of adverse events, and ustekinumab 130 mg first. However, none of the drugs studied were more likely to lead to serious adverse events than placebo, although etrolizumab 105 mg was more likely to lead to serious adverse events than golimumab 200/100 mg, ustekinumab 6 mg/kg, vedolizumab 300 mg, and infliximab 5 mg/kg. Vedolizumab 300 mg was the least likely drug to lead to infections. Infections were significantly more likely with tofacitinib 10 mg two times per day than with either placebo or vedolizumab 300 mg. Finally, withdrawals due to adverse events were significantly less likely with upadacitinib 45 mg once daily than with placebo. Upadacitinib 45 mg once daily was significantly less likely to lead to withdrawals due to an adverse event than infliximab 5 mg/kg and 10 mg/kg, vedolizumab 300 mg, adalimumab 160/80 mg or 80/40 mg, filgotinib 200 mg once daily, and etrolizumab 105 mg. Applying Grading of Recommendations, Assessment, Development and Evaluation (GRADE) criteria to our estimates of effects, certainty in the quality of evidence would be high.

There are some limitations. Only 9 of 27 trials were at low risk of bias across all domains. Given the timespan of the included studies, trials of newer drugs may have included patients with refractory UC who had failed multiple other therapies. However, some of these more recent trials recruited only patients who were naïve to anti-TNF-α therapies, and many other trials reported efficacy data in subsets of patients who had, or had not, been exposed to these drugs. It is important to point out that comparisons in the latter group of trials may not be protected by randomisation. Few trials reported efficacy according to concomitant immunomodulator use, and the earlier trials of infliximab excluded such patients.11 Given that combination therapy has been shown to be superior to either monotherapy in one trial,54 this may have led to underestimation of the efficacy of some drugs. Endpoints, and the timepoints at which they were assessed, differed slightly between trials, although all RCTs provided data at between 6 and 14 weeks. Judging efficacy at an earlier time point for a drug that may take longer to produce a response, or for which dose adjustment is subsequently found to be required, may underestimate its efficacy. One trial reported endoscopic improvement only at 52 weeks,53 and this study was therefore excluded from this analysis. Two trials used an adapted Mayo score to assess clinical response or remission.26 This removes the physician’s global assessment component of the Mayo score, which may reduce subjectivity in judging both disease activity at study entry and clinical remission at the end of treatment; such subjectivity may otherwise inflate treatment efficacy.
In fact, use of an adapted Mayo score has been recommended in Food and Drug Administration guidance for industry.55 Some of the RCTs of newer drugs, including etrolizumab, tofacitinib, ozanimod, filgotinib and upadacitinib, used more stringent endpoints to define clinical remission,24–26 48–51 incorporating a rectal bleeding score of zero. This may have led to an underestimation of their efficacy relative to trials of infliximab, adalimumab or vedolizumab. However, some of these trials also reported remission rates using an endpoint identical to that used in older trials, and a subgroup analysis based on this definition did not alter our results (data not shown). Upadacitinib ranked first in many analyses, although it is important to point out that these trials are yet to be published in full and have not been subject to rigorous peer review. Unlike tofacitinib, upadacitinib is a preferential Janus kinase 1 inhibitor, although, given that filgotinib also has increased selectivity for Janus kinase 1, this cannot be the sole reason for upadacitinib’s higher ranking. Despite these limitations, the results of our study are still useful for informing treatment decisions for patients with moderate to severely active UC and can be used in future updates of evidence-based management guidelines.
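The endpoint differences discussed in this paragraph can be made concrete. The sketch below is illustrative and not taken from any included trial: it encodes three remission definitions over the four Mayo subscores (stool frequency, rectal bleeding, endoscopy and physician’s global assessment, each scored 0 to 3). The thresholds are one commonly used version of each definition and are assumptions here, since individual trials differed in detail; the class, function names and patients A to C are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MayoSubscores:
    """Mayo score components, each scored 0-3."""
    stool_frequency: int
    rectal_bleeding: int
    endoscopy: int
    physician_global: int  # omitted from the adapted Mayo score

    def __post_init__(self) -> None:
        for value in (self.stool_frequency, self.rectal_bleeding,
                      self.endoscopy, self.physician_global):
            if not 0 <= value <= 3:
                raise ValueError("each Mayo subscore must be between 0 and 3")

    def full_total(self) -> int:
        """Full Mayo score (0-12), including the physician's global assessment."""
        return (self.stool_frequency + self.rectal_bleeding
                + self.endoscopy + self.physician_global)

    def adapted_total(self) -> int:
        """Adapted Mayo score (0-9): physician's global assessment removed."""
        return self.stool_frequency + self.rectal_bleeding + self.endoscopy


def remission_older(s: MayoSubscores) -> bool:
    # Assumed older definition: full Mayo score <= 2 with no subscore > 1
    subscores = (s.stool_frequency, s.rectal_bleeding, s.endoscopy, s.physician_global)
    return s.full_total() <= 2 and max(subscores) <= 1


def remission_stringent(s: MayoSubscores) -> bool:
    # Older definition plus the stricter requirement of no rectal bleeding
    return remission_older(s) and s.rectal_bleeding == 0


def remission_adapted(s: MayoSubscores) -> bool:
    # Assumed adapted-Mayo definition: stool frequency <= 1, no rectal bleeding,
    # endoscopy <= 1; the physician's global assessment plays no part
    return s.stool_frequency <= 1 and s.rectal_bleeding == 0 and s.endoscopy <= 1


# Hypothetical patients assessed at the end of induction
patients = {
    "A": MayoSubscores(stool_frequency=1, rectal_bleeding=1, endoscopy=0, physician_global=0),
    "B": MayoSubscores(stool_frequency=0, rectal_bleeding=0, endoscopy=1, physician_global=1),
    "C": MayoSubscores(stool_frequency=1, rectal_bleeding=0, endoscopy=1, physician_global=1),
}
for label, s in patients.items():
    print(f"{label}: full={s.full_total()} adapted={s.adapted_total()} "
          f"older={remission_older(s)} stringent={remission_stringent(s)} "
          f"adapted_remission={remission_adapted(s)}")
```

Patient A is in remission under the older definition but not once a rectal bleeding subscore of zero is required, which is how stricter endpoints can lower remission rates relative to older trials. Patient C shows that removing the physician’s global assessment can change classification in the other direction.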

An initial network meta-analysis by Singh et al demonstrated infliximab to be the most efficacious drug for patients naïve to biological therapies in terms of induction of clinical remission and endoscopic improvement, with vedolizumab ranked second.22 In patients exposed to biological therapies, tofacitinib ranked first for both clinical remission and endoscopic improvement. An update to this work from 2020, including data from head-to-head trials of vedolizumab and adalimumab, as well as phase III placebo-controlled trials of ustekinumab, again showed infliximab ranked first for induction of clinical remission and endoscopic improvement in biologic-naïve patients, with ustekinumab and tofacitinib ranked highest in patients previously exposed to biologics.23 This later network meta-analysis included 14 induction of remission trials, recruiting almost 5500 patients, although it did not include the Japanese trial of infliximab versus placebo in 208 patients reported by Kobayashi et al.42 In contrast to these previous network meta-analyses, our results provide hope that some novel drugs, which are likely to come to market soon, are potentially more efficacious for moderate to severely active UC than existing licensed therapies.

Our results confirm that all available drugs, other than adalimumab 160/160 mg, adalimumab 80/40 mg, and filgotinib 100 mg once daily, were more efficacious than placebo for the treatment of moderate to severe UC, across all endpoints studied at 6–14 weeks. All drugs were safe and well tolerated, with no significant increase in serious adverse events or adverse events leading to withdrawal over the rates seen in the placebo arms, although the RR of infection was significantly higher with tofacitinib 10 mg two times per day than with placebo. However, their longer-term comparative efficacy, in terms of maintenance of remission and achievement of long-term corticosteroid-free remission, cannot be judged from the RCTs included in this meta-analysis, because most trials did not perform rerandomisation of participants to active drug or placebo after induction of remission. Selection of individual drug therapy should be guided by patient choice, which may be influenced by route of administration and tolerability, as well as costs in some healthcare systems. Although the advent of biosimilars has substantially reduced the costs associated with biological therapies, use of newer small molecules is likely to have greater financial implications. Whether an inferior, but cheaper, drug should be used to treat moderate to severe UC is not the subject of the current study. The results of this network meta-analysis could, however, be used to inform a cost-effectiveness analysis to help guide future treatment selection.

In summary, this systematic review and network meta-analysis has demonstrated that all biological therapies and small molecules, other than adalimumab 160/160 mg, adalimumab 80/40 mg and filgotinib 100 mg once daily, were superior to placebo for induction of remission of moderate to severe UC, and all drugs, other than adalimumab 80/40 mg, were superior to placebo in terms of endoscopic improvement. Among biological therapies, infliximab ranked highest for all endpoints in all patients, and in patients naïve to anti-TNF-α drugs, with ustekinumab ranked highest for all endpoints in patients exposed to anti-TNF-α therapy. In terms of small molecules, upadacitinib ranked highest across all endpoints, irrespective of whether patients had or had not been exposed to anti-TNF-α drugs, and was ranked above infliximab in most of our analyses. Although more trials of infliximab have been published, the number of patients in the RCTs of upadacitinib is comparable. Adverse event reporting was complete in most trials, and all drugs were safe and well tolerated. The only safety signal was a higher risk of infection with tofacitinib, compared with both placebo and vedolizumab. Future trials should better elucidate the impact of these drugs on long-term and corticosteroid-free clinical remission in patients with moderate to severe UC.

Data availability statement

No data are available.

Ethics statements

Patient consent for publication

Acknowledgments

We are grateful to Peter DR Higgins for providing extra information about his study.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • CJB and ACF are joint last authors.

  • Twitter @Nick_Burr1, @DrCJBlack

  • Contributors Guarantor: ACF is guarantor. He accepts full responsibility for the work and the conduct of the study, had access to the data and controlled the decision to publish. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. Specific author contributions: Study concept and design: NEB, CJB, DJG and ACF conceived and drafted the study. NEB, CJB and ACF analysed and interpreted the data. ACF drafted the manuscript. All authors have approved the final draft of the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.