- Autoimmune pancreatitis
- lymphoplasmacytic sclerosing pancreatitis
- idiopathic duct centric pancreatitis
- immunoglobulin G4
- endoscopic retrograde pancreatogram
- autoimmune disease
- endoscopic retrograde pancreatography
- pancreatic cancer
- pancreatic disease
Significance of this study
What is already known about this subject?
Autoimmune pancreatitis (AIP) is a distinct form of chronic pancreatitis.
AIP most often mimics pancreatic cancer in its presentation.
Multiple diagnostic criteria exist for AIP.
What are the new findings?
There are specific endoscopic retrograde pancreatography (ERP) features in AIP that are helpful in the diagnosis.
The ability to diagnose AIP based on ERP features alone is limited but can be improved with knowledge of some key features.
We have proposed an algorithm to diagnose AIP as well as to differentiate it from pancreatic cancer.
How might it impact on clinical practice in the foreseeable future?
This is the first multicentre, international study addressing the role of ERP in AIP. Thus this study will be one of the building blocks while developing international consensus criteria for AIP.
Autoimmune pancreatitis (AIP) is a rare but increasingly recognised form of chronic pancreatitis that predominantly affects men in their fifth and sixth decades.1 Since it often presents with obstructive jaundice and pancreatic enlargement, the chief differential diagnosis of AIP is pancreatic cancer (PaC).1–3 As AIP is a relatively uncommon disease, its diagnosis in patients with suspected PaC can be quite challenging.2 3 In addition, distinguishing AIP from usual chronic pancreatitis (CP; both clinically and radiologically) is also difficult.2
Since Yoshida et al coined the term ‘autoimmune pancreatitis’ in 1995, the diagnostic value of imaging features of AIP has been recognised.4 The characteristic pancreatic imaging features include parenchymal features seen on cross-sectional imaging and ductal features seen on endoscopic retrograde pancreatography (ERP). In fact when the Japan Pancreas Society (JPS) proposed diagnostic criteria for AIP, it considered the presence of both parenchymal and ERP features mandatory for diagnosis of AIP.5 More recently, the HISORt criteria for diagnosis of AIP were proposed by clinicians at the Mayo Clinic, Rochester, USA. While recognising the value of imaging features, the HISORt criteria do not mandate their presence for diagnosis of AIP.6 7
Investigators worldwide have come together to form an Autoimmune Pancreatitis International Cooperative Study group (APICS) to better study this rare disease. Since there is substantial disparity in the use of ERP to diagnose AIP between the Asian and US criteria, one of the first goals of the APICS was to evaluate the performance characteristics (sensitivity, specificity and interobserver agreement) of ERP for diagnosis of AIP under the auspices of an international multicentre trial. Here we report the results of the first study proposed by this group.
Materials and methods
The study was approved by the Mayo Clinic's institutional review board. The study was done in two phases. A total of 21 physicians from four centres in Asia (Japan and Korea), the UK and the USA participated in interpretation of ERP images (readers) in both phases of the study. The lead and senior authors (AS and STC) did not interpret images. A blinded randomised trial design was used for both phases (figure 1). The diagnosis of AIP was made using HISORt and Japanese criteria, depending on the centre where the patient was diagnosed (Supplementary table 4).5 7 Approximately 35% of patients did not have immunoglobulin G4 (IgG4) levels recorded. Many of these patients were diagnosed with AIP before IgG4 was found to be associated with AIP, but all these patients had the usual clinical course of AIP. The patients with PaC had a confirmed tissue diagnosis and/or the typical clinical course. The diagnosis of CP was made with a combination of the clinical presentation with pancreatitis and characteristic ERP findings (Cambridge III or IV).8 9
ERPs (n=164) from histologically and/or clinically confirmed cases of AIP, CP and PaC were solicited from three participating centres (the USA, Japan and the UK). The ERP images were screened for quality, including brightness, adequate contrast enhancement and adequacy of duct filling, by an experienced endoscopist (MJL) unaware of clinical diagnoses. A final set of 48 pancreatograms (20 AIP, 10 CP, 10 PaC and 8 duplicates) was selected by the senior author (STC). We ensured that all three centres had equal representation in the final set of images selected for interpretation. To ascertain intraobserver variability, eight duplicate images (4 AIP and 2 each of PaC and CP) were included in the set of 48 pancreatograms. A random number generator was used to generate numbers for the 48 images. These images were then shuffled. The final set of randomised and shuffled images was sent to 21 physicians in the USA, the UK, Japan and South Korea on a compact disc. These readers were not privy to clinical data or the underlying diagnoses of the images sent.
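The composition and shuffling of the reading set can be illustrated with a minimal Python sketch. The image identifiers, the seed and the function name are hypothetical; the text states only that a random number generator was used and that the images were shuffled, so this is one plausible way to do it rather than the procedure actually followed.

```python
import random

def build_reading_set(n_aip=20, n_cp=10, n_pac=10,
                      dup_aip=4, dup_cp=2, dup_pac=2, seed=0):
    """Return a key mapping reader-facing codes to (image, diagnosis).

    Counts default to the phase I set: 40 unique images plus 8 duplicates.
    Image names and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    images = []
    groups = (("AIP", n_aip, dup_aip), ("CP", n_cp, dup_cp), ("PaC", n_pac, dup_pac))
    for label, n_images, n_dup in groups:
        originals = [f"{label}-{i:02d}" for i in range(1, n_images + 1)]
        # Duplicates are drawn without replacement from the same diagnosis group.
        duplicates = rng.sample(originals, n_dup)
        images.extend((name, label) for name in originals + duplicates)
    rng.shuffle(images)
    # Readers would see only the numeric code; the key stays with the coordinator.
    return {code: item for code, item in enumerate(images, start=1)}

key = build_reading_set()
```

Passing zero for the three duplicate counts gives the 40-image set described for phase II.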
Each physician viewed the final set of images on a computer. Most of them completed the interpretation in one sitting and filled out a data abstraction sheet (Supplementary table 1) for each image. This sheet enumerated ERP features of CP, AIP and PaC which have been previously described.3 10 11 The physicians were asked to provide the most probable diagnoses based on ERP findings. Up to three diagnoses could be listed by percentage confidence (Supplementary table 1). The sum of the confidences had to add up to 100%. Thus acceptable percentage confidence combinations were 95%–5%, 75%–25%, 50%–50%, and 50%–25%–25%. To be considered a correct interpretation, a given pancreatogram had to be read with at least 75% confidence for that condition. The data abstraction sheets were then collated to compute the performance characteristics of ERP for diagnosis of AIP (table 1).
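The scoring rule just described can be sketched in Python as follows. The function names and the dictionary representation of a reading are assumptions made for illustration. The sketch checks only that at most three diagnoses are listed and that the confidences sum to 100%; it does not enforce the specific splits listed above, since it is not stated whether that list was exhaustive.

```python
CORRECT_THRESHOLD = 75  # minimum percentage confidence for a reading to count

def is_valid_reading(reading):
    """reading maps a diagnosis ('AIP', 'CP', 'PaC') to a percentage confidence."""
    return 1 <= len(reading) <= 3 and sum(reading.values()) == 100

def called_diagnosis(reading):
    """Return the diagnosis read with at least 75% confidence, or None."""
    for diagnosis, confidence in reading.items():
        if confidence >= CORRECT_THRESHOLD:
            return diagnosis
    return None

def is_correct(reading, true_diagnosis):
    """A reading is correct only if the true diagnosis reaches the threshold."""
    return called_diagnosis(reading) == true_diagnosis
```

Under this rule a 50%–50% or 50%–25%–25% reading never counts as correct for any diagnosis, which makes the criterion deliberately conservative.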
Creation of a teaching module
From the phase I data it was apparent that the overall sensitivity and specificity for the entire group (N=21) were poor, but there were four physicians (not all from Asia) whose interpretations had a high sensitivity and specificity for diagnosing AIP based solely on an ERP. These physicians were asked to identify key ERP features which aided them in correctly diagnosing AIP. This led to the identification of four key ERP features (figures 2 and 3). A PowerPoint teaching module was then created highlighting these observations and the overall (not individual) phase I results. The teaching module did not include any images that were used in phase I.
After a 3-month washout period, we requested all readers in phase I to interpret the same 40 pancreatograms used in phase I (20 AIP, 10 CP and 10 PaC) without the eight duplicate images. A random number generator was again used to randomise the 40 images. These images were then shuffled. The final set of randomised and shuffled images was sent to 21 physicians in the USA, the UK, Japan and South Korea on a compact disc. All 21 physicians were asked to review the teaching module before reanalysing the pancreatograms. Apart from the information requested from the readers in phase I, they were specifically asked to comment on the presence or absence of the four key ERP features identified in phase I. In addition to computing the overall performance characteristics of ERP to diagnose AIP (table 2), we also calculated the performance characteristics of the four key ERP features in diagnosing AIP (table 3).
We used JMP Version 8 for computing the sensitivity, specificity and κ statistic for interobserver agreement in both phase I and phase II. A 2×2 table was constructed for each reader in both phases for this purpose. The sensitivity and specificity are with reference to the external gold standard (Asian and HISORt criteria) and the κ statistic was computed to determine the agreement between physicians from each centre. A κ statistic for intraobserver agreement was computed in phase I alone. A p value <0.05 was considered statistically significant. In addition, in phase II, the sensitivity and specificity of the four ERP features in AIP in differing combinations were computed (table 4).
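For readers unfamiliar with these measures, the following Python sketch shows how sensitivity and specificity follow from a reader's 2×2 table, and how a κ statistic can be computed for a pair of readers. It is not the JMP procedure used in the study. The function names are hypothetical, and the two-reader Cohen's κ shown here is an assumption, since the exact κ variant used for groups of readers is not specified in the text.

```python
def two_by_two(calls, truth):
    """Count the cells of a 2x2 table.

    calls[i] is True if the reader called image i AIP;
    truth[i] is True if image i is AIP by the reference standard.
    """
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    return tp, fn, fp, tn

def sensitivity_specificity(calls, truth):
    tp, fn, fp, tn = two_by_two(calls, truth)
    return tp / (tp + fn), tn / (tn + fp)

def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers making binary (AIP / not AIP) calls.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(reader_a)
    observed = sum(1 for a, b in zip(reader_a, reader_b) if a == b) / n
    p_a = sum(reader_a) / n  # proportion of AIP calls by reader A
    p_b = sum(reader_b) / n  # proportion of AIP calls by reader B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)
```

Intraobserver agreement on duplicate images can be computed with the same function by treating a reader's first and second readings of each duplicate as the two sequences.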
In phase I, the overall sensitivity, specificity and interobserver agreement (κ) of ERP to diagnose AIP were 44%, 92% and 0.23, respectively, across all participating centres (table 1). There was wide variation in the sensitivity of the interpreters to diagnose AIP based on ERP features across the four participating centres. The sensitivity among Asian physicians was significantly higher than among non-Asian physicians. However, the difference in specificity across the centres was not significant. In addition to computing the interobserver variability, we also computed the intraobserver variability (from the eight duplicate films), but this was not statistically significant. The stratified results based on the specialty of the reader are shown in Supplementary table 2. Of the 21 reviewers, we identified four top performing physicians who had high sensitivities and specificities (Supplementary table 2). In the opinion of the top performing physicians the following four ERP features were most helpful to diagnose AIP (figures 2, 3 and table 3): (i) long (>1/3 the length of the pancreatic duct) stricture; (ii) lack of upstream dilatation from the stricture (<5 mm); (iii) multiple strictures; and (iv) side branches arising from a strictured segment.
In phase II, the overall sensitivity, specificity and interobserver agreement (κ) of ERP to diagnose AIP were 71%, 83% and 0.45, respectively, across all participating centres (table 2). We also computed the individual performance characteristics of the four ERP features identified in phase I (table 3). Of these four features the presence of a long narrow stricture or multiple strictures was the most specific (≥97%) but the least sensitive (≤38%). We then computed the performance characteristics of these four ERP features in various permutations and combinations. The top five combinations are presented in table 4.
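One way to evaluate feature combinations is sketched below in Python. The feature identifiers and the function name are hypothetical, and the sketch assumes a combination counts as positive only when every feature in it is present (a logical AND); the text does not state how the combinations in table 4 were defined.

```python
from itertools import combinations

# Hypothetical identifiers for the four key ERP features.
FEATURES = (
    "long_stricture",
    "no_upstream_dilatation",
    "multiple_strictures",
    "side_branches_from_stricture",
)

def combination_performance(readings):
    """Return {combination: (sensitivity, specificity)} for all 15 combinations.

    readings is a list of (features_present, is_aip) pairs, where
    features_present is a set of feature identifiers noted by the reader.
    """
    results = {}
    for size in range(1, len(FEATURES) + 1):
        for combo in combinations(FEATURES, size):
            tp = fn = fp = tn = 0
            for present, is_aip in readings:
                positive = set(combo) <= present  # all features in combo present
                if is_aip:
                    tp += positive
                    fn += not positive
                else:
                    fp += positive
                    tn += not positive
            sens = tp / (tp + fn) if (tp + fn) else None
            spec = tn / (tn + fp) if (tn + fp) else None
            results[combo] = (sens, spec)
    return results

# Toy data for illustration only; these are not study results.
toy = [
    ({"long_stricture", "no_upstream_dilatation"}, True),
    ({"no_upstream_dilatation"}, True),
    ({"no_upstream_dilatation"}, False),
    (set(), False),
]
perf = combination_performance(toy)
```

Requiring more features to be present can only lower or maintain sensitivity and raise or maintain specificity, which is the trade-off visible when single features are compared with larger combinations.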
AIP is a relatively new and uncommon disease. Kamisawa et al first recognised it as the pancreatic manifestation of a systemic fibroinflammatory disease called IgG4-associated systemic disease.12–14 It was not until 2002 that the first established diagnostic criteria were published.15 Since then many other diagnostic criteria have been published, including the JPS 2006, Mayo Clinic HISORt criteria and the Korean criteria.7 15–17 These criteria rely on a combination of imaging of the pancreas (ERP, CT and MRI), histology of the pancreas, serology, other organ involvement (bile ducts, salivary glands, retroperitoneum) and a dramatic response to steroid treatment to diagnose AIP. The differences in the specific criteria often reflect regional variations in practice and familiarity with the various diagnostic modalities (eg, ERP vs magnetic resonance cholangiopancreatography (MRCP) in the Japanese vs Korean criteria). The role of an ERP in the diagnosis of AIP is often debated.18 The Japanese criteria mandate the use of an ERP to diagnose AIP, the Korean criteria allow use of MRCP for ductal imaging, and HISORt criteria do not mandate pancreatography (MRCP or ERP).19 To date, the performance characteristics of ERP in AIP have not been systematically studied.
In phase I of the study we found that the performance characteristics of ERP to diagnose AIP were poor, with significant differences between Asian and non-Asian physicians. While Asian physicians performed very well in terms of sensitivity and specificity, the sensitivity of most non-Asian physicians was poor. Even though endoscopic retrograde cholangiograms (ERCs) are frequently performed in the West in patients with obstructive jaundice, injection of the pancreatic duct (ERP) is carefully avoided to minimise the risk of pancreatitis. Thus, we believe the inability of non-Asian readers to identify AIP on ERP is mainly due to lack of familiarity with ERP changes in AIP. This represents an example of ‘the eye does not see what the mind does not know’.
There were four physicians (not all from Asia) who were highly adept at diagnosing AIP solely based on the ERP features. They identified four features (figures 2 and 3) on ERP which they found helpful to diagnose AIP. We asked the next logical question: Can these features help other physicians to identify AIP on an ERP? Thus a teaching module illustrating the four features was created.
In phase II of the study all participating centres and physicians showed an improvement in sensitivity with a modest, but statistically non-significant, drop in specificity. None of the four ERP features by themselves was diagnostic of AIP. The presence of all four features was highly specific (91%) but only modestly sensitive (52%) for AIP (table 4). The presence of pancreatic duct strictures (single or multiple) without upstream dilatation (<5 mm) had the highest specificity for AIP (>90%).
These data suggest that there are features on an ERP which are specific for AIP, and we believe that making readers aware of these features can augment their ability to diagnose AIP. We have observed a similar phenomenon with regard to histological features of AIP. Until the histological features of AIP were highlighted by observant pathologists, it was generally assumed that the aetiology of CP could not be ascertained by histology.2 3 10
The main strength of our study is its design: it was a randomised, blinded, multicentre study with many physicians participating. This study demonstrates that knowledge from a few experts is potentially transferable to others. The limitation of our study is that readers were not provided with any other information about the patients, including findings on the cholangiogram, but this was by choice. Supplementary tables 3 and 4 illustrate the clinical profile of the patients with AIP who participated in our study. In real life, physicians would potentially have access to such patient information, including clinical presentation and findings on CT scan and cholangiograms. However, despite the lack of such data, adept readers were able to identify AIP with a high degree of sensitivity.
Our study shows that there are specific ERP features of AIP which are potentially useful. However, since physicians in the West do not perform pancreatograms in the setting of obstructive jaundice, where does ERP fit in the strategy to distinguish AIP from pancreatic cancer? Recently two diagnostic strategies for distinguishing AIP from PaC have been published and compared.2 3 10 One relies on ERP and CT features2 and the other uses CT features but does not use ERP at all.3 We believe the ideal strategy would be to use a combination of these strategies, with the use of ERP being tailored to findings on cross-sectional imaging. If there is diffuse ‘sausage-shaped’ enlargement of the pancreas without ductal dilatation in the presence of elevated serological markers of AIP, especially IgG4, then pancreatography probably has little added benefit for the diagnosis of AIP.2 However, if findings on cross-sectional imaging are not typical, ERP may be helpful in identifying AIP in those in whom a diagnosis of PaC is suspected and the other studies are non-diagnostic. Based on these principles, we propose an algorithm incorporating ERP features to differentiate AIP from pancreatic cancer (figure 4 and Supplementary figure 1).
In summary, this multicentre study shows that the ability to diagnose AIP based on ERP features alone is limited but can be improved with the knowledge of some key radiographic features. We have identified four key ERP features and determined their performance characteristics in diagnosing AIP.
Online only appendix