Proposal of a new prognostic model for hepatocellular carcinoma: an analysis of 403 patients
  1. R Tateishi1,
  2. H Yoshida1,
  3. S Shiina1,
  4. H Imamura2,
  5. K Hasegawa2,
  6. T Teratani1,
  7. S Obi1,
  8. S Sato1,
  9. Y Koike1,
  10. T Fujishima1,
  11. M Makuuchi2,
  12. M Omata1
  1. 1Department of Gastroenterology, University of Tokyo, Tokyo, Japan
  2. 2Department of Hepato-Biliary-Pancreatic Surgery, University of Tokyo, Tokyo, Japan
  1. Correspondence to:
    Dr H Yoshida
    Department of Gastroenterology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan;


Background: The prognosis of hepatocellular carcinoma (HCC) is highly dependent on tumour extension and liver function. Recently, two new prognostic scoring systems—the CLIP score, developed by Italian investigators and the BCLC score, developed in Barcelona—have been widely used to assess prognosis in patients presenting with hepatocellular carcinoma. Each system has its own relative limitations.

Aims: To create a new prognostic scoring system which is simple, easy to calculate, and suitable for estimating prognosis during radical treatment of early HCC.

Methods: A total of 403 consecutive patients with HCC treated by percutaneous ablation at the Department of Gastroenterology, University of Tokyo Hospital, between 1990 and 1997 served as the training sample, which was used to identify prognostic factors and to develop the Tokyo score. As a testing sample, 203 independent patients who underwent hepatectomy at the Department of Hepato-Biliary-Pancreatic Surgery were studied. Prognostic factors were analysed by univariate and multivariate Cox proportional hazard regression.

Results: The Tokyo score consists of four factors: serum albumin, bilirubin, and size and number of tumours. Five year survival was 78.7%, 62.1%, 40.0%, 27.7%, and 14.3% for Tokyo scores 0, 1, 2, 3, and 4–6, respectively. The discriminatory ability of the Tokyo score was internally validated by bootstrap methods. The Tokyo score, CLIP score, and BCLC staging were compared by Akaike information criterion and Harrell’s c index among training and testing samples. In the testing sample, the predictive ability of the Tokyo score was equal to CLIP and better than BCLC staging.

Conclusions: The Tokyo score is a simple system which provides good prediction of prognosis for Japanese patients with HCC requiring radical therapy.

  • HCC, hepatocellular carcinoma
  • TNM, tumour node metastasis
  • CLIP, Cancer of the Liver Italian Program
  • BCLC, Barcelona Clinic Liver Cancer
  • AIC, Akaike information criterion
  • PEIT, percutaneous ethanol injection therapy
  • PMCT, percutaneous microwave coagulation therapy
  • TAE, transcatheter arterial embolisation
  • AST, aspartate aminotransferase
  • AFP, α fetoprotein
  • HCV, hepatitis C virus
  • HVPG, hepatic venous pressure gradient
  • hepatocellular carcinoma
  • prognostic score
  • CLIP score
  • BCLC staging system
  • percutaneous ethanol injection therapy
  • hepatic resection

The prognosis of hepatocellular carcinoma (HCC) depends on tumour extension as well as liver function. Worldwide, most patients with hepatocellular carcinoma have cirrhosis caused by chronic viral hepatitis (hepatitis C virus (HCV) or hepatitis B virus).1 Assessment of tumour related factors in isolation, such as the tumour node metastasis (TNM) staging,2 does not accurately predict the prognosis of patients who have HCC and cirrhosis.3 The Child-Pugh classification has been widely used to evaluate liver function in cirrhotic patients, and has a relatively good correlation with prognosis,4 but cannot be used to predict survival in patients with HCC.

Okuda staging of HCC, established in 1985, is based on tumour size and liver function, as assessed by three of the four factors used in the Child Pugh score—namely, serum albumin, bilirubin, and the presence of ascites—and for some time has been used as the gold standard for prognostic assessment of HCC patients.5 However, this prognostic system was established by analysing patients mostly at an advanced stage of HCC, with a median survival of 4.1 months. With current advances in clinical practice, survival of HCC patients is now much longer, and the Okuda staging is unable to accurately predict prognosis in these patients.

Recently, several groups from Italy,6 Spain,7 France,8 Austria,9 and China10 proposed new prognostic systems for HCC. While the latter three were developed using a sample of patients with advanced disease (median survival 4–8 months), the CLIP and BCLC scores were developed in patients with early disease. The CLIP (Cancer of the Liver Italian Program) score showed a good correlation with the prognosis of HCC patients receiving various treatments, including surgery, percutaneous ablation, transarterial chemoembolisation, and liver transplantation.6,11,12 Along with liver function, as assessed by Child-Pugh stage, three tumour factors were included. Tumour morphology was divided into three categories: a single tumour ⩽50% of the size of the liver (score 0); multinodular HCC ⩽50% of the size of the liver (score 1); and massive HCC or tumour >50% of the size of the liver (score 2). As in Okuda staging, tumour size was broadly divided at 50% of liver volume. Seropositivity for α fetoprotein (AFP) and the presence of portal vein thrombosis were also included. However, due to advances in liver imaging techniques, especially ultrasound and computed tomography, HCC can now be detected at a much smaller size, usually smaller than 5 cm in diameter, and tumours smaller than 2 cm are frequently diagnosed. Tumour size is associated with the pathological grade of HCC, the probability of vascular invasion, and also with the prognosis of HCC patients after potentially curative treatments such as surgical resection and medical ablation.13,14 However, it is not known whether a tumour size of 2 cm is a determinant of prognosis, as previous models have not discriminated between large and small tumours.

More recently, another staging system, the BCLC (Barcelona Clinic Liver Cancer), was developed on the basis of both advanced and early HCC, dividing HCC into four early stages (A1–A4) and three more advanced ones (B–D). It contains elements of both the Okuda and Child-Pugh classifications. Subclassification of early stages requires formal measurement of hepatic venous pressure gradient (HVPG), which is not applicable in all patients, although clinical parameters (splenomegaly, etc) are now frequently applied.7 In the present study, we sought to establish a new scoring system that provides a more precise prediction of prognosis in patients with early stage HCC.

Methods


Training sample

Between January 1990 and December 1997, 403 patients with naïve HCC received medical ablation, either percutaneous ethanol injection therapy (PEIT) or percutaneous microwave coagulation therapy (PMCT), at the Department of Gastroenterology, University of Tokyo Hospital. Their prognosis was followed up until August 2001, and survival data were used as the training samples in this study. HCC was detected with ultrasound and/or computed tomography, and confirmed histopathologically by percutaneous tumour biopsy. Inclusion criteria for ablation were as follows: total bilirubin <3 mg/dl; platelet count >4×105/mm3; prothrombin activity >35%; and no intractable ascites. Although most investigators performed PEIT in early stage HCC, such as for a single nodule of ⩽5 cm in diameter or less than three nodules ⩽3 cm in diameter,15,16 we did not limit the indication for ablation to tumour size alone. Patients received ablation therapy because surgery was not an option in terms of impairment in liver function, or they voluntarily chose ablation after informed consent although surgery was also possible. The ablation procedures have been described previously.17,18

The following variables obtained at initial ablation therapy were used: age; sex; treatment modality (PEIT, PMCT, with or without transcatheter arterial embolisation (TAE)); tumour factors, including size, number of nodules, lobar distribution, and presence of extrahepatic metastasis; clinical manifestations, including ascites and hepatic encephalopathy; laboratory data, including albumin, bilirubin, prothrombin activity, aspartate aminotransferase (AST), alanine aminotransferase, platelet count, and AFP; positivity for viral markers (hepatitis B surface antigen and anti-hepatitis C antibody); and alcohol consumption. Okuda stage, CLIP score, and BCLC staging were also calculated using these variables (tables 1–3). We substituted the presence of either oesophageal varices or splenomegaly with platelet count less than 100 000/mm3 for HVPG ⩾10 mm Hg, as described by Llovet and colleagues.7

Table 1

 Definition of the Okuda staging system for hepatocellular carcinoma

Table 2

 Definition of the Cancer of the Liver Italian Program (CLIP) scoring system for hepatocellular carcinoma

Table 3

 Definition of the Barcelona Clinic Liver Cancer (BCLC) staging for hepatocellular carcinoma

All patients were placed under strict observation for recurrence of HCC, with regularly repeated ultrasound, computed tomography, and determination of serum tumour markers. If recurrence of HCC was detected, patients received additional treatments whenever possible—tumour ablation, TAE, or systemic chemotherapy. For survival analysis, the end point was death, and survival was censored on 31 August 2001.

Testing sample

Between October 1994 and December 1999, 203 patients with naïve HCC underwent hepatectomy at the Department of Hepato-Biliary-Pancreatic Surgery, University of Tokyo Hospital, and their survival data served as the testing sample. Indications for hepatectomy and selection of the area for resection were previously published.19 Briefly, the surgical procedure was determined according to residual liver function, as determined by the severity of ascites, serum level of bilirubin, and indocyanine green retention rate at 15 minutes. Patients with a bilirubin level >2 mg/dl or with intractable ascites were contraindicated for hepatectomy. Patients were followed up as described above, and survival was censored on 31 August 2001.

Establishing a new prognostic score

We sought to construct a new prognostic model based on the following principles.

  1. It is preferable to have two break points for continuous variables such as tumour size or serum albumin concentration because their distribution is wide and a single break point may not be optimal.

  2. Variables must be those commonly assessed in clinical practice to enable comparison between different institutions.

  3. The model should not include established classifications because they may be modified in the future, as was the case for TNM staging, and different versions may be confused.

We used survival time as the only end point in this analysis. Firstly, we performed univariate Cox proportional hazard regression to assess the statistical significance of each candidate factor, and the factor was retained if a significance level of p<0.05 was attained. Polychotomous categorical data were represented by corresponding binary dummy variables. Continuous variables, such as serum concentration of albumin and size (diameter) of the tumour, were transformed into categorical variables. We divided each of these continuous variables into two or three level categorical data by setting one or two break point(s), respectively, which were then represented by one or two binary variable(s); p values were calculated for each set of break points with univariate or multivariate Cox proportional hazard regression, and the set of break points showing the lowest p value was retained if the value reached significance.
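The categorisation step described above can be illustrated with a minimal Python sketch. It only shows how a continuous variable with two break points becomes a three level category encoded as two binary dummy variables; it does not reproduce the Cox regression or the search over break points. The function names, the example diameters, and the choice to place a value exactly equal to a break point in the upper category are assumptions made for illustration.

```python
from bisect import bisect_right

def to_level(value, break_points):
    """Return the 0-based category of value given ascending break points.

    ASSUMPTION: a value equal to a break point falls in the upper category.
    """
    return bisect_right(break_points, value)

def dummy_code(level, n_levels):
    """Encode a level as n_levels - 1 binary indicators (level 0 is the reference)."""
    return [1 if level == k else 0 for k in range(1, n_levels)]

# Hypothetical tumour diameters (cm); break points at 2 cm and 5 cm give three levels.
sizes_cm = [1.5, 3.0, 6.2]
levels = [to_level(s, [2.0, 5.0]) for s in sizes_cm]
dummies = [dummy_code(lv, 3) for lv in levels]
print(levels)
print(dummies)
```

Each pair of indicators would then enter a regression model as two separate binary covariates, with the lowest category acting as the reference.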

Factors showing statistical significance as a predictor were further analysed using a multivariate Cox proportional hazard regression model with stepwise selection of variables based on the Akaike information criterion (AIC). AIC is a measure of the goodness of fit (log likelihood) with a “penalty score” for the complexity of the model (number of variables included), defined as

AIC  =  −2 × (maximum log likelihood) + 2 × (total number of parameters),

and the optimum (that is, simplest effective) model gives the lowest AIC value.20
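As a small worked illustration of the definition above, the following Python sketch computes the AIC for two models and compares them. The log likelihood values and parameter counts are invented for the example and are not taken from the study.

```python
def aic(max_log_likelihood, n_parameters):
    """AIC = -2 x (maximum log likelihood) + 2 x (number of parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * n_parameters

# Hypothetical values: a three-parameter model and a nested two-parameter model.
aic_full = aic(-1200.0, 3)
aic_reduced = aic(-1200.5, 2)

# The model with the lower AIC is preferred.
preferred = "reduced" if aic_reduced < aic_full else "full"
print(aic_full, aic_reduced, preferred)
```

In this made-up case the extra parameter improves the log likelihood by only 0.5, which does not offset the penalty of 2, so the simpler model has the lower AIC.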

A new prognostic score, designated the Tokyo score, was established, assigning ordinal scores (0, 1 and 2) to each of the selected factors according to the estimated regression coefficient in the final model.

Internal validation

We used the bootstrap method for internal validation of the Tokyo score system.21 Bootstrap validation is a method of random re-sampling from a given set of samples to simulate the effect of drawing samples from the same population. A re-sampled data set of the same size as the original (training) data set was obtained by random sampling with replacement; in other words, each sample can be drawn more than once or not at all. Differences in three and five year survival rates were calculated between each pair of contiguous stages (for example, between Tokyo scores 1 and 2) using Kaplan-Meier estimation. The mean and 95% confidence interval of the difference in three and five year survival rates between the stages were determined from 2000 iterations of such re-sampling.
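The re-sampling procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the analysis actually run in S-plus: the Kaplan-Meier function is a bare-bones implementation, the confidence limits are taken as simple percentiles of the bootstrap distribution (the text does not say how the interval was derived), and the two small groups of (time in years, died) pairs are invented.

```python
import random

def km_survival(samples, t):
    """Kaplan-Meier estimate of survival at time t; samples are (time, died) pairs."""
    surv = 1.0
    event_times = sorted({time for time, died in samples if died and time <= t})
    for event_time in event_times:
        at_risk = sum(1 for time, _ in samples if time >= event_time)
        deaths = sum(1 for time, died in samples if died and time == event_time)
        surv *= 1.0 - deaths / at_risk
    return surv

def bootstrap_difference(group_a, group_b, t, n_iter=2000, seed=0):
    """Mean and percentile 95% limits of S_a(t) - S_b(t) over bootstrap resamples."""
    rng = random.Random(seed)
    pooled = [(0, s) for s in group_a] + [(1, s) for s in group_b]
    diffs = []
    for _ in range(n_iter):
        # Draw a data set of the original size, with replacement.
        resample = [rng.choice(pooled) for _ in pooled]
        a = [s for g, s in resample if g == 0]
        b = [s for g, s in resample if g == 1]
        if a and b:  # skip the rare resample in which one group is empty
            diffs.append(km_survival(a, t) - km_survival(b, t))
    if not diffs:
        raise ValueError("no usable bootstrap resamples")
    diffs.sort()
    mean_diff = sum(diffs) / len(diffs)
    lower = diffs[int(0.025 * (len(diffs) - 1))]
    upper = diffs[int(0.975 * (len(diffs) - 1))]
    return mean_diff, lower, upper

# Invented follow-up data for two contiguous score groups: (years, died).
group_a = [(6.0, False), (5.5, True), (7.0, False), (4.0, True), (6.5, False), (8.0, False)]
group_b = [(2.0, True), (3.0, True), (6.0, False), (1.5, True), (4.5, True), (2.5, True)]
mean_diff, lower, upper = bootstrap_difference(group_a, group_b, t=5.0, n_iter=500, seed=1)
print(mean_diff, lower, upper)
```

A lower limit above zero would correspond to the criterion used in the article for a significant difference between two contiguous stages.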

External validation

We validated the Tokyo score in the testing sample as well as in the training sample with AIC and Harrell’s c index.22 Firstly, AIC was calculated in a Cox proportional hazard regression model containing Tokyo, CLIP, and BCLC stages. Then, AIC was recalculated after removing each one of the scores, and the changes in AIC were compared. The c index is equivalent to the area under the receiver operating characteristic curve, and ranges from 0.0 to 1.0. A c index of 1 indicates perfect concordance between the two variables (that is, the order of survival time and magnitude of prognostic score in the current study) while an index of 0.5 indicates a chance association.
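To make the idea of concordance concrete, the sketch below gives one simple way of computing a c index for censored survival data in Python. It is a simplified illustration and not the S-plus routine used in the study: a pair of patients is counted only when the one with the shorter follow-up died, pairs with tied times are ignored, and tied scores count as half concordant. The function name and the toy data are assumptions.

```python
def harrell_c(times, events, scores):
    """Concordance between survival order and prognostic score.

    times  : follow-up times
    events : True if the patient died, False if censored
    scores : prognostic score, higher meaning worse expected prognosis
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i died before patient j's last follow-up.
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Toy data: four patients who all died, at 1, 2, 3, and 4 years.
times = [1.0, 2.0, 3.0, 4.0]
events = [True, True, True, True]
c_perfect = harrell_c(times, events, [3, 2, 1, 0])  # higher score, earlier death
c_chance = harrell_c(times, events, [1, 1, 1, 1])   # uninformative score
print(c_perfect, c_chance)
```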


Data are expressed as mean (SD) unless otherwise specified. All statistical analyses were performed with S-plus 2000 (MathSoft Inc., Seattle, Washington, USA). Statistical significance was set at p<0.05.

Results


Patient profiles in the training sample

The training sample contained data from 293 male and 110 female patients. Baseline characteristics of the patients are shown in table 4. Median age was 64 years, with 25th and 75th percentiles at 59 and 69 years, respectively. The majority (83.4%) were HCV positive. The observation period was 3.9 (2.1) years, during which 250 patients died. Estimated 50% survival time was 4.75 years. Only eight patients (2%) were lost to follow up.

Table 4

 Baseline characteristics of the training sample (n = 403)

Selection of predictive factors

Univariate Cox proportional hazard analysis of the training data set revealed that 15 factors were significantly associated with prognosis of HCC patients (table 5). We included AST, which showed marginal significance (p = 0.079), and performed multivariate analysis on a total of 16 factors with stepwise selection of variables using the AIC. Four variables retained significance as independent predictors: serum concentration of albumin (break points of 3.5 g/dl and 2.8 g/dl), bilirubin concentration (1 mg/dl and 2 mg/dl), size of the tumour (diameters of 2 cm and 5 cm), and number of tumour nodules (1–3 v >3) (table 6). Scores were assigned to each of the four factors according to the estimated regression coefficient in the final model (table 7), and the Tokyo score was defined as the sum of the four scores.
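The way the score is assembled from the four factors can be sketched as a short Python function. The break points come from the text above, but the point values assigned to each category and the handling of values that fall exactly on a break point are assumptions made for illustration, since table 7 is not reproduced here; the table in the published article is the authoritative definition. The names `ASSUMED_POINTS` and `tokyo_score` are hypothetical.

```python
# ASSUMPTION: illustrative point values per category, ordered from best to worst.
# Table 7 of the article defines the actual assignment.
ASSUMED_POINTS = {
    "albumin": (0, 1, 2),    # >3.5, 2.8-3.5, <2.8 g/dl
    "bilirubin": (0, 1, 2),  # <1, 1-2, >2 mg/dl
    "size": (0, 1, 2),       # <2, 2-5, >5 cm
    "number": (0, 2),        # 1-3 nodules, >3 nodules
}

def tokyo_score(albumin_g_dl, bilirubin_mg_dl, size_cm, n_nodules, points=ASSUMED_POINTS):
    """Sum the category points of the four factors (boundary handling is assumed)."""
    alb = 0 if albumin_g_dl > 3.5 else (1 if albumin_g_dl >= 2.8 else 2)
    bil = 0 if bilirubin_mg_dl < 1.0 else (1 if bilirubin_mg_dl <= 2.0 else 2)
    size = 0 if size_cm < 2.0 else (1 if size_cm <= 5.0 else 2)
    num = 0 if n_nodules <= 3 else 1
    return (points["albumin"][alb] + points["bilirubin"][bil]
            + points["size"][size] + points["number"][num])

# Hypothetical patients.
low_risk = tokyo_score(4.0, 0.8, 1.5, 1)
intermediate = tokyo_score(3.0, 1.5, 3.0, 2)
print(low_risk, intermediate)
```

Passing a different `points` mapping makes it straightforward to substitute the values from table 7 without changing the function.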

Table 5

 Univariate analysis

Table 6

 Multivariate analysis

Table 7

 Tokyo score

Internal validation

Among the training sample, 55, 126, 104, 78, 30, 9, and 3 patients were classified as Tokyo scores 0, 1, 2, 3, 4, 5, and 6, respectively. Observed cumulative survival of patients grouped by Tokyo score was calculated using the Kaplan-Meier method (fig 1). Prognosis was well distributed among the groups based on the Tokyo score. Five year survival rates for Tokyo scores 0, 1, 2, 3, and 4–6 were 78.7%, 62.1%, 40.0%, 27.7%, and 14.3%, respectively. This was confirmed by internal validation where differences in three and five year survival rates were calculated, along with 95% confidence interval, between each pair of two contiguous stages using the bootstrap method. The lower confidence limit of difference between each pair of two contiguous stages was greater than zero, indicating that all differences were statistically significant (table 8). The Tokyo score was therefore shown to be highly robust in estimating prognosis in distinct groups.

Table 8

 Pairwise comparisons of three and five year survival rates (with 95% confidence interval (CI)) between each stage of the Tokyo score

Figure 1

 Kaplan-Meier estimated survival curves by Tokyo score (TKY).

External validation

Baseline characteristics of the patients in the testing sample, who underwent surgical resection, are shown in table 9. Median age and sex proportions were similar to those in the training sample while these patients had better liver function reserve and the average tumour size was larger. Sixty five patients died during the observation period and median survival time was 5.7 years. Tokyo, CLIP, and BCLC stages were calculated according to the variables obtained from each patient.

Table 9

 Baseline characteristics of the testing sample (n = 203)

The Tokyo score, CLIP score, and BCLC staging were compared in both the training and testing samples by evaluating the AIC on Cox proportional hazard regression models. Goodness of fit of the model estimated by AIC was improved by removing BCLC from the model containing Tokyo, CLIP, and BCLC. AIC was greater in the model with either the Tokyo score or CLIP score alone than in the model containing both (table 10). These results indicate that the Tokyo and CLIP scores complement each other whereas addition of BCLC resulted in no improvement to the model. However, the increment was smaller when the CLIP score was removed, indicating that the model with the Tokyo score was more informative than that with the CLIP score. The c indices for the Tokyo score, CLIP score, and BCLC staging were 0.733, 0.707, and 0.657 in the testing sample and 0.737, 0.758, and 0.710 in the training sample, indicating that the Tokyo score was consistently effective in patients from different backgrounds.

Table 10

 Comparison of the Tokyo score with the Cancer of the Liver Italian Program (CLIP) score and Barcelona Clinic Liver Cancer (BCLC) staging in the training and testing samples

Discussion


The aim of this study was to create a novel, simple, prognostic scoring system that would provide a precise prediction of prognosis for patients who were candidates for radical therapy, such as percutaneous ablation or surgical resection. In addition, this scoring system would be used to stratify patients to enable comparison of the efficacy of distinct treatments or among different institutions. Hence the ideal staging system should provide maximal discrimination of outcomes between different stages of disease while keeping the variability of outcomes within each stage to a minimum. The Okuda staging system is not applicable to HCC patients at an early stage of disease as these patients are now eligible for potentially curative treatments, such as medical ablation or surgical resection. As such, while it was useful when first devised, the Okuda scoring or staging system is now generally considered obsolete. Indeed, 75% of the patients in the training sample and 90% in the testing sample were classified as Okuda stage I. Further stratification of this group is clearly needed. Recently proposed systems from France,8 Austria,9 and China10 were developed in patients with advanced disease with a median survival of only 4–8 months. We calculated the scores using the French, Austrian, and Chinese criteria in the training sample but all of these systems classified patients into only two stages, and most patients belonged to stage I (data not shown).

One of the outstanding merits of Okuda staging is the fact that it consists of four simple parameters, namely tumour size, ascites, serum albumin, and bilirubin. In constructing the Tokyo score, we followed this simple approach. We examined only those parameters that are easily obtainable and avoided criteria that are not generally available. Moreover, we used the AIC in the selection of parameters to obtain a simple model without too many independent parameters or complicated computation. We believe this quality to be very important, especially for conducting retrospective analyses without omitting any patient because of missing values.

We adopted two tumour factors, size and number, in the final model. Tumour size was divided into three groups with break points of 2 cm and 5 cm in diameter. Several studies have identified tumour size in this range as significant,7,13,14,23,24 and a strong correlation with microvascular invasion and pathological grade of malignancy has been demonstrated.23,25 However, previously proposed scoring or staging systems, except for BCLC staging, did not include tumour size,8 or else divided the tumour broadly into massive and other, usually using the break point of half the volume of the liver.5,6,9 Thus differences in the present and previous scoring systems are primarily seen in the characteristics of the target populations, as reflected by our objective of defining a prognostic scoring system which would discriminate between the relatively early stages of HCC. By the same token, we did not adopt tumour related symptoms, which were included in the BCLC and French systems,7,8 as almost all of our patients were asymptomatic.

The number of tumour nodules is also known to be associated with intrahepatic spread of malignant cells. Some authors reported a major difference in prognosis between solitary and multinodular HCC after surgical resection.13,26,27 However, we found that the number of nodules was best divided dichotomously between 1–3 and ⩾4. The suggestion is that the presence of two or three nodules might often be the result of simultaneous independent carcinogenesis rather than intrahepatic metastasis in patients with advanced cirrhosis.

Evident vascular invasion such as portal vein tumour thrombus is an absolute predictor of ominous prognosis.3,13,26 Overt metastases to extrahepatic organs or lymph nodes are also associated with poor prognosis. As the training sample contained few patients with these two manifestations, they were not selected after stepwise variable selection. It should be noted that the Tokyo score may not be predictive for advanced disease.

We selected two factors, albumin and bilirubin, as indicators of liver function. Both are included in the Child-Pugh classification, together with prothrombin time, ascites, and encephalopathy, and thus were also included in the CLIP score as a factor in the Child-Pugh classification. However, the latter three were not selected after stepwise variable selection because they were strongly correlated with the former two. Thus liver function is represented by two parameters, which we believe is preferable for the sake of model simplicity.

Portal hypertension is accepted as a strong predictor of poor prognosis. Among our candidate factors, ascites, encephalopathy, and platelet count were considered as related to portal hypertension and all were significant in the univariate analysis. Bruix et al reported that HVPG was a significant predictor of decompensation after hepatic resection28 and it is included in the BCLC staging system. However, HVPG is a special examination which is not routinely carried out in daily practice. We substituted the presence of oesophageal varices and platelet count less than 100 000/mm3 for HVPG ⩾10 mm Hg, as described by Llovet and colleagues, but the substitution may have impaired the prognostic power of the BCLC staging in the samples.

Prognostic scores can be divided into two groups: those based on expert opinion, such as TNM staging, and those developed through regression analysis of actual data, such as the CLIP and Tokyo scores. We applied bootstrap methods to avoid possible overfitting bias that often accompanies regression analysis. Nevertheless, the fact that the Tokyo score fitted better in the training sample than in the testing sample may indicate that there remained some overfitting bias. Another possible reason why the Tokyo score did not surpass the CLIP score in the testing sample was the presence of AFP in the latter but not in the former. Over 20% of patients in the testing sample had AFP levels >400 ng/ml compared with 10% in the training sample. It is reasonable to assume that AFP plays a more important role in advanced disease.

BCLC staging, developed from several independent studies on both early and advanced patients, includes treatment strategy, indicating that a single HCC without portal hypertension should be resected and that patients with no more than three nodules not exceeding 3 cm in diameter have indications for ablation therapy. Recently, Cillo et al found that BCLC was the best among staging systems, including CLIP, in patients treated with radical therapies.29 One possible reason why BCLC staging did not show greater ability in the testing and training samples may be that our patients were not always treated according to the strategy.

The relative prognostic importance of each factor depends on the features of the patients in the training sample, as do the independent variables remaining after stepwise selection. Child-Pugh classification and the model for end stage liver disease (MELD) are suitable for assessing the prognosis of patients with severely impaired liver function30 while only tumour related factors are relevant in assessing outcome after liver transplantation for HCC patients.31,32 Similarly, the applicability of the Tokyo score is limited by the fact that it was established and validated on the basis of HCC patients treated by medical ablation or surgical resection. However, with the growing realisation of high risk groups for HCC and rapid advances in imaging techniques, an increasing number of patients are being diagnosed at an earlier stage, and qualify for potentially curative treatments, such as medical ablation and surgical resection.

In conclusion, we established the Tokyo score by analysing survival time among HCC patients treated with medical ablation, and validated it in patients who underwent surgical resection. The Tokyo score may be useful in predicting the prognosis of HCC patients who are candidates for these curative treatments.


  • Conflict of interest: None declared.
