Computational modelling
ELDA: Extreme limiting dilution analysis for comparing depleted and enriched populations in stem cell and other assays
Introduction
A limiting dilution assay is an experimental technique for quantifying the proportion of biologically active particles in a larger population (Finney, 1952, Fazekas de St. Groth, 1982, Taswell, 1987). It is a type of dose–response experiment in which each individual culture allows a negative or positive response. Replicates are conducted which vary in the number of active particles tested. The process of dilution of the dose is typically continued to extinction of the response, or close to it. The rate of positive and negative responses at each dose allows the frequency of biologically active particles to be inferred.
Limiting dilution assays have been actively used in a wide variety of biological and scientific contexts for more than a century, most notably for quantifying bacteria (Phelps, 1908), immunocompetent cells (Makinodan and Albright, 1962) or stem cells (Breivik, 1971). In immunology, limiting dilution assays were popularized by the work of Lefkovits and Waldmann (1979) as a systematic technique for the study of B-cells and T-cells and their interactions. In different application areas an individual assay can take on different forms. In stem cell or cancer research, an assay might actually consist of, for example, an in vivo transplantation or injection. In this article, we will use the term “culture” to refer to an individual assay, regardless of the application area.
We use the term limiting dilution analysis (LDA) to refer to the statistical analysis of data from limiting dilution assays. LDA typically assumes the Poisson single-hit model, in which the number of biologically active particles in each culture varies according to a Poisson distribution, and a single biologically active cell is sufficient for a positive response from a culture (Greenwood and Yule, 1917, Taswell, 1981). As a statistical technique, LDA applies equally to a range of experimental scenarios which produce dose–response data, whether or not these are limiting dilution assays in the strict sense. From this wider point of view, the main requirements are that the cultures are independent and that the frequency of biologically active particles is constant.
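The Poisson single-hit model can be written out explicitly. In the sketch below the symbols are introduced for illustration only: f is the active cell frequency, d is the dose (the expected number of cells in a culture), K is the number of active cells in a culture and p is the probability of a positive response.

```latex
K \sim \mathrm{Poisson}(f d), \qquad
1 - p = P(K = 0) = e^{-f d}, \qquad
p = 1 - e^{-f d}
\quad\Longrightarrow\quad
\log\{-\log(1 - p)\} = \log f + \log d .
```

On the complementary log-log scale the model is therefore linear in log d with slope one, and log f is the only unknown parameter.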
The classical aim of LDA is to estimate the active cell frequency (Finney, 1952, Lefkovits and Waldmann, 1979, Taswell, 1981). Fisher (1922) showed that the estimator with the best possible precision can be derived by the statistical method of maximum likelihood estimation (MLE). The same estimation strategy was outlined even earlier by McCrady (1915). An efficient computational algorithm for MLE was worked out by Mather (1949) and Finney (1951). The MLE computations for LDA became available in general purpose statistical software after they were shown to fall within the framework of generalized linear models (GLM) by Nelder and Wedderburn (1972) and McCullagh and Nelder (1989). Free open-source GLM software has been available through the R project (www.r-project.org) since the late 1990s, although this software is designed for mathematicians and statisticians rather than biologists or immunologists. The GLM approach to LDA frequency estimation is also implemented in the Microsoft Windows software application L-Calc (Stem Cell Technologies, www.stemcell.com), and this version has proved highly popular (Omobolaji et al., 2008, Bowie et al., 2007, Chen et al., 2008, Eirew et al., 2008, Huynh et al., 2008, Janzen et al., 2006, Kent et al., 2008, Liang et al., 2007, Maillard et al., 2008, Oostendorp et al., 2008, Sambandam et al., 2005, Schatton et al., 2008, Walkley et al., 2007).
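As a minimal sketch of maximum likelihood estimation under the single-hit Poisson model, the frequency can be found by locating the root of the score function. This is not the algorithm of Mather (1949) or Finney (1951), nor the GLM fitting used by R or L-Calc; it is a plain bisection, which works here because the log-likelihood is concave in the frequency. The function names and the example data are hypothetical.

```python
import math

def score(f, doses, tested, positive):
    """Derivative of the single-hit Poisson log-likelihood with respect to f."""
    total = 0.0
    for d, n, r in zip(doses, tested, positive):
        x = f * d
        # Each positive culture contributes d * exp(-x) / (1 - exp(-x)).
        total += r * d * math.exp(-x) / -math.expm1(-x)
        # Each negative culture contributes -d.
        total -= (n - r) * d
    return total

def mle_frequency(doses, tested, positive):
    """Maximum likelihood estimate of the active cell frequency by bisection."""
    if sum(positive) == 0 or sum(positive) == sum(tested):
        raise ValueError("need at least one positive and one negative culture")
    lo = hi = 1.0 / max(doses)
    # The score decreases in f, so expand until the root is bracketed.
    while score(lo, doses, tested, positive) < 0:
        lo /= 2.0
    while score(hi, doses, tested, positive) > 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, doses, tested, positive) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical assay: cells per culture, cultures tested, cultures positive.
doses = [10, 100, 1000]
tested = [6, 6, 6]
positive = [1, 3, 6]
f_hat = mle_frequency(doses, tested, positive)
print(f"estimated active cell frequency: 1 in {1.0 / f_hat:.0f} cells")
```

With a single dose the estimate reduces to the closed form -log(1 - r/n)/d, which gives a convenient check on the search.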
MLE is not the only efficient estimation strategy for LDA. Taswell (1981) showed that minimum chi-square (MC) estimation has equal or even better accuracy than MLE in certain situations, namely when the number of distinct doses is small but the number of replicates is large. Strijbosch et al. (1987) argued that MLE could be further improved by incorporating a jackknife correction for bias. However, the difference in performance between these methods is small. MLE remains our method of choice because it provides the most flexible and powerful framework for confidence intervals and hypothesis testing as well as estimation. Unfortunately, Lefkovits and Waldmann (1979) recommended a more statistically naïve method for LDA based on least squares regression (LS). Taswell (1981) showed LS to be an order of magnitude less accurate than either MLE or MC. While LS gives acceptable results when the number of replicate cultures is very large (Lefkovits and Waldmann recommend a minimum of 60 replicate cultures per dose), it proves dangerously unreliable in the common situation that the data are less plentiful (Taswell, 1981).
There are at least two other distinct scientific aims which LDA might have, apart from the classical aim of estimating the active cell frequency. A second common aim is to check the validity of the single-hit hypothesis. A third possible aim, which has so far received little attention, is to compare the active cell frequency between different cell populations. Understanding these aims has a profound influence on the experimental design.
In stem cell research, a very common aim, perhaps the key aim, is to isolate as pure a population of stem cells as possible. In pursuit of this aim, it is common to sort cells according to different markers, and test for stem cell enrichment in the sorted subpopulations. In this process, a precise estimate of stem cell frequency may not be required in populations which are clearly depleted for these cells. Indeed, when an effective stem cell marker is discovered, the sorting process leads naturally to subpopulations which contain no stem cells, and hence give no positive cultures at any dose in a dilution assay (Vaillant et al., 2008). In this situation, it is of interest to establish that the subpopulation is significantly depleted relative to the enriched population or, even better, to place an upper bound on the stem cell frequency which could reasonably be in the depleted population. Pursuing a precise estimate of the active cell frequency would be meaningless. The converse situation also arises. Quintana et al. (2008) show that cancer stem cells are more common than previously appreciated, and present many assays with 100% positive results. In this situation it is of interest to place a lower bound on the stem cell frequency. LDA methods have not so far covered these situations.
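To illustrate the kind of bound meant here, the following sketch inverts the probability of the observed extreme outcome under the single-hit Poisson model. It is one natural construction and is offered only as an assumption about how such bounds could be computed; it is not taken from the ELDA implementation. The function names and the default 95% level are hypothetical.

```python
import math

def upper_bound_all_negative(doses, tested, conf=0.95):
    """Upper confidence bound on the frequency when every culture is negative.

    P(all negative) = exp(-f * sum(n_i * d_i)); the bound is the f at which
    this probability drops to 1 - conf.
    """
    total_cells = sum(d * n for d, n in zip(doses, tested))
    return -math.log(1.0 - conf) / total_cells

def lower_bound_all_positive(doses, tested, conf=0.95):
    """Lower confidence bound on the frequency when every culture is positive.

    Solves sum(n_i * log(1 - exp(-f * d_i))) = log(1 - conf) by bisection.
    """
    target = math.log(1.0 - conf)

    def log_prob_all_positive(f):
        return sum(n * math.log(-math.expm1(-f * d))
                   for d, n in zip(doses, tested))

    lo = hi = 1.0 / max(doses)
    # log_prob_all_positive increases with f, so expand until bracketed.
    while log_prob_all_positive(lo) > target:
        lo /= 2.0
    while log_prob_all_positive(hi) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_prob_all_positive(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical depleted population: no positive cultures at any dose.
ub = upper_bound_all_negative([100, 1000], [8, 8])
print(f"frequency is plausibly below 1 in {1.0 / ub:.0f} cells")
```

The all-negative case has a closed form because the likelihood depends on the data only through the total number of cells tested; the all-positive case needs a one-dimensional search.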
Having good statistical power to check the single-hit model requires that a wide range of different dilutions are used, with a moderate to large number of replicate cultures and with a worthwhile number of both positive and negative results. Many lack-of-fit tests have been proposed (Stein, 1922, Moran, 1954a, Moran, 1954b, Armitage, 1959, Cox, 1962, Shortley and Wilkins, 1965, Gart and Weiss, 1967, Thomas, 1972, Lefkovits and Waldmann, 1979, Taswell, 1984, Bonnefoix and Sotto, 1994, Bonnefoix et al., 1996, Bonnefoix et al., 2001). Some of the tests are graphically motivated (Shortley and Wilkins, 1965, Gart and Weiss, 1967, Bonnefoix et al., 2001). Lefkovits and Waldmann (1979) also emphasize the need to plot the data to check the assumptions. Two major types of deviation from the model can be detected. Firstly, there is the multi-hit possibility, whereby the single-hit hypothesis might be false, and some sort of mechanism involving multiple cells might in fact contribute to a positive culture response. In this case, the proportion of positive assays is likely to increase more rapidly than expected as the cell dose is increased. Secondly, the single-hit model might be correct but the assays may not be homogeneous in terms of the active cell frequency. In this case, the proportion of positive cultures is likely to increase more slowly as the dose increases than the classic model would predict, although rapid increase is also possible if the heterogeneity is correlated with dose. These two possibilities correspond respectively to curves bending down and curves bending up in the plots of Lefkovits and Waldmann (1979). However, these two possibilities have not always been clearly distinguished in the literature. Cox (1962) and Thomas (1972) test a particular multi-hit model, although this test is relatively difficult to implement and interpret. Shortley and Wilkins (1965) and Gart and Weiss (1967) concentrate on heterogeneity whereas Bonnefoix et al. (1996) concentrate on the single-hit hypothesis. However, these are all regression-based tests which are straightforward to implement and interpret, and have good properties in small samples. The Pearson goodness-of-fit tests proposed by Lefkovits and Waldmann (1979) and Taswell (1981) do not distinguish the two types of deviation. Pearson tests also have poor power (Bonnefoix and Sotto, 1994), and are unreliable when the number of replicate cultures is small (McCullagh, 1985).
In many immunological contexts, the only practical way to assess the single-hit hypothesis is by way of the statistical tests described above. However, there are experimental situations for which it is worthwhile and practical to validate the assumption experimentally. Shackleton et al. (2006), Quintana et al. (2008), Leong et al. (2008) and Vermeulen et al. (2008) validate the single-hit hypothesis experimentally, by confirming a single input cell in each culture by microscope visualization before the assay is conducted. The fact that any of the single-cell assays leads to a positive response is then proof that a single cell is sufficient. Where the single-hit hypothesis can be confirmed experimentally, as in these cases, the need to validate the hypothesis statistically in each and every assay is no longer compelling, although the need to check heterogeneity remains. If the single-hit model can be assumed, then the active cell frequency may be accurately estimated from a limited number of distinct dilutions, provided that a worthwhile number of positive and negative cultures are available from at least one dilution.
Counting the number of cells also has the consequence that the number of cells no longer follows a Poisson distribution, but rather is a fixed quantity. This means that the classical Poisson model of LDA does not apply.
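The effect of a fixed cell count can be seen in a short calculation. As a sketch, assume that a culture contains exactly d cells and that each cell is active independently with probability f; these symbols are introduced here for illustration. The culture is negative only if none of the d cells is active, so

```latex
1 - p = (1 - f)^{d}
\quad\Longrightarrow\quad
\log\{-\log(1 - p)\} = \log\{-\log(1 - f)\} + \log d ,
```

compared with 1 - p = e^{-f d} under the Poisson model. The two forms differ only in that f is replaced by -log(1 - f), and they nearly coincide when f is small, since -log(1 - f) is approximately f in that case.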
This article describes a coherent approach to LDA which includes extreme data situations, multiple populations and non-Poisson situations. The approach is implemented in the ELDA (Extreme LDA) webtool for LDA. ELDA provides a convenient interface for users without any need to download or install software. ELDA implements the GLM approach to LDA, with a number of extensions to cover situations commonly seen in current stem cell and other medical research, but not covered by classical analysis. Hypothesis tests are provided, using standard GLM theory, to compare active cell frequencies between two or more cell populations. Although these tests use standard GLM theory, they have not been fully available previously in specialist LDA software. In a novel extension, one-sided confidence intervals are provided for the active cell frequency when 0% or 100% positive responses are observed at all doses. The traditional assumption that the number of cells follows a Poisson distribution is also varied to allow for the possibility that the number of cells in the culture is observed exactly. We show that the GLM framework still applies, with a minor modification, even when the total number of cells is not Poisson but is fixed. The graphical displays recommended by Lefkovits and Waldmann (1979) are included but with efficient estimation of the active cell frequency.
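A comparison of active cell frequencies between two populations can be sketched as a likelihood ratio test under the single-hit Poisson model. The code below is an illustration under stated assumptions, not the ELDA implementation: it assumes each population has at least one positive and one negative culture, uses a chi-square reference distribution on one degree of freedom, and all function names and data are hypothetical.

```python
import math

def log_lik(f, data):
    """Single-hit Poisson log-likelihood; data is a list of (dose, tested, positive)."""
    total = 0.0
    for d, n, r in data:
        if r > 0:
            total += r * math.log(-math.expm1(-f * d))  # log(1 - exp(-f d))
        total -= (n - r) * f * d
    return total

def score(f, data):
    """Derivative of log_lik with respect to f."""
    return sum(r * d * math.exp(-f * d) / -math.expm1(-f * d) - (n - r) * d
               for d, n, r in data)

def mle(data):
    """Maximum likelihood frequency by bisection on the score."""
    if not any(r > 0 for _, _, r in data) or not any(r < n for _, n, r in data):
        raise ValueError("need at least one positive and one negative culture")
    lo = hi = 1.0 / max(d for d, _, _ in data)
    while score(lo, data) < 0:
        lo /= 2.0
    while score(hi, data) > 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compare_two_groups(group_a, group_b):
    """Likelihood ratio test of equal frequency in two populations."""
    f_a, f_b = mle(group_a), mle(group_b)
    pooled = group_a + group_b
    f_0 = mle(pooled)
    lr = 2.0 * (log_lik(f_a, group_a) + log_lik(f_b, group_b) - log_lik(f_0, pooled))
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return f_a, f_b, lr, p_value

# Hypothetical sorted subpopulations: (cells per culture, tested, positive).
enriched = [(10, 8, 3), (100, 8, 7)]
depleted = [(100, 8, 1), (1000, 8, 4)]
f_a, f_b, lr, p_value = compare_two_groups(enriched, depleted)
print(f"LR statistic {lr:.2f}, p-value {p_value:.3g}")
```

The binomial coefficients cancel in the likelihood ratio, which is why they are left out of `log_lik`.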
We give tests of heterogeneity and the single-hit hypothesis which are adapted from Gart and Weiss (1967) and Bonnefoix et al. (1996) and which take advantage of the GLM framework. The GLM test has the best performance of the goodness-of-fit tests in small samples, and it also has the ability to distinguish heterogeneity of samples from multi-hit alternatives.
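One common regression formulation of such a test, in the spirit of Bonnefoix et al. (1996), adds a free slope on log dose to the complementary log-log model. The notation below is introduced here for illustration, with p_i the probability of a positive culture at dose d_i, and it should be read as a sketch rather than as the exact test statistic used by ELDA.

```latex
\log\{-\log(1 - p_i)\} = \alpha + \beta \log d_i ,
\qquad H_0 : \beta = 1 .
```

Under the single-hit Poisson model, β = 1 and α is the log of the active cell frequency. A slope above one means that the proportion of positive cultures rises faster with dose than the model predicts, as expected under multi-hit alternatives, while a slope below one means a slower rise, as expected under heterogeneity.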
ELDA has already proved valuable for LDA in a wide variety of high-profile research areas (Diaz-Guerra et al., 2007, Hosen et al., 2007, Leong et al., 2008, Quintana et al., 2008, Shackleton et al., 2006, Siwko et al., 2008, Vaillant et al., 2008, Vermeulen et al., 2008).
The ELDA webtool is described in Section 2, and Section 3 gives examples of usage. These two sections are written for readers wishing to use the webtool. Section 4 gives details of the statistical methodology for readers wanting the mathematical background. Section 5 finishes with discussion and conclusions.
The ELDA webtool
ELDA is an online tool for limiting dilution analysis. Users simply cut and paste a table of data into the web page. There is no need to download software or to undertake any programming.
ELDA accepts an input data table of three or four columns, separated by any combination of commas, spaces or tabs (Table 1). Users can type the data directly into the webpage text field, or can simply cut and paste the whole table from any spreadsheet application. Each row of data gives results for a particular
Confidence intervals and tests
A key facility of ELDA is the ability to handle extreme data situations. Table 1 shows a small data example which illustrates some of the capabilities of the software. This gives data on the frequency of repopulating mammary cells from a tumorigenic mouse model (Vaillant et al., 2008). Here a positive assay is one which results in a visible mammary epithelial outgrowth. In this experiment, the wild-type cells did not produce any outgrowths, although this might be due to insufficient cell
Generalized linear models
In this section, we outline the statistical methodology behind the ELDA software. We begin by outlining the GLM approach to LDA. Alternative introductions to GLMs can be found in Collett (1991) and Bonnefoix et al. (1996).
The fundamental property of limiting dilution assays is that each culture results in a positive or negative result. Write p_i for the probability of a positive result given that the expected number of cells in the culture is d_i. If n_i independent cultures are conducted at dose d_i
Discussion and conclusion
Despite more than a century of methodological development for LDA, the best methods have not generally been available to immunologists because of lack of easily accessible software.
The ELDA webtool gives researchers access to optimal LDA statistical techniques without the need to install software or to undertake any programming. The aims are (i) to give confidence intervals for the active cell frequency, (ii) to compare the active cell frequency across multiple cell subpopulations and (iii) to
Acknowledgments
Thanks to Mark Shackleton, Francois Vaillant, Jane Visvader and Geoff Lindeman for valuable discussions and feedback and for the use of unpublished data. Keith Satterley created the original web interface for ELDA.
References (56)
- et al. The standard chi2 test used in limiting dilution assays is insufficient for estimating the goodness-of-fit to the single-hit Poisson model. J. Immunol. Methods (1994)
- et al. Fitting limiting dilution experiments with generalized linear models results in a test of the single-hit Poisson assumption. J. Immunol. Methods (1996)
- et al. Malignant transformation initiated by Mll-AF9: gene dosage and critical target cells. Cancer Cell (2008)
- et al. Steel factor coordinately regulates the molecular signature and biologic function of hematopoietic stem cells. Blood (2008)
- et al. Canonical notch signaling is dispensable for the maintenance of adult hematopoietic stem cells. Cell Stem Cell (2008)
- et al. Limiting dilution assays. Experimental design and statistical analysis. J. Immunol. Methods (1987)
- Limiting dilution assays for the determination of immunocompetent cell frequencies. III. Validity tests for the single-hit Poisson model. J. Immunol. Methods (1984)
- et al. Rb regulates interactions between hematopoietic stem cells and their bone marrow microenvironment. Cell (2007)
- An examination of some experimental cancer data in light of the one-hit theory of infectivity titrations. J. Natl. Cancer Inst. (1959)