Introduction

Multiple sclerosis (MS) is a complex trait disorder of unknown etiology. Its phenotypic presentations represent a spectrum with characteristics of autoimmunity and neurodegeneration.1,2,3,4 The involvement of autoimmunity is supported by epidemiological, clinical and pathological studies.4,5,6 A survey of genome-wide scans reveals that 65% of the positive non-HLA linkage data map into 18 overlapping clusters of MS and other autoimmune diseases.3 The risk of this nonspecific autoimmunity is estimated to be λs=1.65,7,8 and involves a complex mechanism in which chemokines play an important role.9

The main function of chemokines and their receptors is to attract immune cells to the site of inflammation. In addition, these small molecules are involved in T-cell differentiation, apoptosis, cell cycle regulation, angiogenesis, metastatic processes and in the generation of soluble inflammatory products such as free radicals, nitric oxide, cytokines and matrix metalloproteases.9 In humans, approximately 50 chemokine genes have been described, which are divided into four subfamilies based on the characteristic patterns of cysteine residues close to the N-terminal end of molecules. In the CC chemokine ligand family (CCL) (also known as β-chemokines or small cytokine group A – (SCYA)), the two cysteines are adjacent to each other, while in the CXC and CX3C chemokine families the two cysteines are separated by one or three amino acids, respectively. In the XC family, only one cysteine is present.10 Chemokine receptors bind multiple chemokines. Currently, 10 CC chemokine receptors (CCRs), six CXCRs, one CX3CR1 and one XCR1 are known.10

A survey of CC chemokines and their receptors by Trebst and Ransohoff11 demonstrated that each of the CC chemokine receptors 1–5 (CCR1–CCR5) is present on infiltrating monocytes, macrophages and lymphocytes in MS lesions and interacts with multiple CCLs. Furthermore, several members of the CCL family (CCL2=MCP-1, CCL3=MIP-1α, CCL4=MIP-1β, CCL5=RANTES, CCL7=MCP-3, CCL8=MCP-2) are expressed by astrocytes, microglia and other inflammatory cells within plaques. These observations support the notion that CC chemokines play a role in the development of MS lesions.

In linkage studies, several MS susceptibility loci have been identified.8,12,13,14,15,16 A meta-analysis of three MS genome scans detected the highest nonparametric linkage (NPL) score of 2.58 at 17q11.17 While numerous candidate genes are located in this region (eg NOS2A, OMG, NF1), a 1.85 Mb segment at 17q11.2–q12 encodes a cluster of evolutionarily closely related β-chemokines including CCL2, CCL7, CCL11, CCL8, CCL13, CCL1, CCL5, CCL16, CCL14, CCL23, CCL18, CCL3 and CCL4.

Recently, two quantitative trait loci (QTL), eae6 and eae7, were investigated by classic marker-specific linkage and interval mapping in a murine experimental allergic encephalomyelitis (EAE) model. Eae6 and eae7 mapped to a region of mouse chromosome 11 syntenic to human chromosome 17q11. Eae6 and eae7 were implicated in the severity and duration of the clinical presentation of EAE. However, eae7 also conferred susceptibility to a monophasic remitting/non-relapsing (MRNR) form of the disease.18 Sequencing of cDNA for all 11 β-chemokines and NOS2 encoded within eae7 revealed variations only in Scya1, Scya2 and Scya12 (corresponding to the human CCL1, CCL2 and CCL12 genes, respectively). A striking divergence in these Scya1, Scya2 and Scya12 variations was noted when the cDNA molecules of EAE-resistant (eg Balb/cJ and B10.S/DvTe) and EAE-susceptible strains (eg SJL/J) were compared. As multiple nonsynonymous sequence polymorphisms within Scya1 (TCA-3), Scya2 (MCP-1) and Scya12 (MCP-5) are now established candidates for eae7 (a QTL for the severity and duration, and a susceptibility locus for the MRNR subtype of EAE), we postulate that β-chemokine variants may also confer susceptibility to MS.

Patients, families and DNA specimens

DNA specimens from families with various structures were obtained from the Multiple Sclerosis DNA Bank (MSDB), University of California San Francisco (UCSF), San Francisco, CA, USA, the collection of the Canadian Multiple Sclerosis Collaborative Group (CMSCG) (Dr Ebers) and the Multiple Sclerosis Treatment and Research Center (MSTRC), Minneapolis, MN, USA (Dr Birnbaum) (Table 1). Separate data sets were received from various sources at various times and genotyped accordingly. The diagnosis of MS for patients whose DNA specimens were obtained from existing collections (MSDB, CMSCG) was made by the Poser criteria.19 Patients whose specimens were prospectively collected for this study were diagnosed using the McDonald criteria.20 The diagnosis of primary-progressive (PP)-MS was established if sustained progression of disability was observed for at least 1 year in the absence of relapses.21 If one or more relapses occurred superimposed on a progressive course from onset, the disease was classified as progressive-relapsing (PR)-MS.

Table 1 Families studied

Genotyping

For genotyping, we contracted ACGT Inc. (Northbrook, IL, USA), a biotech company using the 5′ Nuclease (or TaqMan) assay for allelic discrimination of single nucleotide polymorphisms (SNPs) with high sensitivity and specificity on an ABI7900HT Instrument (ABI PE Inc., Foster City, CA, USA). PCR amplification of the template DNA sequence was performed with unlabeled forward and reverse primers in the presence of a specifically designed TaqMan probe having a nonfluorescent quencher plus a minor groove binder attached to the 3′ end, and a reporter fluorescent label (Fam or Vic) attached to the 5′ end. A fully hybridized probe remained bound during strand displacement, resulting in an efficient cleavage of the reporter dye by the 5′ nuclease activity of TaqGold DNA polymerase. Each reaction mixture contained two probes having a single nucleotide difference and Fam or Vic labels. The release of the reporter dyes correlated with the proportion of the matching alleles, and differed between homozygous and heterozygous states. The genotyping error rate with this method was <0.5%.

Single nucleotide polymorphisms

SNPs were identified in the NCBI database (http://www.ncbi.nlm.nih.gov/SNP). A total of 31 assays were successfully developed and validated. All SNPs were genotyped in three data sets (DS101–105, DS106–108, DS109–112). In DS113–114 and DS115, data from only 27 assays were obtained, as assays of CCL2B, CCL7N, CCL11X and CCL15M were not included. Table 2 shows the list of genes and SNPs assayed, the location of markers and intermarker distances, and the heterozygosity and the minor allele frequency of SNPs calculated from data of unaffected parents in DS101–115. Heterozygosity was calculated by using the Pedigree Statistics in MERLIN (Multipoint Engine for Rapid Likelihood Inference, http://www.sph.umich.edu/csg/abecasis/Merlin/).22
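As a minimal illustration of the two summary quantities reported per SNP, the sketch below computes the minor allele frequency and the heterozygosity of a biallelic marker from genotype counts in unrelated individuals. It is not the MERLIN implementation; the function name `allele_stats` and the example counts are hypothetical, and the expected heterozygosity is taken as 2pq under random mating.

```python
def allele_stats(n_aa, n_ab, n_bb):
    """Summarize one biallelic SNP from genotype counts (hypothetical helper).

    n_aa, n_ab, n_bb: numbers of AA, AB and BB individuals.
    Returns (minor allele frequency, observed heterozygosity,
    expected heterozygosity 2pq).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    maf = min(p, q)
    h_obs = n_ab / n                  # fraction of heterozygous individuals
    h_exp = 2.0 * p * q               # expectation under random mating
    return maf, h_obs, h_exp


# Illustrative counts only; these are not data from this study.
maf, h_obs, h_exp = allele_stats(64, 32, 4)
print(f"MAF={maf:.3f} observed H={h_obs:.3f} expected H={h_exp:.3f}")
```

Restricting the input to unaffected, unrelated parents, as done here for Table 2, avoids counting the same founder alleles more than once.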

Table 2 SNPs genotyped (Contig NT010799, chromosome 17q11.2–q12)

Analysis

Files

Genotyping errors and Mendelian inconsistencies were screened by PedCheck, and manually corrected based on the scatter plot distribution and relative intensity of fluorescent signals. In cases of uncertainty, genotypes were deleted. Allele and genotype frequencies were calculated for 356 unrelated parents in DS101–115. Deviation from the Hardy–Weinberg equilibrium was tested by using the χ2-statistics in MERLIN (http://www.sph.umich.edu/csg/abecasis/Merlin/).22
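The Hardy–Weinberg check can be illustrated with a one-degree-of-freedom goodness-of-fit test for a biallelic SNP. The sketch below is an assumption-laden stand-in, not the MERLIN routine: the function name `hwe_chi2` and the counts are hypothetical, and the P-value uses the asymptotic χ2 distribution, which may be unreliable when expected genotype counts are small.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions (hypothetical helper).

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only; these are not data from this study.
chi2, p_value = hwe_chi2(50, 0, 50)   # complete absence of heterozygotes
print(f"chi2={chi2:.2f} P={p_value:.3g}")
```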

Pedigree disequilibrium test and TRANSMIT

Because of the various family structures, we chose to use the pedigree disequilibrium test (PDT), developed from the classical transmission disequilibrium test (TDT), to determine if a marker locus and the hypothetical disease locus are linked or are in linkage disequilibrium (LD).23,24 Under Mendelian inheritance, all alleles have a 50% chance of being transmitted to offspring by the parents. An allele may be associated with the disease risk, if it is transmitted more often than 50% of the time, indicating a transmission distortion. PDT can utilize data from trios, nuclear families and discordant sibships within extended pedigrees.
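The idea of transmission distortion can be made concrete with the classical TDT statistic for trios, from which the PDT was developed. The sketch below is only the simple McNemar-type TDT, not the PDT itself, which additionally combines information across related nuclear families and discordant sibships within a pedigree. The function name `tdt` and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT for one allele (hypothetical helper).

    transmitted:   times heterozygous parents passed the allele
                   to an affected child
    untransmitted: times heterozygous parents passed the other allele
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    # Under Mendelian inheritance b and c are expected to be equal.
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only; these are not data from this study.
chi2, p_value = tdt(60, 40)
print(f"transmission rate={60 / 100:.2f} chi2={chi2:.2f} P={p_value:.4f}")
```

Only heterozygous parents contribute to the statistic, which is why the proportion of heterozygous parents enters the sample size considerations.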

Different methods have been used to estimate the required sample size (N) in the TDT/PDT, taking into consideration varying figures of the genotypic risk ratio (λs=1.5–4), allele frequency (P=0.01–0.80) and proportion of heterozygous parents (0.025–0.500), and different modes of inheritance (multiplicative, additive, recessive or dominant).25,26,27 Based on four methods of power assessment in the TDT test of one candidate SNP, a sample size of 53–55 is required for λs=2 and a sample size of 192–195 is required for λs=1.5 to achieve 80% power, when assuming a disease-predisposing allele frequency of 0.1 and a multiplicative model.26,27 Therefore we considered that N=257 gives a reasonable sample size in a TDT- or PDT-based analysis of SNP variants of MS candidate genes with an expected range of λs=1.5–2.8

Haplotype-based analyses were conducted using the TRANSMIT version 2.5 program (http://www-gene.cimr.cam.ac.uk/clayton/software/).28 This program tests for association between genetic markers and disease by examining the transmission of markers and haplotypes from parents to affected offspring. TRANSMIT can analyze multilocus haplotypes, even if phase is unknown and parental haplotypes are missing. The tests are based on a score vector, which is averaged over all possible configurations of parental haplotypes and transmissions consistent with the observed data. Data from unaffected siblings may be used to restrict the number of possible parental genotypes to be considered. When transmission is fully observed, this test is similar to Pearson's χ2-test. The program calculates the following χ2-statistics: (1) for each haplotype or allele, a test for excess transmission of that haplotype and (2) a global test for association (H−1 df, where H is the number of haplotypes for which transmission data are available). In brief, TRANSMIT v. 2.5 estimates the maximum log-likelihood of haplotype probabilities of paired marker alleles. The haplotype-based score test evaluates the difference in the observed and expected transmission of each possible haplotype from parents to affected children by using the χ2-statistics. We also performed a bootstrap significance test using 10 000 bootstrap samples of haplotypes.

Assessment of linkage disequilibrium by Ldmax

Ldmax in the GOLD program provides maximum likelihood estimates of pairwise disequilibrium (http://www.sph.umich.edu/csg/abecasis/GOLD/docs/stats.html) by using the expectation-maximization algorithm of Slatkin and Excoffier.29 SNP alleles were taken from 356 unrelated parents in DS101–115 to assess haplotype frequencies. For details of computing the δ2, D and D′ values, we refer to the web site. The usual contingency table of the χ2-test was also generated to calculate significance from an asymptotic distribution with (r−1)(c−1) degrees of freedom, where r and c are the numbers of alleles at markers A and B, respectively. The output file provides the usual χ2, P, δ2 and D′ values, where D′ ranges between 0 and 1 (greater values indicating stronger LD).
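For two biallelic markers, the pairwise measures can be sketched directly from haplotype and allele frequencies. The example below assumes that the two-marker haplotype frequency is already known; it does not perform the expectation-maximization step that Ldmax uses to estimate haplotype frequencies from unphased genotypes. The function name `ld_measures` and the frequencies are hypothetical, and the squared correlation returned as `r2` is assumed to correspond to the δ2 statistic reported by the program.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD for two biallelic markers (hypothetical helper).

    p_ab: frequency of the haplotype carrying allele A at marker 1
          and allele B at marker 2
    p_a, p_b: frequencies of allele A and allele B
    Returns (D, D', r2), with D' reported as an absolute value in [0, 1].
    """
    d = p_ab - p_a * p_b
    # D is scaled by its largest attainable magnitude given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Illustrative frequencies only; these are not data from this study.
d, d_prime, r2 = ld_measures(0.3, 0.4, 0.5)
print(f"D={d:.3f} D'={d_prime:.3f} r2={r2:.3f}")
```

D′ can equal 1 even when the squared correlation is well below 1, so the two measures are best read together.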

Stratification

To test if the inclusion of patients with PP-MS caused any bias in the outcome, we removed all PP-MS nuclear families and trios from the total DS101–115 data set. The majority (over 94%) of patients in DS101–104, DS106–108, DS109–112 and DS113–114 have relapsing-remitting (RR)/secondary progressive (SP)-MS. DS105 and DS115 are exclusively composed of PP-MS trios and PP-MS incomplete families. Based on natural history studies,30 patients (trios) with PR-MS are also classified here as PP-MS. We omitted seven trios with PP-MS from DS101–104, two trios with PR-MS from DS106–108 and four trios with PP- or PR-MS from DS113–114. Therefore, there were 47 PP-MS trios and incomplete families, and 217 RR/SP-MS families (trios, affected sib pair and multiplex families). PP-MS and SP-MS were both designated as chronic progressive disease in DS109–112 at the time of specimen collection and no clinical update is as yet available for phenotypic stratification. The estimated proportion of RR/SP-MS is over 90–95% in this data set.

Computation

PDT, TRANSMIT version 2.5, MERLIN and the GOLD programs were run in Unix- and Linux-based computational environments at the AMDeC Bioinformatics Core Facility operated by the Columbia Genome Center, Columbia University.

Results

Allele and genotype frequencies

Genotyping of DNA was similarly performed in DS101–105, DS106–108, DS109–112, DS113–114 and DS115. SNP allele and genotype frequencies were compared among the data sets, and also calculated for the unrelated parents in the combined DS101–115 data set. No deviations from the Hardy–Weinberg equilibrium were observed.

Markers showing allelic associations with multiple sclerosis

Analysis of data was carried out separately in each data set, and then in the combined DS101–115 set. As all data sets contain North American families with European background and with similar allele frequency distributions, we present here the results from the combined DS101–115 families (Table 3). The analysis reveals transmission distortion for SNPs CCL2B, CCL11B, CCL5A, CCL18B, CCL18X and CCL3B (Table 3). Considering that 94% of the DS101–115 families include patients with RR/SP-MS, it is not surprising to detect similar trends in the unstratified and the RR/SP-MS data sets. Although the PP-MS group is small, allelic associations are significant at CCL18B and CCL18X. However, all the significant findings are modest and disappear after using the conservative Bonferroni correction for multiple comparisons.
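The effect of the Bonferroni correction can be illustrated with a short sketch. The P-values below are hypothetical and are not taken from Table 3; the number of tests is set to 31 only because 31 SNP assays were developed, which is an assumption about how the correction was applied. The function name `bonferroni` is likewise hypothetical.

```python
def bonferroni(p_values, n_tests=None, alpha=0.05):
    """Bonferroni correction (hypothetical helper).

    n_tests defaults to the number of P-values supplied.
    Returns (per-test threshold, adjusted P-values, significance flags).
    """
    m = len(p_values) if n_tests is None else n_tests
    if m <= 0:
        raise ValueError("the number of tests must be positive")
    threshold = alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= threshold for p in p_values]
    return threshold, adjusted, significant


# Hypothetical nominal P-values, assuming 31 single-marker tests.
threshold, adjusted, significant = bonferroni([0.01, 0.001], n_tests=31)
print(f"per-test threshold={threshold:.5f}")
print(adjusted, significant)
```

Under these assumptions a nominal P-value of 0.01 no longer passes the per-test threshold of about 0.0016, which is the pattern described for the modest single-marker associations.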

Table 3 Allelic associations as determined by PDT

Haplotypes associated with multiple sclerosis

Although no SNP listed in Table 3 has direct relevance to MS, they may define disease-associated haplotypes. We used the TRANSMIT program to evaluate haplotype transmissions (Table 4a and b). A total of 27 two-marker haplotypes generated in a pairwise manner from the list in Table 2 (but skipping subregions with a gap >30 kb) and five three-marker haplotypes in the subregions of two-marker haplotype associations were tested by TRANSMIT. We detected significant P-values for several two-marker haplotypes including CCL2bx, CCL11b–CCL8x, CCL8y–CCL13a, CCL13ab and CCL15om in both DS101–115 and the RR/SP-MS groups and for CCL3ba in the RR/SP-MS group. Only one three-marker haplotype, CCL15omn, showed significant distortion in the two groups. However, the number of transmissions and the variance of observed and expected transmissions (<2.5) of the CCL15om and CCL15omn haplotypes are too low to draw reliable conclusions.28 As indicated in Table 4a and b, removal of the PP-MS families from DS101–115 did not markedly change the outcome of the analyses, and the PP-MS subgroup alone was too small to reliably define a pattern of phenotype-specific associations.

Table 4 Results of two- and three-marker haplotype analysis by TRANSMIT

Table 5 shows the distribution of the MS-associated haplotypes in correlation with the distribution of intermarker LD in the chr17q11.2–q12 region. Pairwise distributions of D′ values indicate that strong LD extends from CCL2B to CCL11B, then drops significantly, and there is no LD between the CCL2A and CCL13A markers. The D′ values generally correlate with the intermarker distances, and the highest D′ values (0.7–1.0) can be observed up to 20 kb. As haplotypes of CCL2bx, CCL13ab, CCL15omn and CCL3ba are defined by markers in strong LD with each other, further investigation of these haplotypes may reveal disease-relevant mutations. The varying strength and discontinuous nature of LD observed at CCL11b–CCL8x and CCL8y–CCL13a suggest that more than one haplotype may carry disease-relevant variants and that fine mapping is necessary in this subregion.

Table 5 Distribution of intermarker D′ values

Summary of findings

These analyses suggest that variants of CCL genes are associated with MS. Four subregions of haplotypic associations are detected within the 1.85 Mb CCL coding region. We note that candidate genes showing association with all MS phenotypes overlap with those detected in the RR/SP subgroup. The inclusion of PP-MS in the combined analyses did not noticeably modify the outcome, and the PP-MS subgroup alone did not have enough power to support conclusions regarding differential associations.

Discussion

Linkage studies have defined a number of susceptibility loci in MS. These loci, however, still encompass 2–20 cM chromosomal segments encoding large numbers of genes. Comparing allele sharing as a method of linkage and TDT as a method of association, Risch and Merikangas25 demonstrated that TDT is more powerful in detecting genes of modest effect in complex disorders. Subsequently, TDT and related methods assisted in identifying a significant number of candidate genes and alleles within regions of linkage in complex diseases.31 Based on these considerations, we genotyped and analyzed SNPs within candidate genes located on chromosome 17q11, a previously defined susceptibility locus of MS.17 Although none of the marker alleles showed direct association with MS after adjusting for multiple comparisons, disease-causing mutations are likely to be within or near haplotypes located in genes of CCL2 and CCL3 and in the subregion encompassing CCL11–CCL13 (Tables 3, 4 and 5). Because of the low number of transmitted haplotypes, the CCL15 gene region remains to be further investigated. The D′ values summarized in Table 5 indicate that LD extends up to 20 kb between paired markers in this region. The D′ value estimates are compatible with those obtained by simulation and experimental assessment of haplotype blocks in several chromosomal regions of Caucasians.32 The LD map (Table 5) facilitates the design of high-density SNP mapping of disease-relevant mutations in or close to the identified haplotypes.

Our results are supported by functional observations. CCL5 (RANTES) and CCL3 (MIP-1α) are likely to be involved in the recruitment of mononuclear cells (MNCs) into the central nervous system (CNS) and in the pathogenesis of both MS and EAE. Blockade of CCL3 prevents the development of both acute and relapsing forms of EAE, and the immigration of MNCs into the CNS.33 CCR1 and CCR5, the main receptors for these chemokines, are expressed on MNCs in blood, cerebrospinal fluid (CSF) and plaques. Evidence suggests that monocytes competent to enter the CNS are derived from a minor pool of CCR1+/CCR5+ MNCs in the peripheral circulation. In the presence of their CCL3 and CCL5 ligands, these cells will be retained in the CNS.33

CCL2 (MCP-1) and its receptor CCR2 have also been implicated in the pathogenesis of both MS and EAE. A review of available data in blood, CSF and CNS of MS patients and the results of descriptive, transgenic, knockout and neutralizing antibody studies in EAE suggest that CCL2 and CCR2 play important roles in the development of inflammatory lesions in the CNS.9,34,35 MCP chemokines are generally chemoattractants for monocytes. Among them, CCL2 is also a chemoattractant for memory T cells, dendritic cells, natural killer cells and microglia. The expression of CCL2 can be induced in various cell types, including glial cells in the CNS.9 Consistent with the above observations concerning CCL2 and CCL3 in MS, our SNP haplotype analyses revealed that variants within these genes may be associated with the disease.

In conclusion, this study reveals that haplotypes within genes of CCL2, CCL11–CCL8–CCL13 and CCL3 are associated with MS. Further investigation of these haplotypes with dense SNP markers in families and in case–control cohorts of various ethnic origins will help to determine the variants conferring the highest risks to the development of the disease. If CCL variants produced by activated resident immune cells play an important role in the recruitment, retention and further activation of peripheral immune cells in the CNS, then genetically engineered small molecules with chemokine receptor antagonist properties may beneficially influence the natural history of the disease with autoimmune features and overt inflammatory activity.