Linkage studies identify regions of the genome that are shared by relatives affected by a particular disease more frequently than would be expected by chance. Most studies analyze affected sibling pairs and use genetic markers, typically microsatellites, that are scattered throughout the genome at moderate density. A significant excess of allele sharing identical by descent (IBD) in affected sib pairs suggests that the region containing the marker locus also contains a disease susceptibility locus. The first linkage scan for T1DM identified 20 chromosomal regions with suggestive evidence of linkage to disease, including the HLA and INS gene regions. Subsequent studies have replicated the linkage with HLA, but findings at other loci have been inconsistent. A recent analysis of 2,496 multiplex families from the Type 1 Diabetes Genetics Consortium (T1DGC) reported significant evidence of linkage to HLA, along with a second locus on chromosome 6q. Suggestive evidence of linkage was observed near the CTLA4 and INS genes, along with two regions on chromosome 19.
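The test for excess IBD sharing described above can be illustrated with a minimal sketch of the classic "mean test" for affected sib pairs. It assumes, for simplicity, that the number of alleles shared IBD (0, 1, or 2) is known without ambiguity for every pair, which is rarely true of real marker data. The function name and the example counts are hypothetical. Under the null hypothesis of no linkage, sib pairs share 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, and 1/4, so the expected proportion of alleles shared is 0.5, with a variance of 1/8 per pair.

```python
import math


def mean_ibd_sharing_test(n_share0, n_share1, n_share2):
    """One-sided mean test for excess IBD sharing in affected sib pairs.

    Arguments are the numbers of sib pairs sharing 0, 1, and 2 alleles IBD
    at a marker (assumed here to be fully informative).
    Returns (mean proportion shared, z statistic, one-sided p value).
    """
    n_pairs = n_share0 + n_share1 + n_share2
    if n_pairs == 0:
        raise ValueError("at least one sib pair is required")

    # Proportion of alleles shared IBD per pair is 0, 0.5, or 1.
    mean_share = (0.5 * n_share1 + 1.0 * n_share2) / n_pairs

    # Null hypothesis: mean 0.5, variance 1/8 per pair.
    standard_error = math.sqrt(0.125 / n_pairs)
    z = (mean_share - 0.5) / standard_error

    # Upper-tail probability of a standard normal deviate.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_share, z, p_value


# Hypothetical counts for 100 affected sib pairs at one marker.
print(mean_ibd_sharing_test(10, 50, 40))
```

A marker with no linkage to disease would be expected to give counts close to a 1:2:1 ratio and a z statistic near zero. Linkage programs used in practice estimate IBD sharing from multiple markers and report LOD scores, which this sketch does not attempt.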
Genome-wide association studies (GWAS)
One limitation of linkage studies is their inadequate statistical power to detect risk variants with small effect sizes. As our understanding of the genetic basis of complex diseases has grown, it has become apparent that many susceptibility genes have a very modest influence on disease risk (odds ratio [OR] ≤ 1.3). Such small effects can be detected more readily by GWAS, provided that the alleles at the risk loci are relatively common in the studied population (minor allele frequency >5%) and sufficiently large datasets are available for analysis. GWAS use high-throughput genotyping platforms to analyze several hundred thousand single nucleotide polymorphisms (SNPs), providing much denser coverage of the genome than linkage studies. The analysis relies on the assumption that the SNPs selected for genotyping will “tag” potentially causal variants as a result of linkage disequilibrium (LD). The marker SNPs are genotyped in a discovery cohort of affected individuals (cases) and unaffected, unrelated individuals (controls), and those showing suggestive evidence of association with disease are then taken forward for replication in an independent case/control dataset. Due to the large number of SNPs investigated, very large sample cohorts are required in the replication phase to achieve genome-wide statistical significance for a given variant (widely accepted as a p value less than 5 × 10⁻⁸). GWAS are therefore generally performed by international collaborative efforts and large consortia, such as the Wellcome Trust Case Control Consortium (WTCCC) and the T1DGC.
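A minimal sketch of a single-SNP case/control comparison may help to make the significance threshold concrete. It assumes a simple allelic chi-square test on a 2 × 2 table of allele counts, and it uses the common rationale that 5 × 10⁻⁸ corresponds to a Bonferroni correction of 0.05 for roughly one million independent tests. The function name and the allele counts are hypothetical, and real analyses typically use regression models with adjustment for population structure.

```python
import math

# Common rationale for the threshold: 0.05 corrected for ~1 million tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Allelic odds ratio and 1-df chi-square test from allele counts.

    Returns (odds ratio, chi-square statistic, p value).
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    total = a + b + c + d
    odds_ratio = (a * d) / (b * c)

    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical SNP: 2,000 cases and 2,000 controls (4,000 alleles each),
# minor allele frequency 0.30 in cases and 0.25 in controls.
odds_ratio, chi2, p_value = allelic_association(1200, 2800, 1000, 3000)
print(odds_ratio, chi2, p_value, p_value < GENOME_WIDE_ALPHA)
```

With these invented counts the odds ratio is close to 1.3, yet the p value does not reach the genome-wide threshold, which illustrates why much larger cohorts are needed for variants of modest effect.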
Preliminary analysis of 6,500 nonsynonymous SNPs in 2,029 T1DM cases and 1,755 control samples identified a significant disease association with the minor allele of rs1990760 in the IFIH1 gene. This locus encodes interferon-induced helicase C domain 1, which serves as a cellular receptor for double-stranded viral RNA and triggers the production of interferon in response to viral infection. As enteroviruses have been implicated as environmental triggers of T1DM development, the IFIH1 gene may be involved in mediating this effect by activating the innate immune system. The presence or absence of inflammatory signals produced by this system will determine whether T-cell activation in the periphery results in an aggressive effector response or a protective regulatory response.
The first complete GWAS for T1DM was reported by the WTCCC in 2007. This confirmed associations at known loci (HLA, INS, CTLA4, PTPN22, IL2RA, and IFIH1) and also identified several novel associations with variants on chromosomes 12q24, 12q13, 16p13, 12p13, 5q31, and 4q27. Follow-up studies in different cohorts and meta-analyses of multiple datasets have subsequently confirmed associations at all these loci except 5q31, as well as identifying a large number of additional association signals [11,53]. To date, more than 50 loci have been implicated as determinants of T1DM risk using the GWAS approach; a complete list can be found on the T1DBase website (http://t1dbase.org). Many of these loci are also associated with other autoimmune diseases, suggesting common underlying mechanisms in disease development. The majority of the T1DM risk variants have very small effect sizes, with odds ratios between 1.05 and 1.3, substantially lower than those reported for the HLA genes, INS, and PTPN22 (Figure 30.4). Given the statistical power of the GWAS and the large number of SNPs genotyped, however, it is highly unlikely that any additional common variants with large effects on disease risk were missed.
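The relationship between effect size, allele frequency, and statistical power can be sketched with a rough calculation. The example below assumes a two-sided comparison of allele frequencies between cases and controls under a normal approximation, treats the control frequency as the population frequency, and uses a hypothetical cohort of 2,000 cases and 2,000 controls. The function name and all input values are illustrative and are not taken from any published study.

```python
import math
from statistics import NormalDist


def allelic_test_power(control_maf, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of an allelic case/control test (normal approximation)."""
    p0 = control_maf
    # Allele frequency in cases implied by the allelic odds ratio.
    case_odds = odds_ratio * p0 / (1 - p0)
    p1 = case_odds / (1 + case_odds)

    # Each individual contributes two alleles.
    standard_error = math.sqrt(
        p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)
    )
    z_effect = abs(p1 - p0) / standard_error

    # Critical value for a two-sided test at the chosen threshold.
    z_critical = -NormalDist().inv_cdf(alpha / 2)

    # Probability of exceeding the critical value (far tail ignored).
    return NormalDist().cdf(z_effect - z_critical)


# Hypothetical scenarios: 2,000 cases and 2,000 controls.
for maf, effect in [(0.30, 2.0), (0.30, 1.3), (0.30, 1.1), (0.02, 1.3)]:
    print(maf, effect, allelic_test_power(maf, effect, 2000, 2000))
```

Under these assumptions a common variant with a large effect is detected almost with certainty, whereas a variant with an odds ratio near 1.1, or a low-frequency variant with an odds ratio of 1.3, would usually be missed in a cohort of this size.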
It is important to note that, like linkage analyses, GWAS do not directly identify causal variants. The designations often applied to associated variants can therefore be misleading. For convenience, signal SNPs are generally assigned the name of the closest gene, or the most plausible candidate gene in close proximity, as a reference point for their location, although some associated variants map to apparent gene deserts, with no known annotated genes. Close physical proximity of a candidate gene, however, does not necessarily mean that the gene has any functional involvement in disease pathogenesis or that it accounts for the risk associated with the nearby variant. Many disease-associated variants map to large blocks of LD, which may span hundreds of kilobases and encompass many genes. In most cases the signal SNP will merely be a marker for a causal variant located somewhere in the LD block. Fine mapping is therefore required to narrow down the likely location of this etiologic SNP. This is generally achieved by deep sequencing of the LD region of interest to identify all the polymorphisms that might contribute to the observed association. These are then genotyped in large case–control cohorts, and conditional regression analysis is used to tease out the contribution of each variant and determine whether it can be explained by LD with other SNPs or is an independent effect. This approach often narrows down disease association to haplotypes comprising many closely correlated variants, but further refinement cannot be achieved by genetic means because of the very strong LD between the SNPs. Functional analysis is then necessary to home in on the causal variant(s). This may be accomplished by identifying polymorphisms known to influence gene expression levels (from publicly available expression QTL (eQTL) databases) or missense variants within the associated haplotype that could disrupt protein function.
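The strength of LD between a signal SNP and a candidate causal variant is commonly summarized by the squared correlation r². The following minimal sketch computes r² for two biallelic SNPs and assumes that phased haplotype counts are available. The function name and the counts are hypothetical, and in practice haplotypes usually have to be inferred statistically from unphased genotypes.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation (r^2) between two biallelic SNPs.

    Arguments are counts of the four phased haplotypes, where A/a are the
    alleles at the first SNP and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")

    freq_A = (n_AB + n_Ab) / total
    freq_B = (n_AB + n_aB) / total
    freq_AB = n_AB / total

    # D measures the departure of the haplotype frequency from independence.
    d = freq_AB - freq_A * freq_B

    denominator = freq_A * (1 - freq_A) * freq_B * (1 - freq_B)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denominator


# Hypothetical haplotype counts for a signal SNP and a nearby variant.
print(ld_r_squared(40, 10, 10, 40))
```

When r² approaches 1 the two variants are almost always inherited together, so a conditional analysis cannot separate their effects. This is the situation in which functional evidence is needed to distinguish the causal variant from its proxies.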
The T1DM risk loci identified by GWAS contain over 300 protein-encoding genes, many of which are good candidate genes due to the role of their products in immunologic or metabolic pathways. This estimate excludes genes located outside the associated LD blocks that could have regulatory elements within the blocks. As functional SNPs may exert their effects via long-distance gene regulation, their contribution to disease risk may be mediated by genes located several kilobases away from the LD blocks in which the SNPs lie. This highlights the difficulties involved in conclusively identifying the causal gene variants and elucidating the genetic mechanisms underlying T1DM risk. To date, only the HLA gene region, INS and IL2RA have been fine-mapped with any degree of thoroughness, although functional studies also support the PTPN22 R620W variant as a causal SNP.
To date, GWAS for T1DM have been performed exclusively in white European datasets, due to the difficulty in recruiting sufficiently large cohorts from other ethnic groups, in which the disease is relatively rare. One drawback of this limited approach is that some susceptibility loci might have been overlooked if their minor alleles occur with low frequency (<5%) in white Europeans, as the studies would have been underpowered to detect the associations. Allele frequencies at some loci are increased in certain populations as a result of genetic drift, thus boosting statistical power to detect disease associations with these variants. It is therefore possible that additional susceptibility loci for T1DM may be discovered if different ethnic groups are analyzed by GWAS. A second issue is that it is not known whether the T1DM risk loci identified in white Europeans have global relevance. To date, studies have not attempted to replicate associations at all the GWAS loci in non-white populations, as available datasets are too small. Candidate gene studies have shown suggestive evidence of population-specific effects, however. For example, the M55V variant of the SUMO4 gene is strongly and consistently associated with T1DM in East Asian populations, but not in white Europeans [53,54]. In contrast, the disease association with the R620W variant of PTPN22, reported in white Europeans, is not seen in Asians, perhaps due to the rarity of the minor allele in the latter population. Genetic heterogeneity between different populations is perhaps to be expected, given that most of the genes influencing T1DM risk are likely to act via the immune system. Population survival is dependent upon the adaptation of the immune response to local environmental insults, which are likely to differ in different areas of the world. This shaping of the immune repertoire might be reflected in ethnic differences in immune response genes.
As a result, it may not be possible to extrapolate all genetic associations observed in European populations to populations of different ancestry. There is clearly a need to broaden genome-wide analyses to a more diverse set of populations in order to fully understand the contribution of genetic factors to disease risk on a global scale. Exploiting differences in LD patterns between ethnic groups may also facilitate the fine-mapping of causal variants for T1DM.