Investigation of innate immunity genes CARD4, CARD8 and CARD15 as germline susceptibility factors for colorectal cancer

Background: Variation in genes involved in the innate immune response may play a role in the predisposition to colorectal cancer (CRC). Several polymorphisms of the CARD15 gene (caspase recruitment domain family, member 15) have been reported to be associated with an increased susceptibility to Crohn disease. Since the CARD15 gene product and other CARD proteins function in innate immunity, we investigated the impact of germline variation at the CARD4, CARD8 and CARD15 loci on the risk for sporadic CRC, using a large patient sample from Northern Germany.
Methods: A total of 1044 patients who had undergone surgery for sporadic colorectal carcinoma (median age at diagnosis: 59 years) were recruited and compared to 724 sex-matched, population-based control individuals (median age: 68 years). Genetic investigation followed both a coding SNP and a haplotype tagging approach. Subgroup analyses for N = 143 patients with early manifestation of CRC (≤50 years of age at diagnosis) were performed for all CARD loci, and subgroup analyses for different age strata were carried out for the CARD15 mutations R702W, G908R and L1007fs. In addition, all SNPs were tested for association with disease presentation and family history of CRC.
Results: No significant differences were observed between the patient and control allelic or haplotypic spectra of the three genes under study for the total cohort (N = 1044 patients). None of the analysed SNPs was significantly associated with tumour location, and none yielded a significant association in the familial or non-familial CRC patient subgroups. However, in a patient subgroup with early disease manifestation (≤45 years of age at diagnosis), the mutant allele of CARD15 R702W was significantly associated with disease susceptibility (9.7% in cases vs 4.6% in controls; P allelic = 0.008, P genotypic = 0.0008, OR allelic = 2.22 (1.21-4.05), OR recessive = 21.9 (1.96-245.4)).
Conclusion: Variation in the innate immunity genes CARD4, CARD8 and CARD15 is unlikely to play a major role in the susceptibility to CRC in the German population. However, we report a significant contribution of CARD15 to disease in CRC patients with very early disease manifestation, mainly driven by variant R702W.


Background
Colorectal cancer (CRC) occurs both as a part of recognized heritable syndromes and in the form of "sporadic" disease. However, epidemiological studies have also revealed familial clustering of CRC outside the recognized syndromes [1]. Estimates of the relative familial recurrence risk of nonsyndromic CRC range from 1.7 for unselected cases [2] to 6.2 for siblings of index patients aged <55 years [3]. As yet, the molecular basis of this familial clustering of "sporadic" CRC has not been fully explored.
Epidemiological and functional evidence suggests that cancer may arise in the context of chronic inflammation. Inflammatory bowel disease (IBD) is a well-established example, in that patients with IBD have an increased risk of developing CRC [4]. It is estimated that 1-2% of all CRC cases in the general population are due to a complicated course of ulcerative colitis (UC) or Crohn disease (CD), and up to 15% of all patients with IBD develop CRC during their life [5,6]. It therefore appears reasonable to hypothesize that genetic factors predisposing to, or involved in, the chronic inflammatory response in IBD also play an important role in the predisposition to CRC.
As regards CD, the CARD15 (caspase recruitment domain family, member 15) gene mutations rs2066844 (R702W), rs2066845 (G908R) and rs2066847 (L1007fs) were originally shown to be associated with an increased disease risk [7,8], and this association has since been replicated in numerous Caucasian populations [9][10][11]. The CARD15 gene overlaps with the linkage-based IBD1 locus on chromosome 16q12. It exerts its biological function as part of a larger molecular network of genes involved in innate immune recognition and regulation [12]. Two other genes in the CARD family have also been implicated in the etiology of IBD: CARD4/NOD1 [13] and CARD8/TUCAN [14], both of which also function as intracellular receptors of bacterial products, but with different ligand specificity.
CARD4/NOD1 is located within a susceptibility locus for IBD on chromosome 7p14. CARD4 is an additional PAMP receptor that is linked to apoptosis and activates NF-κB. The complex intronic indel polymorphism ND1+32656 (partially defined by rs6958571) has recently been reported to be associated with IBD [13]. CARD8/TUCAN (tumor-up-regulated CARD-containing antagonist of caspase nine) is expressed in gastrointestinal epithelium. There is evidence to suggest that CARD8 may be a negative regulator of NF-κB and also has a regulatory effect on apoptosis. The common variant rs2043211 (c.30T>A) introduces a stop codon (Cys10Ter) at position 10 of the amino acid sequence. The common allele T was recently found to be associated with Crohn disease [14].
Recently, several studies have implicated CARD15 in the susceptibility to CRC. Kurzawski et al. [15] observed an increased frequency of the 3020InsC mutation, in comparison to newborns, among 250 Polish CRC patients aged >50 years at the time of diagnosis, resulting in an odds ratio of 2.2. Interestingly, no evidence for an association with CRC was found in younger patients. The 3020InsC association was confirmed in a subsequent study from Greece [16], and an association of CARD15 R702W was reported in a study from New Zealand [17]. However, three subsequent studies from Finland [18,19] and Hungary [20] failed to corroborate any of these findings. A summary of CARD15 allele frequencies observed in different CRC populations is given in Table 1.
In view of these controversial results, and in order to obtain a more complete assessment of the impact of variation in the innate immunity genes on CRC susceptibility, we investigated multiple variants of the CARD4, CARD8 and CARD15 genes in a large case-control sample from Northern Germany, following both a haplotype tagging and a coding SNP approach.
Table 1: CARD15 allele frequencies observed in different CRC populations (column headers: Poland [15], Finland [18], Germany, Greece [16], New Zealand [17], Hungary [20]; Age).

[...] [21]. They were interviewed by mail questionnaire and a venous EDTA blood sample was obtained either at the POPGEN office or by the patient's general practitioner. All study protocols were approved by the institutional ethics committees and the local data protection officer. Written informed consent was obtained from all study participants. For both cases and controls, the study was restricted to probands of German ancestry, i.e. only individuals whose parents were born in Germany. Patients and controls fulfilling either the clinical Amsterdam or the Bethesda criteria for HNPCC were excluded from the study [22], as were patients with a history of inflammatory bowel disease (IBD). Patients with FAP, or with any other known monogenic form of CRC, were also excluded. The first consecutive 522 male and 522 female participants were included in the study, yielding a sample size of 1044 patients. The median age at diagnosis was 59 years (range: 18-92 years; Table 2). Patients having at least one first-degree relative diagnosed with CRC were defined as familial cases, whereas non-familial cases had no first-degree relatives with CRC.

Healthy control individuals (N = 724, including 362 males) with a median age at time of recruitment of 68 years (range: 48-81 years) were obtained from the population-derived pool of control individuals in the POPGEN project, identified on the basis of the local population registry [21]. Control individuals with a history of malignant disease or IBD were excluded from the study. Controls were sex- and age-matched to the case sample, with a median age 10 (± 1) years above the cases' age at diagnosis.

Genotyping
DNA from all samples was prepared using the FlexiGene chemistry (Qiagen, Hilden, Germany) according to the manufacturer's protocols. DNA samples were evaluated by gel electrophoresis and adjusted to 20-30 ng/µl DNA content using the Picogreen fluorescent dye (Molecular Probes - Invitrogen, Carlsbad, CA, USA). One microliter of genomic DNA was amplified with the GenomiPhi (Amersham, Uppsala, Sweden) whole genome amplification kit and fragmented at 99°C for three minutes. One hundred nanograms of DNA were dried overnight in TwinTec hardshell 384 well plates (Eppendorf, Hamburg, Germany) at room temperature. Genotyping was performed on these plates using the SNPlex chemistry (Applied Biosystems, Foster City, USA) on an automated platform with TECAN Freedom EVO and 384 well TEMO liquid handling robots (TECAN, Männedorf, Switzerland). Genotypes were reviewed manually using the Genemapper 4.0 (Applied Biosystems) software. None of the variants showed a significant departure from Hardy-Weinberg equilibrium (p > 0.1), indicating robust genotyping in this experiment. All process data were logged and administered with a database-driven LIMS [23]. Genotypes of the nonsynonymous polymorphism R702W (rs2066844) and the complex intronic indel polymorphism ND1+32656 (PCR forward primer GTCCTTCTGGTGTACTGATGTATGAAA, PCR reverse primer, TaqMan probe VIC (T-allele): CGCCCCCCACACA, TaqMan probe FAM (GG-allele): CCCCCCCCCACAC) were determined using the TaqMan (Applied Biosystems) system. Reactions were completed and read in a 7900 HT TaqMan sequence detector system (Applied Biosystems). The amplification reaction was carried out with the TaqMan universal master mix. Thermal cycling conditions consisted of one cycle of 10 minutes at 95°C, followed by 45 cycles of 15 seconds at 95°C and 1 minute at 60°C. Primers and probes have been reported before [24].
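The Hardy-Weinberg check mentioned above can be illustrated with a short sketch. This is not the authors' software: it is a plain chi-squared goodness-of-fit test with one degree of freedom, the function name `hwe_chi2_test` is introduced here for illustration, and the genotype counts are hypothetical values chosen only to add up to 724 control individuals.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are the observed genotype counts for a biallelic marker.
    Returns the chi-squared statistic and its p-value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one SNP in 724 controls (not data from the study).
chi2, p_value = hwe_chi2_test(648, 74, 2)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A marker would pass the criterion quoted in the text if the resulting p-value exceeds 0.1. For markers with very rare alleles, an exact test is usually preferred over this approximation.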

SNP selection and data analysis
For all genes, single nucleotide polymorphisms (SNPs) were retrieved from HAPMAP http://www.hapmap.org by the automated selection, from the CEU dataset, of haplotype tagging SNPs for Caucasians (settings: Mendel errors: 0, minor allele frequency >0.05, HWE cut-off p > 0.01) [25]. In addition, coding SNPs reported in dbSNP or in the literature were included if they had a minor allele frequency ≥0.01 in Caucasians. Figures 1, 2 and 3 show the distribution of markers across genes and the regional haplotype structures as generated by HAPLOVIEW.
The study was of case-control design. In order to improve power, i.e. to detect association with variants on the haplotypes that are not directly tagged by one of the SNPs in the experiments, a sliding window haplotype analysis using window sizes of two to five markers was performed. Haplotype analysis was performed using COCAPHASE through the UNPHASED suite of programs http://www.rfcgr.mrc.ac.uk/~fdudbrid/software/unphased/ [26]. COCAPHASE performs likelihood ratio tests under a log-linear model of the probability that a haplotype belongs to the case rather than to the control group. The expectation maximization (EM) algorithm is used to resolve uncertain haplotypes and provides maximum-likelihood estimates of frequencies. One single overall test statistic per sliding window (HAP2-5) is reported as a global significance value P for each haplotype tested. Nominal p values are reported for all tests. Single-point genotype- and allele-based tests of association were performed using a chi-squared test or Fisher's exact test.
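To make the sliding window scheme concrete, the following sketch enumerates the marker windows that such an analysis would test. It covers only the window bookkeeping, not the EM-based likelihood ratio test carried out by COCAPHASE. The function name `sliding_windows` and the marker labels are placeholders introduced for illustration.

```python
def sliding_windows(markers, min_size=2, max_size=5):
    """Return all windows of adjacent markers with min_size to max_size members.

    Markers are assumed to be listed in their physical order along the gene.
    """
    windows = []
    for size in range(min_size, max_size + 1):
        for start in range(len(markers) - size + 1):
            windows.append(tuple(markers[start:start + size]))
    return windows


# Placeholder labels for a gene with eight genotyped markers.
markers = [f"snp{i}" for i in range(1, 9)]
windows = sliding_windows(markers)

for window in windows[:3]:
    print("-".join(window))
print(f"{len(windows)} windows in total")
```

Each window yields one global test statistic, so the number of haplotype tests grows with both the number of markers and the range of window sizes.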
Figure 1: Overview of the physical and genetic structure of the CARD4 gene region, as generated by Haploview [29]. The LD plots were generated from the HAPMAP data.

Results
A systematic power analysis was performed for the 1044 cases and 724 controls available for study, adopting various allelic odds ratios between 1 and 2, a nominal significance level of 0.05, and minor allele frequencies of 0.1, 0.2, 0.3, 0.4 and 0.5 for a potential susceptibility mutation [27]. The power to detect odds ratios >1.5 was found to be >80% under all models (Figure 4). For frequent susceptibility factors, even odds ratios >1.3 would be detectable with the same power.
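A rough version of this power calculation can be sketched as follows. The sketch uses a normal approximation for a two-sided comparison of allele frequencies, which is an assumption made here and not necessarily the method implemented in the PS-power software cited in the text. The function name `allelic_power` is illustrative, and each individual is counted as contributing two alleles.

```python
from statistics import NormalDist


def allelic_power(maf_controls, odds_ratio, n_cases=1044, n_controls=724, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    n1, n0 = 2 * n_cases, 2 * n_controls  # number of alleles per group
    p0 = maf_controls
    # Allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    pooled = (n1 * p1 + n0 * p0) / (n1 + n0)
    se_null = (pooled * (1 - pooled) * (1 / n1 + 1 / n0)) ** 0.5
    se_alt = (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    diff = p1 - p0
    upper_tail = 1 - norm.cdf((z_crit * se_null - diff) / se_alt)
    lower_tail = norm.cdf((-z_crit * se_null - diff) / se_alt)
    return upper_tail + lower_tail


for maf in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"MAF {maf:.1f}: power at OR 1.5 = {allelic_power(maf, 1.5):.2f}")
```

Under this approximation the power at an odds ratio of 1 equals the significance level, and it rises with both the odds ratio and the minor allele frequency, which is the qualitative pattern described for Figure 4.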
All tagging SNPs (see Methods section) from the three candidate genes and all validated nonsynonymous SNPs with a minor allele frequency ≥1% in Caucasians were included in the genotyping. Sixteen SNPs in the CARD4 gene, 13 SNPs in the CARD8 gene and 8 SNPs in the CARD15 gene were selected. These markers provided good coverage of the respective genes and tagged all major haplotype blocks as determined by the tagging routine [28] implemented in HAPLOVIEW [29]. The disease association analyses were performed in single-tier fashion, using all cases and controls. Because earlier reports have indicated age stratification of the CARD15 association, and because younger patients in general may have a stronger genetic component to their disease, a separate analysis was also performed that included only cases with an age of onset below 50 (median age of onset 45), compared to a sex-matched control population. In addition, a further subgroup analysis of cases with an age of onset below 45 (median age of onset 41) was performed for the CARD15 risk variants.

CARD4
For CARD4, 16 SNP markers were genotyped. Figure 1 provides an overview of the linkage disequilibrium pattern generated from Hapmap Caucasian samples and the location of the tag SNPs in the gene. Association findings are summarised in Table 3. Nominal p-values for allelic association tests in the total cohort ranged from 0.10 to 0.92; sliding window haplotype analyses of two to five markers yielded p values between 0.08 and 0.99. The complex intronic indel polymorphism ND1+32656 yielded a p-value of 0.13 for the total cohort. ND1+32656 is part of a conserved haplotype also defined by SNPs rs2907748 and rs2907749 [30]. Both SNPs rs2907748 and rs2907749 (31 bp downstream of rs6958571) were also included in our experiments for ease and robustness of genotyping [30] and yielded p-values similar to that of ND1+32656. The non-synonymous coding SNP E266K, which was reported to be associated with susceptibility to Crohn's disease in Hungarian patients [31], yielded a p-value of 0.36 for the total cohort. In the analysis of the total patient sample, none of the single point or haplotype analyses revealed any significant association with CRC risk.

Figure 2: Overview of the physical and genetic structure of the CARD8 gene region.
The subgroup analysis of patients younger than 50 years at onset of disease yielded an allelic p-value of 0.06 for ND1+32656 (Table 2). The subgroup analysis of the youngest CRC patients (N = 72; ≤45 years of age at diagnosis) did not improve the results (P allelic = 0.06, P genotypic = 0.025, OR allelic = 1.29 (0.79-2.11)). For E266K, a significant association was observed in patients younger than 50 years (P allelic = 0.004, OR allelic = 0.62 (0.45-0.86)). A similar p-value of 0.003 was obtained for the intronic tag SNP rs2075819, which is in strong LD (r² = 0.9) with E266K. The subgroup analysis of the youngest CRC patients (<45) for E266K yielded P allelic = 0.03, P genotypic = 0.014 and OR allelic = 0.48 (0.28-0.82). However, both E266K and rs2075819 are only of borderline significance with respect to a significance threshold of p ≤ 0.003 after Bonferroni correction for multiple testing (N = 16 SNP markers).

Figure 3: Overview of the physical and genetic structure of the CARD15 gene region.

Figure 4: Power analysis of the sample used in the present study. The power of an allelic test is plotted as a function of the underlying odds ratio of the tested genetic variant (x-axis). Calculations were performed for a nominal significance level of 0.05 in a two-sided test. The different colors denote the frequency of the minor allele of the respective variant. Power increases for frequent variants and higher underlying odds ratios. Odds ratios above 1.6 should be detectable with a power greater than 80% for all allele frequencies. The graph was generated using PS-power [27].
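The Bonferroni threshold used for CARD4 follows from dividing the nominal significance level by the number of markers tested. The sketch below reproduces that arithmetic and compares the two nominal p-values quoted in the text against the unrounded threshold. The variable names are illustrative, and the reported p-values are themselves rounded, so the comparison is only indicative.

```python
alpha = 0.05    # nominal significance level
n_markers = 16  # SNP markers genotyped in CARD4

# Bonferroni-corrected per-test threshold.
threshold = alpha / n_markers

# Nominal allelic p-values reported for the subgroup younger than 50 years.
reported = {"E266K": 0.004, "rs2075819": 0.003}
passes = {snp: p <= threshold for snp, p in reported.items()}

print(f"threshold = {threshold:.6f}")
for snp, ok in passes.items():
    print(snp, "below threshold" if ok else "above threshold")
```

With values this close to the cut-off, the outcome depends on rounding, which is consistent with the description of both markers as borderline.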

CARD8
In total, 13 SNPs in the CARD8 gene were genotyped (Figure 2). In our study, neither allele of rs2043211 was associated with CRC (P allelic = 0.86, OR allelic = 1. [...] marker). None of the other single-marker or haplotype analyses at that locus yielded significant p values after Bonferroni correction was applied.

CARD15
Eight SNP markers in the CARD15 gene were genotyped (Figure 3) and the results of the respective association analyses are summarised in Tables 1, 5, 6 and 7. The haplotype structure and the risk alleles at CARD15 are well defined [32], so the initial analyses focussed on the three main coding SNPs (Table 5). No association was seen between CRC and CARD15 variants R702W (5.1% in cases vs 4.6% in controls; P allelic = 0.50, P genotypic = 0.59), G908R (1.5% vs 1.2%; P allelic = 0.43, P genotypic = 0.60) and L1007fs (3.6% vs 2.8%; P allelic = 0.17, P genotypic = 0.36). Two compound heterozygotes carrying R702W and G908R and four compound heterozygotes carrying R702W and L1007fs were observed among the CRC patients, whereas none of these combinations was found in controls. The combined frequency of genotypes harbouring R702W, G908R or L1007fs was also not significantly different between cases and controls (10.2% vs 8.6%; P allelic = 0.10, P genotypic = 0.10).
Although the haplotype structure of CARD15 has been explored before, we performed a sliding window haplotype analysis of this locus for the sake of consistency. The results reflected the pattern seen in the coding SNP analysis (Table 5) and were essentially negative: Only a nominal significance level of 0.04 was obtained for the 2-locus haplotype spanning rs5743291 and rs2066847. None of the neighbouring haplotypes showed any evidence for an association with CRC.
In contrast, R702W was not significantly associated in the older patient group (N = 901, >50 years of age at diagnosis; 4.7% in cases vs 4.6% in controls; P allelic = 0.83) (Table 1). Similarly negative results for R702W were obtained when patients diagnosed with CRC before age 60 (N = 597) were analysed (5.4% in cases vs 4.6% in controls; P allelic = 0.3).
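The allelic test and odds ratio reported for R702W in the youngest subgroup (9.7% in cases vs 4.6% in controls) can be approximated from a 2x2 table of allele counts. The sketch below is illustrative only. The allele counts are not given in the text; they are back-calculated here from the rounded frequencies, assuming 72 cases and 724 controls with two alleles each, so they are an assumption. The function name `allelic_association` is introduced for this example, and the confidence interval uses the standard log odds ratio (Woolf) approximation.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-squared test (1 df) and odds ratio with 95% CI for a 2x2 allele table."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return p_value, odds_ratio, ci_low, ci_high


# Assumed allele counts: 14 of 144 case alleles (about 9.7%) and
# 67 of 1448 control alleles (about 4.6%); reconstructed, not taken from the study.
p_value, odds_ratio, ci_low, ci_high = allelic_association(14, 130, 67, 1381)
print(f"p = {p_value:.4f}, OR = {odds_ratio:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

With these reconstructed counts the result lies close to the values quoted for this subgroup, but small differences are expected because the inputs are rounded.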

Stratification analysis of disease presentation and family history
None of the analysed SNPs in CARD4, CARD8 and CARD15 was significantly associated with tumour location, and none yielded a significant association in the familial or non-familial CRC patient subgroups (all p values > 0.05; Table 7, which lists only a small selection of the analysed SNPs).

Discussion
The functional relationship between inflammation and cancer is well established and dates back to Virchow in 1863. He based this hypothesis on the observation that some classes of irritants, together with the tissue injury and inflammation they cause, enhance cell proliferation. It is now known that many cancers arise from sites of infection, chronic irritation and inflammation. About 15% of the global cancer burden is attributable to infectious agents, and inflammation is a major element of these chronic infections. For example, the development of mucosa-associated lymphoid tissue (MALT) B cell lymphoma and gastric cancer is associated with Helicobacter pylori-induced chronic gastritis, and an increased risk of CRC accompanies inflammatory bowel disease (IBD) [4][33][34][35]. Therefore, we tried to replicate previously reported associations of functional SNPs in the innate immunity gene CARD15 with CRC susceptibility, and also investigated variants of the CARD4 and CARD8 innate immune genes in this context.

Table footnotes: (a) P values were not significant after Bonferroni correction for multiple testing; (*) no odds ratio calculated due to low allele frequency; (#) p-value reported for the first marker in the haplotype window. The minor allele frequencies (MAF) for cases and controls are reported. P values and odds ratios are reported for the allelic (P allelic) test. Columns P HAP2 to P HAP5 refer to a sliding window haplotype analysis using COCAPHASE. For example, P HAP2 (0.295) for rs2066847 reports the global significance value for the window 2 haplotype spanning rs2066847-rs8056611.

The international HAPMAP project http://www.hapmap.org has generated a wealth of genotype and marker information that significantly facilitates the design of candidate gene studies [25]. For our candidate gene study, a primary haplotype tagging approach was chosen, i.e. the genetic variation at both loci was captured by a set of carefully selected SNPs.
This tagging approach is able to detect signals from hitherto unknown regulatory or functional elements in a given genetic region [28,36,37]. It therefore offers potential advantages over a direct mutation screen of the coding region of a gene, because disease susceptibility may also be conferred by variations in, for instance, splice sites or intronic enhancers [38,39]. Tagging SNPs were thus selected from the public HAPMAP http://www.hapmap.org resources. The genotype and allele frequencies for the SNPs investigated in our control population were not significantly different from those of the Caucasian HAPMAP individuals (Tables 3, 4 and 5), thereby justifying the selection of tag SNPs from this resource [25].
We studied over 1000 patients who had undergone surgery for CRC, which renders our study the largest case-control study of CARD15 mutations reported so far. In Germany, the population median age of affection by CRC is about 69 years for males and 74 years for females [40]. The actual median age of onset in our cases was 59 years. In an additional attempt to increase the power of our investigations, only IBD-negative and cancer-free control individuals with a moderately higher median age of 68 years were used. Many polygenic disorders are characterized by a strong correlation between the age of onset of relatives, as has been documented for instance in breast cancer [41] and Alzheimer disease [42]. It is indeed plausible that the genetic influence upon the development of these disorders is partly reflected by the age at which individuals develop the disorder [43]. Confounding by population affiliation was minimized by the restriction to patients of German ancestry as determined by the birth place of both parents.

Table footnotes: 1*1 homozygous wild-type; 1*2 heterozygous; 2*2 homozygous mutant; (c) homozygous for mutant allele and compound heterozygous combined. P allelic and P geno are calculated from the observed genotypes (1*1, 1*2, 2*2). (a) P geno 2*2 (c): genotypic p values; OR CAR (c): odds ratio for carriership of the rare allele; OR REC (c): odds ratio for homozygosity of the rare allele under the recessive disease model. These were calculated by counting compound heterozygotes as homozygotes of the rare allele, whereas P allelic and P geno were calculated without considering compound heterozygotes. (b) P values and odds ratios for young CRC (≤45 years of age at diagnosis) were calculated against the total control cohort.
The sample size used in this study has a power >80% for the detection of allelic odds ratios >1.5 at a significance level of 0.05 (Figure 4). For more frequent variants, even odds ratios as small as 1.3 would have been detected with the same power. However, in patient subgroups with early disease manifestation (≤45 and ≤50 years of age at diagnosis) we detected a significant association of the R702W mutation with CRC. In the German cohort, this association would have remained undiscovered if the subgroup analyses for earlier disease manifestation had been confined to patients ≤60 years of age at diagnosis, as was done in previous studies that did not detect an association with CRC [18][19][20]. In line with this, the association signal for all tested CARD15 mutations disappeared completely when CRC patients older than 50 years at diagnosis were analysed. No association was found for G908R. We observed an increase in the frequency of the R702W and 3020insC risk alleles, and of compound heterozygotes for both, with decreasing age at diagnosis in CRC patients not affected by IBD. Thus, CARD15-mediated susceptibility to CRC seems to be confined to early-onset CRC in the German population. CARD15 R702W and 3020insC affect the C-terminal LRR part of NOD2, lead to a reduced responsiveness to bacterial components, and are assumed to influence the crosstalk with Toll-like receptor function, resulting in a proinflammatory cytokine bias [44][45][46]. The resulting chronic inflammatory state of the colon could provide the mechanistic link to cancer development.
On the whole, however, our findings corroborate the studies from Finland and Hungary, which found no evidence that CARD15 mutations are major contributors to general 'sporadic' colorectal cancer susceptibility. In addition, none of the analysed SNPs was significantly associated with tumour location, and none yielded a significant association in the familial or non-familial CRC patient subgroups.
Furthermore, our study did not reveal any major contribution of CARD4 and CARD8 variants to the predisposition to CRC, despite the results of recent genome-wide association studies [47][48][49][50][51][52][53][54] showing that part of the CRC risk is due to common low-risk variants. The intronic complex indel polymorphism CARD4 ND1+32656 was not significantly associated with disease, either in the total cohort or in the subgroup analysis of younger CRC patients. The coding polymorphism CARD4 E266K was reported to be associated with Crohn's disease in a Hungarian population, with the risk allele being more frequent in patients than in controls [31]. In contrast, we observed the E266K mutant allele 8% less frequently in the younger subgroup of CRC cases (≤50 years of age at diagnosis) than in controls. Interestingly, in a Scottish IBD population the mutant allele was likewise observed less frequently, though not significantly so, in early-onset IBD patients (<17 years of age) than in controls [55].
There may be many explanations for the lack of significant results for the total CRC cohort in the present study. A major reason might be population genetic differences, both in allele frequencies and in the contribution of individual risk variants. Regional heterogeneity within Europe, as reported for the contribution of CARD15 variants to CD susceptibility [32,55], may also apply to CRC risk. In addition, environmental factors may differ widely even between Caucasian populations.
Another critical point may be the need for a sufficient sample size in association studies. Previously published data may be weak in this respect, as the studies providing evidence for CARD15 as a susceptibility gene for CRC, namely Kurzawski et al. [15] and Papaconstantinou et al. [16], feature relatively low patient numbers.

Conclusion
In conclusion, common variants of the innate immunity genes CARD4, CARD8 and CARD15 are not associated with susceptibility to CRC in the German population. In accordance with previous studies, these findings suggest that such variants are unlikely to play a major role in disease development in the majority of CRC patients. However, our findings suggest a different situation for CRC patients with early disease manifestation. For this patient sub-