Identifying potential biomarkers of nonalcoholic fatty liver disease via genome-wide analysis of copy number variation

Background The prevalence of Non-alcoholic fatty liver disease (NAFLD) is increasing and emerging as a global health burden. In addition to environmental factors, numerous studies have shown that genetic factors play an important role in the development of NAFLD. Copy number variation (CNV) as a genetic variation plays an important role in the evaluation of disease susceptibility and genetic differences. The aim of the present study was to assess the contribution of CNV to the evaluation of NAFLD in a Chinese population. Methods Genome-wide analysis of CNV was performed using high-density comparative genomic hybridisation microarrays (ACGH). To validate the CNV regions, TaqMan real-time quantitative PCR (qPCR) was utilized. Results A total of 441 CNVs were identified, including 381 autosomal CNVs and 60 sex chromosome CNVs. By merging overlapping CNVs, a genomic CNV map of NAFLD patients was constructed. A total of 338 autosomal CNVRs were identified, including 275 CNVRs with consistent trends (197 losses and 78 gains) and 63 CNVRs with inconsistent trends. The length of the 338 CNVRs ranged from 5.7 kb to 2.23 Mb, with an average size of 117.44 kb. These CNVRs spanned 39.70 Mb of the genome and accounted for ~ 1.32% of the genome sequence. Through Gene Ontology and genetic pathway analysis, we found evidence that CNVs involving nine genes may be associated with the pathogenesis of NAFLD progression. One of the genes (NLRP4 gene) was selected and verified by quantitative PCR (qPCR) method with large sample size. We found the copy number deletion of NLRP4 was related to the risk of NAFLD. Conclusions This study indicate the copy number variation is associated with NAFLD. The copy number deletion of NLRP4 was related to the risk of NAFLD. These results could prove valuable for predicting patients at risk of developing NAFLD. Supplementary Information The online version contains supplementary material available at 10.1186/s12876-021-01750-4.


Introduction
Non-alcoholic fatty liver disease (NAFLD), defined as surplus fat accumulated in the liver, covers a spectrum from simple steatosis to non-alcoholic steatohepatitis (NASH), fatty liver fibrosis (HF), liver cirrhosis (LC) and hepatocellular carcinoma (HCC) [1]. NAFLD is also closely associated with higher prevalence of obesity, type 2 diabetes and cardiovascular disease [2][3][4]. The prevalence of NAFLD is increasing and emerging as a global health burden, therefore it is crucial to explore pathogenesis and effective prevention management.
In addition to environmental factors, numerous studies have shown that genetic factors play an important role in the development of NAFLD. For example, genomewide association studies (GWAS) revealed that singlenucleotide polymorphisms of genes are closely related to the risk of NAFLD [5][6][7]. Copy number variation (CNV) covers submicroscopic variation in the human genome ranging from 1 to 3 Mb in size, and includes insertions, inversions, deletions, duplications and translocations. CNV includes gene coding regions and regulatory elements that can influence gene expression, phenotypic variation and adaptation via disruption of genes, and altering gene dosage [8]. Studies have shown that CNV plays an important role in the evaluation of disease susceptibility and genetic differences in Alzheimer's disease [9], Parkinson's disease [10], schizophrenia [11], liver cancer [12] and lung cancer [13]. However, studies on the association between CNV and NAFLD are limited, there exists genetic heterogeneity among study populations, and the sample size is limited.
Herein, a case-control study was designed and conducted in which array-based comparative genomic hybridisation (ACGH) was performed to identify potential CNV associated with NAFLD. Gene annotation analysis software was utilised to ascertain biological processes associated with genes related to CNV. The findings may provide epidemiological evidence for the diagnosis and prevention of NAFLD.

Subjects
The study was approved by local ethics committees of Fujian Medical University (ethics number 2014096). Subjects were recruited from the Health Examination Centre of Nanping First Affiliated Hospital of Fujian Medical University from October 2015 to September 2017. Once cases and controls were linked to NAFLD, a letter of invitation and information about the study was sent to each potential case and control to obtain consent. In order to standardize the experimental process, improve the accuracy of the results, and enhance the comparability of the conclusions, all methods are implemented in accordance with relevant guidelines and regulations.
All subjects underwent abdominal ultrasound, and NAFLD was diagnosed by the presence of at least two of the following three abnormal findings following abdominal ultrasonography [14]: (1) increased echogenicity of the liver near-field region with deep attenuation of the ultrasound signal; (2) hyperechogenity of liver tissue ('bright liver'), accompanied by hypoechogenicity of the kidney cortex; (3) vascular blurring.

ACGH and CNV calling
Based on liver fatty accumulation diagnosed by abdominal ultrasound, NAFLD was divided into three grades, and four cases were selected from each grade [15]. The matching principle was formulated according to the gender, age (± 5 years). The controls was selected from healthy people diagnosed with no NAFLD by abdominal ultrasound during the same period. And those who met the case exclusion criteria were not included. Exclusion criteria were as follows: (1) alcohol consumption > 140 g/ week for men and > 70 g/week for women; (2) the presence of hepatitis B surface antigens or hepatitis C antibodies; (3) use of hepatotoxic drugs (such as tamoxifen, amiodarone, valproate and methotrexate) that can induce hepatic fat accumulation [16]; (4) hepatic disease, which can induce hepatic fat accumulation; (5) hepatic diseases such as Wilson's disease, autoimmune hepatitis and hemochromatosis. ACGH was performed on 12 NAFLD patients and 12 healthy controls.
We mainly focus on genetic susceptibility between CNV and NAFLD in the present study, DNA mutations in peripheral leukocytes reflect germline mutation, suggesting the association between mutations and genetic susceptibility, therefore peripheral leukocyte blood was utilised. Genomic DNA was extracted from peripheral blood samples obtained from each subject using a nucleic acid extraction kit (Qiagen, Hilden, Germany). DNA quality was determined by a Denovix DS-11 spectrophotometer (Denovix, Waltham, MA, USA). DNA purity was verified by A260/A280 ratio of 1.80-2.0. ACGH was performed according to the protocol established by the manufacturer (Oxford Gene Technology, Begbroke, UK). It was carried out using SurePrint G3 Human Genome 4 × 180 K microarrays (Agilent Technologies, Santa Clare, CA, USA) for genome-wide identification of putative disease-associated CNVs. The microarrays contained ~ 180,000 probes that enabled the profiling of molecular genomic imbalance with a mean resolution of 13 kb. Probes in the array covered both coding and noncoding regions of the human genome. A total of 1 µg genomic DNA from patients and controls was labelled with Cy3 and Cy5 dyes, respectively and probes were purified and mixed thoroughly using an Agilent SureTag Complete DNA Labeling Kit (Agilent Technologies). This was followed by denaturation and pre-annealing with 50 µg human Cot-1 DNA. Hybridisation of the mixture was performed on an array slide with constant rotation at 65 °C for 40 h. The slide was then washed with Agilent wash buffers 1 and 2, and scanned immediately using an Agilent G2565CA Microarray Scanner (Agilent Technologies). Data were extracted from scanned images using Feature Extraction Software version 10.10 (Agilent Technologies). Raw data were uploaded into Agilent Cytogenomics software (Agilent Technologies). Genomic aberrations were identified by applying log2 intensity ratios of samples to references (Cy3/Cy5, log2 ratios above 0 for duplicates, and below 0 for deletions). CNV was assigned for segments with at least three consecutive probes. Chromosomal aberrations were reported in accordance with the human genome sequence assembly Build 37, hg 19 (http:// www. ncbi. nlm. nih. gov).

Functional enrichment analysis of CNV
After merging overlapping CNVs appearing in two or more samples, a genomic CNV map of NAFLD patients was constructed using R software. Genes associated with CNV were retrieved from the Homo sapiens (GRCh37) assembly provided by University of California Santa Cruz (UCSC). To analyse genes with CNV and investigate the functional impact of CNV on various biological processes, KOBAS gene annotation analysis software which can be accessed at http:// kobas. cbi. pku. edu. cn was employed. This program annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from five pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and five human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog) [17]. The final list of genes associated with NAFLD was determined using the Pubmed database.

Quantitative PCR validation of CNV calls
To validate the CNV regions, TaqMan real-time quantitative PCR (qPCR) was performed using a Step One Plus instrument (Applied Biosystems) on 557 samples (297 cases and 260 controls) from one selected region (19q13; Assay Hs02992963_cn) which contained the NLRP4 gene. Each reaction (20 µl) contained 10 µl master mix, 1 µl TaqMan Copy Number Assay, 1 ml TaqMan Copy Number Reference Assay, 4 µl nuclease-free water, and 4 µl 5 ng/µl genomic DNA. All reactions were performed in quadruplicate. Thermal cycling conditions consisted of an initial denaturation step at 95 °C for 10 min, followed by 40 cycles at 95 °C for 15 s and 60 °C for 1 min. Furthermore, we analysed CNV in the NLRP4 gene and its association with the risk of NAFLD.

Statistical analysis
Chi-square tests were employed to assess categorical variables, and Mann-Whitney U tests were used for continuous variables. An unconditional logistical regression model was employed to analyse the association between target gene CNV and NAFLD risk. All statistical analyses were performed by SPSS 23.0 software (SPSS, Inc, Chicago, IL, USA). A two-tailed p-value < 0.05 was considered statistical significant.

Clinical characteristics of the study population
The clinical characteristics of the study population are outlined in Table 1. Compared with those of the control group, levels of BMI, WHR, ALT, TG, TC, FBG and LDL-C were significantly higher in the NAFLD group (p < 0.01-0.05) ( Table 1).

Detailed features of CNVRs in the genome
A rigorous quality control check was performed during sample processing, and all DNA samples were suitable for study. In total, 441 CNVs were detected, including 381 autosomal CNVs and 60 sex chromosome CNVs (Fig. 1).
Owing to the evolutionary bias due to small imbalances of sex chromosomes, we opted to exclude sex chromosome CNVs from further analysis. The 381 autosomal CNVs spanned between 5.7 kb and 2.35 Mb in size, with an average size of 181.8 kb (Table 2 and Additional file 1: Table SI). All samples displayed both copy number gains and losses, but copy number gains were more commonly observed than losses (estimated ratio of 1.7:1). Any CNVs overlapping between two or more samples were defined as shared CNVs, and these were integrated, longer fragments were split, and a genomic CNV map   (Table 3). Differences in the number of CNVRs per chromosome were very significant, ranging from 3 for chromosomes 21 and 22 to 36 for chromosome 1. The scales of CNVRs on each chromosome were also different; CNVRs were densely distributed on chromosomes 7, 8, 15 and 16, covering more than 2% of genomic sequences, with CNVRs on chromosome 16 covered 3.33%, compared with only 0.11% for chromosome 20. This demonstrates that the distribution of CNVRs on chromosomes was not uniform.

Validation of CNV by qPCR
The results of ACGH showed that 3 out of 12 pairs harboured copy number losses in chr19 (56370486-56416408) which contained the NLRP4 gene. Functional enrichment analysis indicated that this gene may participate in inflammatory responses by regulating the levels of inflammatory mediators.
Since inflammatory factors play an important role in the development of NAFLD, quantitative PCR (qPCR) Fig. 2 Distribution of CNVRs in the genome with large sample size was performed to validate this CNV. A total of 557 subjects were included in the study, comprising 297 cases and 260 controls. In case group, the prevalence of obesity and hypertension was significantly higher than that in control group. The level of ALT, AST, FBG and TG were higher in the control group, and the LDL-C was higher in cases (Tables 5, 6).
Copy numbers were calculated using CopyCallerv2.0 software and categorised as losses (< 2), normal (= 2), and gains (> 2). In the present work, the NLRP4 gene copy number distribution ranged from 1 to 2, with 15 (5.1%) losses and 282 (94.9%) normal copy number in cases. There were four (1.5%) losses and 256 (98.5%) normal copy number in controls. The distribution of NLRP4 gene copy number variation was statistically significant between the two groups (χ 2 = 5.19, p = 0.02; Table 7, Fig. 4a, b, c). Furthermore, to investigate the relationship between NLRP4 gene copy number variation and the risk of NAFLD, unconditional logistical regression analysis was performed. The results showed that NLRP4 gene copy number deletion was associated with the risk of NAFLD (OR = 3.40, 95% CI = 1.12-10.39). After adjustment for confounding factors (gender, age, BMI, blood pressure and diabetes history), the association was still statistically significant (OR = 4.49, 95% CI = 1.3-15.52; Table 7).
Seven samples exhibited CNV in the region of Chr22 (24347959-24390254) which contained the glutathiones-transferase 1 gene (GSTT1). Six samples displayed copy number gains, and one sample displayed copy number losses. GSTs are a superfamily of proteins that participate in phase II detoxification, which promotes    [18] and diabetes [19]. However, one study targeted at Southeast Iran showed that genetic polymorphism of GSTM1 and GSTP1, but not GSTT1, was associated with NAFLD [20]. The CNV regions Chr14:74001651-74022324 and chr20:33470663-33495348 contained the genes acyl-CoA thioesterase 1 (ACOT1) and acyl-CoA synthetase short-chain family member 2 (ACSS2), both of which are related to lipid metabolism. ACOT1 is a key cytosolic enzyme that participates in fatty acid (FA) metabolism by catalysing the conversion of acyl-CoAs to FAs and coenzyme A, and it is also believed to be a target gene of PPARα [21][22][23]. Previous studies revealed that overexpression of ACOT1 reduces the availability of excess long-chain acyl-CoAs for β-oxidation, and represses PPARα signalling pathways to reverse altered substrate metabolism and attenuate increased oxidative stress, mitochondrial dysfunction, and cardiac inefficiency in diabetic hearts [24]. In mice fed a high fat diet, Acot1 knockdown elicits a robust induction of inflammatory and oxidative stress markers, and increased ROS levels. Thus, when a high fat diet induces steatosis, ACOT1 protects against inflammation and oxidative stress that lead to fibrosis [25].
ACSS2 plays a key role in lipogenesis by converting acetate to acetyl-CoA for lipogenesis [26]. A recent study showed that knockdown of ACSS2 increased the invasion and migration abilities of HCC cells, which plays an important role in the prognosis of patients with HCC [27]. Another study also found that mRNA levels of genes associated with de novo fatty acid synthesis, triacylglycerol synthesis, lipid droplet formation and fatty acid oxidation were downregulated after ACSS2 knockdown [28]. In diet-induced obese mice, lack of ACSS2 results in a significant reduction in body weight and hepatic steatosis [29].
The CNV region chr1:65935075-65959904 contains the gene encoding leptin receptor (LEPR) that binds to leptin in target tissues. Intravenous administration of leptin reduces appetite, while its deficiency increases food intake. Furthermore, leptin can modulate insulin secretion and action through LEPR, and polymorphism of LEPR has been linked to NAFLD [30].
The CNV region Chr3:57199594-57590187 contains genes encoding Adaptor protein and PH domain-and leucine zipper-containing 1 (APPL1). APPL1 is the first identified adaptor protein that interacts directly with   [31]. Polymorphism of APPL1 has been associated with NAFLD [32]. The CNV region chr19:56370486-56416408 contains the gene encoding NLRP4, a negative regulator of proinflammatory cytokines that is associated with inactivation of IKKa/NF-kB [33]. A previous study demonstrated that NLRP4 attenuates the inflammation response by reducing the expression of transforming growth factor-β1 (TGF-β1), tumour necrosis factor-α (TNF-α), interleukin-1β (IL-1β), IL-18 and IL-6 in fructose-treated cardiac cells [34]. In visceral adipose tissue from patients with pericellular fibrosis, NLRP4 mRNA levels were significantly lower than in those without pericellular fibrosis, and the NLRP 4 gene is significantly downregulated in adipose tissue from NASH patients [35].
The CNV region chr17:38688567-38738474 contains the gene encoding C-C chemokine receptor 7 (CCR7), which is primarily expressed in immune cells. Studies found that CCR7 deficiency leads to the development of multi-organ autoimmunity [36], chronic renal disease [37] and autoimmune diabetes [38]. A recent study showed that infiltration of CCR7-expressing cells in adipose tissue is associated with insulin resistance in obesity [39]. The absence of CCR7 decreases IL-10-producing iNKT cells in fatty liver, and exacerbates NAFLD [40].
The CNV region Chr7:54813380-55274871 contains the gene encoding epidermal growth factor receptor (EGFR), a receptor tyrosine kinase expressed in activated hepatic stellate cells (HSCs) that is associated with the development of liver fibrosis. One study indicated that depressing the phosphorylation of EGFR can reduce the number of activated HSCs [41]. Another study suggested that EGFR is phosphorylated in liver tissues in a high-fat diet-induced murine model of NAFLD. Inhibition of EGFR prevents diet-induced lipid accumulation, oxidative stress, HSC activation and matrix deposition [42].
The CNV region chr12:57897795-57918452 contains the gene encoding CCAAT/enhancer-binding protein (C/EBP) homologous protein (CHOP), a major transcriptional regulator of endoplasmic reticulum (ER) stress-mediated apoptosis that is implicated in lipotoxicity-induced ER stress and hepatocyte apoptosis in NAFLD [43]. A previous study found that CHOP knockout (CHOP-/-) mice fed a high-fat diet developed more severe histological NASH features than wildtype controls;The severity of NASH in high-fat diet-fed CHOP−/− mice correlated with a significant decrease in peroxisomal β-oxidation, increased denovo lipogenesis, and ER stress-mediated hepatocyte apoptosis [44]. These findings indicate that CHOP protects hepatocytes from ER stress, and plays a significant role in the mechanism of liraglutide-mediated protection against NASH pathogenesis.
The use of ACGH technology allows CNV discovery at high resolution, and hence confidence in CNV detection. Therefore, this approach may be used to identify noninvasive biomarkers with potential for the pathological evaluation of NAFLD. However, as more comprehensive studies are still ongoing, genes known to be related to NAFLD will be investigated, and our preliminary results remain to be substantiated by studies on larger patient groups. In addition, functional studies on genes residing within these loci are needed to fully characterise the functions of genes and their relationships with NAFLD.

Conclusions
In case group, the copy number deletion of NLRP4 gene in the Chr19:56370486-56416408 region was higher than that in control group. The copy number deletion of this gene was related to the risk of NAFLD.