Meta-analyses of FibroTest diagnostic value in chronic liver disease

Background FibroTest (FT) is a biomarker of liver fibrosis initially validated in patients with chronic hepatitis C (CHC). The aim was to test two hypotheses: first, that the FT diagnostic value was similar in the three other frequent fibrotic diseases, chronic hepatitis B (CHB), alcoholic liver disease (ALD) and non-alcoholic fatty liver disease (NAFLD); and second, that the FT diagnostic value was similar for intermediate and extreme fibrosis stages.
Methods The main end points were the FT area under the ROC curves (AUROCs) for the diagnosis of bridging fibrosis (F2F3F4 vs. F0F1), standardized for the spectrum of fibrosis stages, and the comparison of FT AUROCs between adjacent stages. Two meta-analyses were performed: one combining all the published studies (random model), and one of an integrated database combining individual data. Sensitivity analysis integrated the independence of authors, length of biopsy, prospective design, respect of procedures, comorbidities, and duration between biopsy and serum sampling.
Results A total of 30 studies were included, pooling 6,378 subjects with both FT and biopsy (3,501 HCV, 1,457 HBV, 267 NAFLD, 429 ALD, and 724 mixed). Individual data were analyzed in 3,282 patients. The mean standardized AUROC was 0.84 (95% CI, 0.83–0.86), without differences between causes of liver disease: HCV 0.85 (0.82–0.87), HBV 0.80 (0.77–0.84), NAFLD 0.84 (0.76–0.92), ALD 0.86 (0.80–0.92), mixed 0.85 (0.80–0.93). The AUROC for the diagnosis of the intermediate adjacent stages F2 vs. F1 (0.66; 0.63–0.68, n = 2,055) did not differ from that of the extreme stages F3 vs. F4 (0.69; 0.65–0.72, n = 817) or F1 vs. F0 (0.62; 0.59–0.65, n = 1,788).
Conclusion FibroTest is an effective alternative to biopsy in patients with chronic hepatitis C and B, ALD and NAFLD. The FT diagnostic value is similar for the diagnosis of intermediate and extreme fibrosis stages.

FT is widely used as a non-invasive alternative to liver biopsy, with 190,000 tests ordered between September 2002 and April 2007 (Biopredictive data on file, Jean Marie Castille, personal communication); however, two main critiques are often made by experts: 1) FT has been mainly studied in chronic hepatitis C, and 2) the FT diagnostic value is lower for intermediate fibrosis stages (bridging vs. non-bridging fibrosis) than for extreme stages (no fibrosis or cirrhosis) [9,10]. This latter critique, which also applies to liver biopsy, risks confusing adjacent stages with intermediate stages, or failing to take into account the prevalence of the fibrosis stages that define advanced and non-advanced fibrosis [11,12].
The aim of this meta-analysis was to test two hypotheses, first, that the FT diagnostic value was similar in patients with HCV and in patients with the three other frequent fibrotic diseases; and second, that the FT diagnostic value was similar for intermediate and extreme stages.

Design
Two meta-analyses were performed; one combined all the published studies (random model), and the other used an integrated database combining individual data provided by authors.
To select published studies we used the Standards for Reporting of Diagnostic Accuracy (STARD) criteria and the Cochrane Database of Systematic Reviews (CDSR) methods [13]. Key STARD criteria include factors such as whether: 1) the study population was relevant to the clinical question being addressed; 2) there was a careful description of the population from which the patients were drawn, as well as actual inclusions and exclusions; 3) recruitment and the mode of sampling were carefully described; 4) researchers interpreting the non-invasive test were blinded to the reference test result; and 5) sufficient data were provided to complete a 2 × 2 table of true and false positive and negative diagnoses. Studies published only with an abstract provided insufficient data and were excluded [14].
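The 2 × 2 table required by the fifth criterion can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from any included study; the class name `TwoByTwo` is introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2 x 2 table of a non-invasive test against the reference test (biopsy)."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value; depends on prevalence in the sample.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value; depends on prevalence in the sample.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=80, fp=30, fn=20, tn=70)
print(table.sensitivity(), table.specificity(), table.ppv(), table.npv())
```

A study that reports these four counts allows all four indices to be recomputed, which is why abstract-only reports were considered insufficient.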

Search strategy
We searched MEDLINE with the key word "FibroTest". We hand-searched key journals (Gastroenterology, Hepatology, Journal of Hepatology, Gut, Journal of Viral hepatitis and American Journal of Gastroenterology) from February 2001 to April 2007 to validate the search, as well as the abstract books of the American Association and European Association for the Study of Liver Disease annual meetings.

Inclusion and exclusion criteria
Two reviewers (a hepatologist and a hepatologist-statistician) independently assessed the papers with predetermined STARD criteria. Disagreements were resolved through discussion with a third reviewer. The decision as to inclusion or exclusion was not related to results.
We excluded all studies except those that: included patients with chronic liver diseases; stated that all patients had had the FT and liver biopsy; provided data for true positives and negatives, false positives and negatives and AUROCs for advanced fibrosis; stated that the FT had been assessed blind to the biopsy; and stated the method used for defining the degree of fibrosis. We were careful to avoid including data from duplicate publications.

Data extraction
To allow comparisons between causes of liver disease in the studies, we categorized them into 5 classes: patients with CHC, CHB, ALD, NAFLD and mixed causes.
We extracted the following, when possible, from the published studies and from the integrated database for the sensitivity analyses: study design (prospective or retrospective); analytical procedures (fresh serum and compliance with the recommended procedures or not); spectrum of disease (patients with elevated or normal transaminases); co-morbidity (several morbidities including HCV, HBV, alcohol consumption, HIV coinfection, presence/absence of renal disease); and whether the study was performed by the FT inventor group (yes, no, mixed groups including inventor). Patient inclusion was never dependent on the result of the non-invasive test under investigation.

Statistical analysis
Comparison of FT diagnostic values between different chronic liver diseases
The main end point was the FT value for the diagnosis of advanced fibrosis (bridging fibrosis or stages F2, F3, F4 according to the METAVIR scoring system [15][16][17]), as assessed by the area under the receiver operating characteristic curve (AUROC).

Comparison of FT diagnostic values between adjacent stages
The main end point was the comparison of the FT AUROCs between adjacent stages: either between the two adjacent intermediate stages F2/F1 or between the adjacent extreme stages F4/F3 and F1/F0.

Statistical methods
A significance level of 5% was used as the alpha risk. Each estimate was given with its 95% confidence interval. Comparisons of the odds ratio and of percentages between strata were performed using their 95% confidence interval (95% CI). The primary analysis was per patient. In two studies patients were included twice, as they had FT and biopsy once before and once after the treatment; a sensitivity analysis was performed including and excluding these studies.
We used a random effects model for the primary meta-analysis to obtain a summary estimate of the FT AUROCs, with a 95% CI, compared with liver biopsy.
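A random effects summary estimate can be sketched as follows. The paper does not state which random effects estimator was used; this sketch assumes the common DerSimonian-Laird method, and the function name `random_effects_pool` and the input values are hypothetical.

```python
import math


def random_effects_pool(estimates, std_errors):
    """Pool per-study estimates (e.g. AUROCs) with a DerSimonian-Laird model.

    Assumption: DerSimonian-Laird is used here for illustration; the source
    only says "random model". Returns (pooled, ci_low, ci_high, q, tau2).
    """
    # Fixed-effect (inverse-variance) weights and mean.
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Cochran Q statistic for heterogeneity between studies.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1

    # Between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, q, tau2


# Hypothetical study-level AUROCs and standard errors, for illustration only.
pooled, low, high, q, tau2 = random_effects_pool([0.70, 0.80, 0.90], [0.02, 0.02, 0.02])
print(round(pooled, 3), round(low, 3), round(high, 3), round(q, 1), round(tau2, 4))
```

When Cochran Q is no larger than its degrees of freedom, the between-study variance is set to zero and the result coincides with the fixed-effect estimate.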
The AUROC was used as a measure of discrimination, estimated using the empirical (non-parametric) method of DeLong et al. [18], and AUROCs were compared using the paired method of Zhou et al. [19]. All analyses were performed with NCSS software (Kaysville, Utah, USA) [20].
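The empirical AUROC is the proportion of (advanced, non-advanced) patient pairs in which the patient with advanced fibrosis has the higher marker value, with ties counted as one half. The sketch below illustrates only this point estimate, not the variance estimation or the paired comparison; the function name and the scores are hypothetical.

```python
def empirical_auroc(scores_advanced, scores_non_advanced):
    """Empirical (non-parametric) AUROC as a pairwise concordance proportion."""
    concordant = 0.0
    for a in scores_advanced:
        for n in scores_non_advanced:
            if a > n:
                concordant += 1.0   # correctly ordered pair
            elif a == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(scores_advanced) * len(scores_non_advanced))


# Hypothetical marker values, for illustration only.
advanced = [0.75, 0.60, 0.48, 0.90]
non_advanced = [0.20, 0.35, 0.48, 0.10, 0.55]
print(empirical_auroc(advanced, non_advanced))
```

A value of 0.50 corresponds to random ordering of pairs and 1.0 to perfect separation of the two groups.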
Sensitivity analyses done by comparing AUROCs were planned for pre-specified items: study design (prospective or retrospective); analytical procedures (fresh serum or not); compliance with recommended analytical procedures (yes or no); spectrum of disease (patients with elevated or normal transaminases); year of study; comorbidity (several morbidities including HCV, HBV, alcohol consumption, HIV coinfection, overweight, diabetes, hyperlipidemia, renal disease); whether the study was performed by the FT inventor group (yes, no, mixed groups including inventor).
Meta-analysis was performed twice, once according to the absolute value of the observed AUROCs (ObAUROC) and once according to the AUROCs standardized for the spectrum of fibrosis stages (AdAUROC). We previously demonstrated that the AUROCs were highly related to the difference between the mean fibrosis stages in the advanced fibrosis and non-advanced fibrosis groups (DANA). The AdAUROC is the AUROC adjusted for the difference between the observed DANA (ObDANA) and a standard DANA of 2.5 METAVIR fibrosis units (DANA = 2.5 if there is a uniform prevalence of 0.20 in each of the 5 stages). All the AUROCs were adjusted to a DANA of 2.5 using the formula: AdAUROC = ObAUROC + 0.1056 × (2.5 − ObDANA) [11,12].
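The standardization can be sketched in a few lines. The adjustment formula and the constants 0.1056 and 2.5 are those given in the text; the function names and the stage counts are hypothetical. With a uniform prevalence, the mean stage is 3.0 in the advanced group (F2, F3, F4) and 0.5 in the non-advanced group (F0, F1), which gives the standard DANA of 2.5.

```python
def dana(stage_counts):
    """Difference between mean stage in advanced (F2-F4) and non-advanced (F0-F1).

    stage_counts: sequence of 5 patient counts for stages F0, F1, F2, F3, F4.
    """
    non_adv = list(zip((0, 1), stage_counts[:2]))
    adv = list(zip((2, 3, 4), stage_counts[2:]))
    mean_non_adv = sum(s * n for s, n in non_adv) / sum(n for _, n in non_adv)
    mean_adv = sum(s * n for s, n in adv) / sum(n for _, n in adv)
    return mean_adv - mean_non_adv


def adjusted_auroc(ob_auroc, ob_dana, standard_dana=2.5, slope=0.1056):
    """AdAUROC = ObAUROC + 0.1056 x (2.5 - ObDANA), as stated in the text."""
    return ob_auroc + slope * (standard_dana - ob_dana)


# Uniform prevalence across the 5 stages reproduces the standard DANA.
print(dana([20, 20, 20, 20, 20]))

# Hypothetical study with few extreme stages, hence a DANA below 2.5.
counts = [10, 40, 30, 15, 5]
print(dana(counts), adjusted_auroc(0.77, dana(counts)))
```

A study with an observed DANA below 2.5 has its AUROC adjusted upward, and a study with a DANA above 2.5 has it adjusted downward.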

Liver biopsies
The recruiting method of the sampling has been detailed in the previous publications. In the integrated database, liver biopsies were processed using standard techniques. A pathologist who was unaware of the biochemical markers evaluated the fibrosis stage and necrosis grade according to the METAVIR scoring system [15][16][17]. Fibrosis was staged on a scale of 0-4: F0 no fibrosis, F1 portal fibrosis without septa, F2 few septa, F3 numerous septa without cirrhosis, and F4 cirrhosis. Biopsies were performed with a 16-gauge Hepafix Luer Lock needle (Braun Melsungen) in the Paris center and the Bordeaux center, and with various needles in the multicenter study from Marseille.
Individual data were available in 3,282 patients who constituted the integrated database: 2,431 HCV, 322 HBV, 267 NAFLD and 262 ALD (Table 2). Among the 3,282 patients included in the integrated database, 875 (27%) belonged to independent studies, 1,431 (43%) to mixed studies and 976 (30%) to non-independent studies.

Comparison of FT diagnostic values between different chronic liver diseases
The mean of the observed AUROCs in published studies was 0.80 (95% CI, 0.78-0.82) (Figure 2) and of the AdAUROCs was 0.84 (95% CI, 0.83-0.86) (Figure 2). There was a significant heterogeneity between studies for the ObAUROCs (Cochran Q = 56; P = 0.001) but not for the AdAUROCs (Cochran Q = 26; P = 0.19). There was no significant difference between the ObAUROCs (Figure 1) or AdAUROCs (Figure 3) in HCV patients compared to other liver diseases (Table 2 and Table 3).
In the integrated database, the mean FT ObAUROC was 0.79 (95% CI, 0.77-0.82) and the mean AdAUROC was 0.84 (0.82-0.86). There was no significant difference between AdAUROCs in HCV patients compared to other liver diseases. The only significant difference was a higher ObAUROC in ALD than in HCV (P = 0.001) (Table 2).
Sensitivity analyses according to study characteristics are detailed in Table 3 for the meta-analysis and in Table 4 for the integrated database. There were no significant differences according to liver disease, baseline transaminase level, authors' independence, mean length of biopsy, serum-biopsy interval, or co-morbidity. Prospective studies and studies following guidelines were associated with higher ObAUROCs, but these differences were no longer significant for AdAUROCs. In the integrated database, fragmented biopsies were associated with a higher ObAUROC, but this difference was no longer significant for AdAUROCs.

Comparison of FT diagnostic values between adjacent stages
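The AUROC for the diagnosis of the intermediate adjacent stages F2 vs. F1 (0.66; 95% CI, 0.63-0.68; n = 2,055) did not differ from that of the extreme stages F3 vs. F4 (0.69; 0.65-0.72; n = 817) or F1 vs. F0 (0.62; 0.59-0.65; n = 1,788). There were also no differences between adjacent stages when the AUROCs were compared for each chronic liver disease (Table 5).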

Discussion
This meta-analysis demonstrated that the diagnostic value of FT was similar in the four most frequent chronic liver diseases. It also demonstrated that the diagnostic value of FT, as for liver biopsy, was similar between all the adjacent fibrosis stages, without a specific "gray zone" or "inaccurate zone" between intermediate stages. FT, like biopsy, has a lower diagnostic value for discriminating between two adjacent stages than between two extreme stages [17].
The advantages of the present study are the large number of studies included, as well as the opportunity to analyze an integrated database, which included the individual characteristics of 3,282 (51%) of the 6,378 patients included in the published studies. This made it possible to better account for the variability factors associated with FT diagnostic value.

Comparison of FT diagnostic values between different chronic liver diseases
One limitation of the study is that the number of studies and patients in non-HCV related diseases is smaller than in HCV. However, we analyzed a total of eleven studies including 2,877 patients with non-HCV or mixed causes, and 851 non-HCV patients in the integrated database. Another limitation was that there were few independent studies in other chronic liver diseases (1 for HBV, 1 for NAFLD and none for ALD). However, two studies in HBV [4,34] and two studies in ALD [5,35] were mixed, and three independent studies included HBV and ALD in their analyses [6,7,38], with the same results as in non-independent studies (Table 1).
To compare as well as possible the FT diagnostic value according to liver diseases, we used the standardization of the AUROCs, and sensitivity analysis in both the meta-analysis and the integrated database with individual data.
We applied the standardization of the observed AUROCs according to the spectrum of fibrosis stages among advanced and non-advanced fibrosis. We recently demonstrated that this standardization is mandatory for any interpretation of AUROCs estimating the diagnostic value of a fibrosis marker [12]. For instance, this method allowed the ObAUROCs of FT to be adjusted according to the cause of liver disease, between which the fibrosis stage spectrum differed significantly. The significant difference observed between ALD and HCV ObAUROCs disappeared after adjustment (Table 2). In HBV studies patients had a lower DANA than in studies of ALD patients. After standardization, the difference between AUROCs was roughly halved (0.77 versus 0.88 before and 0.80 versus 0.86 after standardization) (Table 2).
These data are also in accordance with the similarities of advanced fibrosis stages among chronic hepatitis C and B, NAFLD and ALD. Despite differences in the dynamics of fibrosis progression [40] and the initial fibrosis stages, the bridging stages, including cirrhosis, are very similar and were estimated in the same way by the METAVIR scoring system for advanced fibrosis [40,41]. Fibrosis stages and pathogenetic mechanisms are very similar in NAFLD and ALD [42]. Repeated FT measurements improved in parallel with fibrosis as estimated by repeated biopsies during treatment for HCV [22,23], HBV [4,34] and NAFLD [43]. The components of the FT had similar modifications according to fibrosis stages for these four chronic liver diseases [1,3,5,8].
The sensitivity analyses did not reveal any significant differences between AdAUROCs according to all the other characteristics analyzed (Table 3 and Table 4). Significant differences or the absence of differences between ObAUROCs could be due to confounding factors. A demonstrative illustration is the artificially higher ObAUROCs for fragmented versus non-fragmented biopsies. Because of a higher prevalence of cirrhosis in patients with fragmented biopsies, the DANA was higher than in patients with non-fragmented biopsies [11]. This difference was no longer seen after adjustment (Table 4).

Comparison of FT diagnostic values between adjacent stages
There is still a controversy among experts concerning the FT diagnostic value for "intermediate fibrosis stages". For panel biomarkers including FT, Gebo et al. stated that "One of the major limitations may be in the lack of reliable identification and classification of the intermediate stages of fibrosis" [9]. Bissell also stated that for panels including FT "Their accuracy for intermediate fibrosis is relatively poor." [44]. Rockey and Bissell stated that "Decision-making requires a test that differentiates minimal disease (stage 0/1 fibrosis) from intermediate fibrosis (stage 2/3). For this purpose, the current generation of non-invasive tests falls short, and liver biopsy still is needed for definitive staging" [45]. These statements are not evidence-based.
Figure 2. Meta-analysis of the observed area under the ROC curves (AUROC) assessed in published studies of FibroTest diagnostic value. AUROCs were all significantly higher for FibroTest than the random 0.50 value (upper panel) (P < 0.001). There was no significant difference between the different liver diseases.

The first error is stating that "liver biopsy is still needed for definitive staging of intermediate stages". The entire liver is certainly the gold standard but the liver biopsy is an imperfect gold standard. The present overview of the 25 studies giving biopsy length, all performed in tertiary centers, observed among 5,404 patients that the median of mean biopsy length was 18 mm. For the two larger studies including more than 500 patients (total 1,428) the median was 14 mm and in the integrated data base the mean was 17 mm out of 3,282. In our tertiary center a prospective study observed in 1,769 patients that biopsy was greater than 25 mm in only 16% (280/1769) of patients [46].
A liver biopsy of 15 mm has an AUROC of 0.82 between F1 and F2, corresponding to around 20% of false positives or false negatives [17]. Therefore FT with an AUROC of 0.66 (usually described as a weak value when using a true gold   [1] and with this database (data not shown).
The third error is assessing the diagnostic value of a biomarker in a subpopulation of patients defined by liver biopsy such as F2/F3 vs. F0/F1. The exclusion of F4 patients defined by a 15 mm biopsy will not exclude the risk of false positives or false negatives of the remaining non-F4. It is much more important to assess the spectrum of fibrosis stage among the F0/F1 and F2/F3; if the AUROCs are not standardized according to the DANA, the ObAUROCs will be misleading [12]. This once again underlines that assessing the AUROCs between all adjacent stages remains the best way, knowing that for the "perfect" biomarker, the best possible achievable AUROC is 0.82 for a 15 mm biopsy.
There are also different methodological approaches for the overview of fibrosis markers. Parkes et al. arbitrarily defined an "inaccurate" zone of a marker when it "cannot reliably attribute result for tests as tests perform with lower sensitivities/specificities at thresholds, where positive predictive value < 90%, negative predictive value > 95%" [47]. There is no rationale for choosing these thresholds, but this definition could be acceptable if a true gold standard existed. This is not the case for fibrosis markers. If this definition is applied to 15 mm liver biopsies, the biopsy will be inaccurate in 40% of cases for a diagnosis between F1 and F2.
The only significant difference identified using AUROCs between adjacent stages (Table 5) was for HBV versus ALD. The ObAUROC for ALD was particularly high and this should be validated in a population with a larger sample size.
High risk profile
The observed prevalence of a high-risk profile of FT in the published studies (4.1%) and in the integrated database (1.9%) was concordant with the post-marketing analysis finding (2.1%) in 32,527 consecutive tests [2,46]. In these analyses there were 272 cases (0.8%) with a high-risk profile of false positives, for which the other components were not concordant in favor of significant fibrosis. Patients with extremely low haptoglobin, particularly when the rest of the exams were hardly modified, could have had hemolysis. A high-risk profile of false positives due to possible Gilbert syndrome was observed in 409 (1.3%) cases.
In the presence of acute inflammation (i.e., sepsis or acute hemolysis), FT analysis must be postponed [2].

Conclusion
This study suggests that FT could be used as an alternative to liver biopsy in the four most common chronic liver diseases: HCV, HBV, NAFLD and ALD. Neither biomarkers nor biopsy are sufficient alone to make a definitive decision in a given patient, and all the clinical and biological data must be taken into account. However, due to the dramatically insufficient risk-benefit ratio of biopsy (coefficient of variation 40%, 0.3% severe adverse events and 3/10,000 mortality), it is surprising that many leaders and associations in the field of hepatology still recommend liver biopsy as the first line investigation for millions of people exposed to the risk of fibrosis. This study reinforced our previous conclusion [48] that, based on current evidence, a wise recommendation would be a moratorium on liver biopsy as a first line procedure while awaiting studies demonstrating its cost-utility versus that of biomarkers. Biopsy as a second line estimate of liver injury should still be indicated for intricate diseases or clinicobiological discordances.
Practices are evolving rapidly, and in France a nationwide survey recently found that among 546 hepatologists, 81% used a non-invasive biomarker (FibroTest-ActiTest) and 32% used elastography, with a dramatic decrease in the use of liver biopsy for more than 50% of patients with chronic hepatitis C, and with a subsequent increase in the number of patients treated [49]. FibroTest is available in more than 50 countries [50] and the cost varies (from 100 to 300 euros per country) according to the price of the five components and the price of algorithms. In France the cost of the components has been covered by social security since 2002, and reimbursement of the algorithms was approved in December 2006 [51].
A recent overview by French health authorities officially approved non invasive biomarkers FibroTest and elastography (Fibroscan) as first line estimates of fibrosis in patients with chronic hepatitis C, recommended reimbursement by social security and approved liver biopsy only as second line estimate in case of discordance or non interpretability of non invasive markers. An updated overview is pending for other chronic liver diseases at the end of 2007 [50].