- Research article
- Open Access
- Open Peer Review
Evidence of surgical outcomes fluctuates over time: results from a cumulative meta-analysis of laparoscopic versus open appendectomy for acute appendicitis
BMC Gastroenterology volume 16, Article number: 37 (2016)
Surgical trials involve complex variables, such as the development of surgical equipment and surgeons’ learning curves, so the evidence obtained from these trials can fluctuate over time. We explored the stability of the evidence obtained in surgical trials by conducting a cumulative meta-analysis of randomized controlled trials comparing open and laparoscopic appendectomy.
We conducted a cumulative meta-analysis of randomized controlled trials comparing laparoscopic appendectomy with open appendectomy for acute appendicitis, the topic with the greatest number of trials in the gastroenterological surgical field. We searched the MEDLINE (PubMed), EMBASE, and CINAHL databases up to September 2014 and reviewed the bibliographies of review articles and previous meta-analyses. Outcomes were the incidence of intra-abdominal abscess, the incidence of wound infection, operative time, and length of hospital stay. Statistical significance was judged by whether the 95 % confidence interval (95 % CI) of the effect size excluded the null value.
Sixty-four trials were included in this analysis. Among the 51 trials addressing intra-abdominal abscess, our cumulative meta-analysis of trials published up to and including 2001 demonstrated a statistically significant difference in favor of open appendectomy (cumulative odds ratio [OR] 2.35, 95 % CI 1.30–4.25). The effect in favor of open appendectomy began to diminish after 2001, and the overall cumulative OR for laparoscopic versus open appendectomy was not significant (1.32, 95 % CI 0.84–2.10).
The evidence regarding treatment effectiveness in trials comparing laparoscopic and open appendectomy changed over time, even after a statistically significant difference had been observed. Observing only the 95 % confidence interval of the effect size from a meta-analysis may not provide conclusive results.
Meta-analyses of randomized controlled trials (RCTs), which combine the evidence presented in individual research reports, are expected to produce the highest level of evidence and have accordingly become increasingly important in health care. The concept of a cumulative meta-analysis, in which the meta-analysis is repeated each time the results of a new trial are published, was introduced by Lau et al. in 1992. This technique was designed to help determine both clinical efficacy and harm, to track trials, and to plan future trials.
The 1992 study by Lau et al. indicated that two very large clinical trials on the efficacy of streptokinase for acute myocardial infarction [4, 5] may have been unnecessary because, according to their cumulative meta-analysis, treatment efficacy was already statistically significant before those two trials were conducted. Later cumulative meta-analyses of other topics demonstrated that statistically significant results in meta-analyses can subsequently disappear, especially when well-powered, well-designed trials with sufficient numbers of patients and outcome events are published. In the surgical field, it is quite possible that evidence regarding an intervention’s effectiveness changes over time, even after the intervention has been established, because surgical trials are complex: they are affected by advances in surgical equipment and techniques, surgeons’ progress along the learning curve as they develop novel skills, and variations in postoperative management, among other factors.
To identify changes in the evidence obtained in surgical trials over time, we selected trials comparing the clinical effectiveness of laparoscopic appendectomy and open appendectomy for acute appendicitis. We considered this topic suitable for the observation of chronological trends because, to the best of our knowledge, it is associated with the highest number of RCTs in the gastroenterological surgical field. In light of the existing meta-analyses on this topic, including a Cochrane review [8–10], our purpose was to identify any changes in the evidence over time rather than to determine the superiority of one procedure over the other.
We asked the following clinical question: might the evidence demonstrated by a meta-analysis of RCTs of surgical procedures change over time? To answer this question, we conducted a cumulative meta-analysis of RCTs that had compared laparoscopic appendectomy with open appendectomy.
Herein we conducted a cumulative meta-analysis of RCTs to ascertain chronological trends in the comparison of laparoscopic appendectomy and open appendectomy for acute appendicitis. We used the cumulative meta-analysis technique introduced by Lau et al. in 1992. In a cumulative meta-analysis, studies are added one at a time according to their date of publication, and the results are summarized as each new study is added.
We systematically searched the MEDLINE (PubMed), EMBASE, and CINAHL databases for articles in all languages describing RCTs published between 1991, when laparoscopic appendectomy was initiated, and September 2014. In MEDLINE, we used the CRD/Cochrane Highly Sensitive Search Strategy with the search terms “appendectomy” and “appendicitis.” In EMBASE, we used a search strategy that optimizes sensitivity and specificity, combined with the terms “appendectomy” or “appendicitis.” In CINAHL, we combined the terms with the best optimization of sensitivity and specificity with “appendectomy” or “appendicitis.” Reference lists of review articles and previously published meta-analyses were searched by hand. The search was last performed on December 18, 2014.
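The cumulative procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors’ Comprehensive Meta Analysis workflow: it assumes a hypothetical `Trial` record holding the events and totals in each arm, pools each prefix of the date-ordered trials with a simple inverse-variance fixed-effect model on the log OR (chosen only for brevity), and adds 0.5 to every cell of any 2 × 2 table containing a zero cell.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    """One two-arm trial (hypothetical record layout)."""
    label: str
    year: int
    events_lap: int
    total_lap: int
    events_open: int
    total_open: int


def log_or_and_var(t: Trial, cc: float = 0.5) -> tuple[float, float]:
    """Log odds ratio (laparoscopic vs open) and its variance.

    If any cell of the 2 x 2 table is zero, cc is added to every cell.
    """
    a, b = t.events_lap, t.total_lap - t.events_lap
    c, d = t.events_open, t.total_open - t.events_open
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cumulative_meta_analysis(trials: list[Trial]) -> list[tuple[str, float, float, float]]:
    """Add trials one at a time in order of publication and re-pool after each.

    Returns (label, cumulative OR, lower 95 % limit, upper 95 % limit) per step.
    Pooling is an inverse-variance fixed-effect model on the log OR,
    a simplification used here for illustration only.
    """
    results = []
    sum_w = sum_wy = 0.0
    for t in sorted(trials, key=lambda tr: tr.year):  # stable sort keeps ties in input order
        y, v = log_or_and_var(t)
        sum_w += 1 / v
        sum_wy += y / v
        pooled, se = sum_wy / sum_w, math.sqrt(1 / sum_w)
        results.append((t.label, math.exp(pooled),
                        math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)))
    return results
```

For example, `cumulative_meta_analysis([Trial("A", 1995, 2, 50, 4, 50), Trial("B", 1993, 1, 40, 1, 40)])` returns one row per trial in publication order, starting with trial B; each row gives the cumulative OR and its 95 % CI after that trial has been added.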
Selection criteria for studies in this review
Our inclusion criteria were as follows: (1) prospective RCTs, (2) studies comparing laparoscopic surgery and open surgery for acute appendicitis, (3) studies with human adult participants, and (4) studies written in any language. We excluded studies with any of the following characteristics: (1) pediatric participants, (2) comparisons of diagnostic efficacy, and (3) assessment of the effectiveness of variations of standard laparoscopic techniques, such as the single trocar technique versus the standard technique.
Outcomes included the incidence of intra-abdominal abscess, the incidence of wound infection, the operative time, and the length of hospital stay. We adopted these four outcome measures because they were the most frequently measured in RCTs addressing this topic.
Assessment of study quality
We assessed the risk of bias with respect to adequate sequence generation, allocation concealment, blinding, incomplete outcome data addressed, and selective reporting. Two authors (TU and HT) assessed the studies that met the inclusion criteria (Table 1).
Binary data were extracted for the incidences of intra-abdominal abscess and wound infection, and continuous data were extracted for the operative time and length of hospital stay. Two authors (TU and HT) independently undertook this process, and disagreements were resolved through discussion. Participants who were converted intraoperatively from laparoscopic appendectomy to open appendectomy were included in the laparoscopic appendectomy arm on an intention-to-treat basis.
A meta-analysis was performed using Review Manager (RevMan) software, version 5.3.5, provided by the Cochrane Collaboration, Copenhagen, Denmark. Because RevMan cannot perform a cumulative meta-analysis, we used Comprehensive Meta Analysis software, version 3.3.070 (Biostat, Englewood NJ, USA) for that purpose. For the binary variables (i.e., the incidence of intra-abdominal abscess and wound infection), the odds ratio (OR) of laparoscopic appendectomy to open appendectomy was used as the summary statistic. The OR point estimate was considered significant at the p < 0.05 level when the 95 % confidence interval (95 % CI) did not include the value 1. For the continuous variables (i.e., the operative time and the length of hospital stay), the mean difference (MD) was used as the summary statistic, calculated as the value for laparoscopic appendectomy minus the value for open appendectomy. The MD point estimate was considered significant at the p < 0.05 level if the 95 % CI did not include the value 0.
We used both the fixed-effects model and the random-effects model according to the Mantel-Haenszel method for the statistical analysis. The fixed-effects model assumes that the true treatment effect is homogeneous across studies, whereas the random-effects model allows for between-study differences in the treatment effect. The confidence interval therefore tends to be wider in the random-effects model when the treatment effect is heterogeneous. We fitted both models, and when their results were similar, the random-effects model was adopted.
We also assessed between-study heterogeneity by calculating I², defined as

$$I^2 = 100\% \times \frac{Q - \mathrm{df}}{Q},$$

where Q is Cochran’s heterogeneity statistic and df is its degrees of freedom (the number of studies minus one); negative values are set to 0 %. An outcome with no events in an arm was treated as a “zero cell” in the 2 × 2 table. Although a correction is needed to pool the ORs of studies that include zero cells, the correction can influence the results and possibly introduce bias. To pool these studies, we used the Mantel-Haenszel method with a correction factor of 0.5 (0.5 was added to all cells of the 2 × 2 table when there was a zero cell).
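The per-study summary statistics and the significance rule described above can be illustrated with the following Python sketch. It assumes a Wald-type 95 % CI computed on the log scale for the OR and a normal-approximation CI for the MD; the function names are hypothetical, and RevMan or Comprehensive Meta Analysis may compute intervals differently. Tables with zero cells are not handled here.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR of the laparoscopic arm (a events, b non-events) versus the open arm
    (c events, d non-events), with a Wald 95 % CI on the log scale.
    Assumes all four cells are non-zero."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def mean_difference_ci(m_lap, sd_lap, n_lap, m_open, sd_open, n_open, z=1.96):
    """MD = laparoscopic mean minus open mean, with a normal-approximation 95 % CI."""
    md = m_lap - m_open
    se = math.sqrt(sd_lap ** 2 / n_lap + sd_open ** 2 / n_open)
    return md, md - z * se, md + z * se


def significant(ci_low, ci_high, null_value):
    """Significant at p < 0.05 when the 95 % CI excludes the null value
    (1 for an OR, 0 for an MD)."""
    return not (ci_low <= null_value <= ci_high)
```

For example, a hypothetical trial in which 10 of 100 laparoscopic and 20 of 100 open patients develop an abscess (`odds_ratio_ci(10, 90, 20, 80)`) gives an OR of about 0.44, but its 95 % CI narrowly includes 1, so it would not count as significant under this rule.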
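The two pooling models can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not RevMan’s implementation: the fixed-effect part computes the Mantel-Haenszel pooled OR with the Robins-Breslow-Greenland variance of its logarithm, and the random-effects part uses the DerSimonian-Laird estimate of the between-study variance τ², computed here from inverse-variance weights on the log OR. RevMan may compute the heterogeneity statistic differently, so results need not match exactly. Tables are (a, b, c, d) tuples and are assumed to have no zero cells.

```python
import math


def mantel_haenszel_or(tables, z=1.96):
    """Fixed-effect Mantel-Haenszel pooled OR with a 95 % CI from the
    Robins-Breslow-Greenland variance of log(OR_MH). Each table is
    (events_lap, non_events_lap, events_open, non_events_open)."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    log_or = math.log(sum_r / sum_s)
    var = (sum_pr / (2 * sum_r ** 2)
           + sum_ps_qr / (2 * sum_r * sum_s)
           + sum_qs / (2 * sum_s ** 2))
    se = math.sqrt(var)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def dersimonian_laird_or(tables, z=1.96):
    """Random-effects pooled OR (with 95 % CI and tau^2) using the
    DerSimonian-Laird moment estimator on the log OR scale."""
    ys = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    vs = [1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in tables]
    ws = [1 / v for v in vs]
    if len(tables) < 2:
        tau2 = 0.0  # between-study variance is not estimable from one study
    else:
        y_fixed = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
        q = sum(w * (y - y_fixed) ** 2 for w, y in zip(ws, ys))
        c_term = sum(ws) - sum(w ** 2 for w in ws) / sum(ws)
        tau2 = max(0.0, (q - (len(tables) - 1)) / c_term)
    ws_star = [1 / (v + tau2) for v in vs]  # random-effects weights
    mu = sum(w * y for w, y in zip(ws_star, ys)) / sum(ws_star)
    se = math.sqrt(1 / sum(ws_star))
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se), tau2
```

Fitting both models on the same tables makes the comparison in the text concrete: when τ² is estimated as 0, the random-effects weights reduce to inverse-variance fixed-effect weights, and the CI widens only as τ² grows.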
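The heterogeneity calculation and the zero-cell correction described above can be sketched in Python as follows. The sketch assumes that Q is computed from inverse-variance weights on the log OR (RevMan’s exact computation may differ), and the helper names are hypothetical. Following the text, 0.5 is added to every cell of a 2 × 2 table only when that table contains a zero cell.

```python
import math


def corrected_table(a, b, c, d, cc=0.5):
    """Add cc to every cell of a 2 x 2 table that contains a zero cell."""
    if 0 in (a, b, c, d):
        return a + cc, b + cc, c + cc, d + cc
    return a, b, c, d


def i_squared(tables, cc=0.5):
    """Cochran's Q on the log OR scale, its df, and
    I^2 = 100 % x (Q - df) / Q, truncated at 0 when Q < df."""
    ys, ws = [], []
    for table in tables:
        a, b, c, d = corrected_table(*table, cc=cc)
        ys.append(math.log(a * d / (b * c)))
        ws.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse-variance weight
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - y_bar) ** 2 for w, y in zip(ws, ys))
    df = len(tables) - 1
    i2 = 0.0 if q <= 0 else max(0.0, 100 * (q - df) / q)
    return q, df, i2
```

For instance, `corrected_table(0, 50, 2, 48)` returns `(0.5, 50.5, 2.5, 48.5)`, while a table without zero cells is returned unchanged.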
Our database and bibliography searches yielded 1,438 and 150 articles, respectively. After duplicate articles were eliminated, we screened the titles and abstracts against the inclusion and exclusion criteria, after which 95 articles remained (Fig. 1). After the full texts of these articles were read and ineligible studies were excluded, 64 RCTs published from 1992 to 2012 were used for data extraction (Table 1).
Intra-abdominal abscess
This outcome analysis included 51 relevant studies with a total of 6,512 participants (3,273 for laparoscopic appendectomy and 3,239 for open appendectomy) (Fig. 2). The total numbers of events were 61 in the laparoscopic appendectomy group (1.80 %) and 43 in the open appendectomy group (1.30 %). The overall OR was 1.34 (95 % CI 0.92–1.94) in the fixed-effects model and 1.32 (95 % CI 0.84–2.10) in the random-effects model. The overall I² was 6 %. A visual inspection of the funnel plot for small-study effects did not show asymmetry (Fig. 3). In the cumulative meta-analysis, the CI narrowed until it first excluded 1, indicating a significant difference in favor of open appendectomy, with the trial published in 2001 (OR 2.35, 95 % CI 1.30–4.25). However, as more studies were added, the CI shifted to the left, toward laparoscopic appendectomy. Finally, in 2010 the CI again included the value 1, and the difference was no longer significant (Fig. 4).
Wound infection
Sixty studies with 7,462 participants (3,736 for laparoscopic appendectomy and 3,726 for open appendectomy) were included for this outcome (Fig. 5). The total numbers of events were 123 in the laparoscopic appendectomy group (3.29 %) and 290 in the open appendectomy group (7.78 %). The overall OR was 0.41 (95 % CI 0.33–0.51) in the fixed-effects model and 0.47 (95 % CI 0.38–0.59) in the random-effects model. The overall I² was 0 %. A visual inspection of the funnel plot showed slight asymmetry among small studies (Fig. 6). In the cumulative meta-analysis, a significant difference was first observed with the seventh study, published in 1995, and this trend did not change substantially as subsequent studies were added (Fig. 7).
Operative time
Forty-three studies with a total of 4,202 participants (2,135 for laparoscopic appendectomy and 2,067 for open appendectomy) compared the operative time of laparoscopic appendectomy and open appendectomy (Fig. 4). The average operative times were 57.3 min for laparoscopic appendectomy and 47.0 min for open appendectomy. The overall MD was 4.4 min longer for laparoscopic appendectomy (95 % CI 3.5–5.3) in the fixed-effects model and 10.1 min longer (95 % CI 5.9–14.3) in the random-effects model. The I² value was 95 %. Because significant heterogeneity was found, a cumulative meta-analysis was not performed.
Length of hospital stay
Thirty-nine studies with a total of 4,240 participants (2,165 for laparoscopic appendectomy and 2,153 for open appendectomy) were included for this outcome (Fig. 5). The average length of hospital stay was 3.21 days for laparoscopic appendectomy and 4.40 days for open appendectomy. The MD of the length of hospital stay was 1.08 days shorter for laparoscopic appendectomy (95 % CI 1.01–1.75) in the fixed-effects model and 1.12 days shorter (95 % CI 0.77–1.47) in the random-effects model. The I² value was 95 %. Because significant heterogeneity was found, a cumulative meta-analysis was not performed.
Our cumulative meta-analysis comparing laparoscopic appendectomy and open appendectomy for acute appendicitis demonstrated that the evidence provided by a meta-analysis of surgical RCTs can change over time. Intra-abdominal abscesses were significantly more frequent in the laparoscopic appendectomy group in the cumulative analyses from 2001 to 2009, but this significance disappeared as more trials accumulated. Our findings visually demonstrate how evidence changes over time in the surgical field. Although the other outcome measures did not exhibit the same transition as intra-abdominal abscess, all of the outcome measures demonstrated similar trends in favor of laparoscopic appendectomy.
Fluctuation of evidence
Once conclusive evidence exists concerning the effectiveness of a medical intervention, one can reasonably conclude that no further research is needed on the topic. However, previous studies have shown that the results of meta-analyses are underused, and many RCTs are conducted even after significant evidence has been demonstrated through a meta-analysis [2, 17]. Some researchers have contended that randomizing participants in unnecessary trials is unethical and a waste of resources, and they have emphasized the importance of avoiding redundant trials. In the meta-analysis from a Cochrane review published in 2010, intra-abdominal abscess was significantly more frequent after laparoscopic appendectomy than after open appendectomy (albeit with moderate heterogeneity), but the present study shows that this significant result later became non-significant. Our findings thus provide an example in which large intervention effects are not always conclusive and can fluctuate over time. No such fluctuation was found in the wound infection data, and the result consistently favored laparoscopic appendectomy. Penninga et al. have shown strong evidence favoring laparoscopic appendectomy for wound infection using a trial sequential analysis, and our result is consistent with this.
The nature of surgical trials
Evidential instability can be explained by the nature of surgical interventions, which are highly complex and difficult to evaluate. First, surgical interventions involve many factors, including the surgeons’ skill and judgment, the skills of the treating team, the development of surgical devices, and pre- and postoperative management, all of which change continually. Second, the learning curve influences outcomes: surgeons’ performance improves as they gain training and experience, until they acquire expertise. These factors might explain the shift toward laparoscopic appendectomy that we observed in later trials. In this respect, surgical trials may differ from pharmaceutical trials, in which efficacy theoretically does not change over time. To minimize the influence of these factors, trial designs that account for the learning curve or perioperative management should be used [20, 21].
The shift favoring open appendectomy in the early to middle period
Although we observed a shift favoring laparoscopic appendectomy in later trials, an apparent shift in the opposite direction occurred in the early to middle period, after very early trials had reported good results for laparoscopic appendectomy. Early trials tend to overestimate treatment effects for a variety of reasons, such as the under-reporting of disappointing results or the selection of favorable subgroups [22–24]. As new interventions are disseminated and the inclusion criteria for study participants are broadened, positive results become less extreme. Relevant examples can be found elsewhere [25, 26]. We speculate that the results favoring open appendectomy in the early to middle period are another example of this phenomenon.
This study has several limitations. First, we identified the change from statistically significant to non-significant findings using the 95 % CI of the OR. Other methods can analyze the results of meta-analyses chronologically, such as trial sequential analysis (TSA), which accounts for the random errors that arise from repeated meta-analyses [27, 28]. We performed a TSA for the intra-abdominal abscess outcome, and it did not show significance at any point; i.e., the required information size was not reached and the Z-curve did not cross the trial sequential monitoring boundaries. Therefore, we cannot rule out the possibility that the statistical significance observed in the cumulative analysis was due to random error. This underscores the importance of not relying solely on the 95 % CI of the effect size and of performing a TSA before drawing conclusions from such a comparison.
Second, this analysis may be prone to publication bias. It is likely that trials with contradictory results are published more often than those confirming the existing evidence. Small-study effects could also have been present in our analysis; a visual inspection of the funnel plot showed slight asymmetry among small studies (Fig. 6).
Third, we did not conduct a cumulative meta-analysis of the operative time or the length of hospital stay because of considerable heterogeneity, and small-study effects could have contributed substantially to this heterogeneity.
Fourth, studies with a high risk of bias could skew the results. We conducted subgroup analyses of trials with low and high risks of bias. Intra-abdominal abscess after laparoscopic appendectomy was considerably more frequent among studies with a low risk of bias than among those with a high risk of bias, although the cumulative meta-analysis showed a similar trend toward favoring laparoscopic appendectomy (see Additional file 1).
Fifth, the rarity of intra-abdominal abscesses may have complicated the analysis, because a small number of events increases uncertainty. Because many trials reported zero intra-abdominal abscess events, we replaced the correction factor of 0.5 with 0.01 in a sensitivity analysis to test robustness. With this correction factor, the cumulative meta-analysis again showed that statistical significance was first observed with the trial published in 2001 and disappeared as more results accumulated, with an overall OR of 1.24 (95 % CI 0.84–1.81). The results were therefore similar with both correction factors.
Sixth, the number of studies published each year was unbalanced: more of the included trials were published from 1996 to 2001, which might have biased the results. Nevertheless, considering that the shift from results favoring open appendectomy to results favoring laparoscopic appendectomy occurred after 2002, new findings might have been more evident if more trials had been published after 2001.
Finally, although we report an example of evidential instability in the surgical field, we cannot generalize this observation to all surgical interventions or to other fields. Although the number of RCTs addressing other surgical topics is generally low, more evidence should be accumulated on other topics to understand the stability of evidence, using not only cumulative meta-analyses but also TSAs.
Our cumulative meta-analysis of RCTs comparing laparoscopic and open appendectomy demonstrated that evidence can fluctuate over time in surgery, a field with complex variables. Observing only the 95 % confidence interval of the effect size from meta-analyses may not provide conclusive results. More stringent analyses should be used to assess the results of meta-analyses.
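To make the TSA ingredients mentioned above concrete, the following Python sketch computes a required information size and checks cumulative Z-values against a monitoring boundary. It is a simplified illustration under stated assumptions, not the TSA software used in the study: the required information size uses the conventional two-group formula for a binary outcome, inflated by a heterogeneity correction factor 1/(1 − D²), and the boundary uses the O’Brien-Fleming-type approximation z(1 − α/2)/√t, where t is the accrued fraction of the required information size. The anticipated control event rate, relative risk reduction, and all function names are hypothetical inputs.

```python
import math
from statistics import NormalDist


def required_information_size(p_control, rrr, alpha=0.05, beta=0.20, d2=0.0):
    """Total participants needed for a binary outcome (hypothetical inputs).

    p_control: anticipated event proportion in the control arm.
    rrr: anticipated relative risk reduction in the experimental arm.
    d2: heterogeneity correction (0 <= d2 < 1); the size is inflated by 1 / (1 - d2).
    """
    z = NormalDist().inv_cdf
    p_exp = p_control * (1 - rrr)
    p_mean = (p_control + p_exp) / 2
    delta = p_control - p_exp
    n = 4 * (z(1 - alpha / 2) + z(1 - beta)) ** 2 * p_mean * (1 - p_mean) / delta ** 2
    return n / (1 - d2)


def crosses_boundary(cum_z, cum_n, ris, alpha=0.05):
    """Flag each cumulative Z-value that reaches an O'Brien-Fleming-type
    boundary z_{1-alpha/2} / sqrt(t), with t = accrued participants / RIS
    (capped at 1). This approximates, but is not identical to, the
    alpha-spending boundaries of dedicated TSA software."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    flags = []
    for z_value, n in zip(cum_z, cum_n):
        t = min(n / ris, 1.0)
        flags.append(abs(z_value) >= z_crit / math.sqrt(t))
    return flags
```

With a hypothetical control event rate of 2 % and a 25 % relative risk reduction, `required_information_size(0.02, 0.25)` is roughly 21,600 participants, which illustrates how rare outcomes such as intra-abdominal abscess demand very large information sizes before a boundary can be crossed early.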
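The robustness check described above amounts to re-pooling the same tables under each correction factor. The following Python sketch shows the idea for the Mantel-Haenszel point estimate only; it assumes the correction is added to every cell of any table with a zero cell, and the function names and example tables are hypothetical.

```python
def mh_pooled_or(tables, cc):
    """Mantel-Haenszel pooled OR after adding cc to every cell of any table
    with a zero cell. Tables are (events_lap, non_events_lap,
    events_open, non_events_open) tuples."""
    num = den = 0.0
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            a, b, c, d = a + cc, b + cc, c + cc, d + cc
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


def correction_sensitivity(tables, factors=(0.5, 0.01)):
    """Pooled OR under each continuity correction factor."""
    return {cc: mh_pooled_or(tables, cc) for cc in factors}
```

If none of the tables contains a zero cell, both factors give the same OR; differences arise only through zero-event trials, which is why the check is informative for a rare outcome such as intra-abdominal abscess.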
RCTs: randomized controlled trials
TSA: trial sequential analysis
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. BMJ. 2009;339:b2700. doi:10.1136/bmj.b2700.
Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992;327(4):248–54.
Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA. 1992;268:240–8.
Effectiveness of intravenous thrombolytic treatment in acute myocardial infarction. Gruppo Italiano per lo Studio della Streptochinasi nell’Infarto Miocardico (GISSI). Lancet. 1986;1:397–402.
Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. ISIS-2 (Second International Study of Infarct Survival) Collaborative Group. Lancet. 1988;2:349–60.
Ergina PL, Cook JA, Blazeby JM, Boutron I, Clavien P-A, Reeves BC, Seiler CM, Altman DG, Aronson JK, Barkun JS, et al. Challenges in evaluating surgical innovation. Lancet. 2009;374(9695):1097–104.
Shikata S, Nakayama T, Noguchi Y, Taji Y, Yamagishi H. Comparison of effects in randomized controlled trials with observational studies in digestive surgery. Ann Surg. 2006;244(5):668–76.
Li X, Zhang J, Sang L, Zhang W, Chu Z, Li X, Liu Y. Laparoscopic versus conventional appendectomy— a meta-analysis of randomized controlled trials. BMC Gastroenterol. 2010;10:129.
Ohtani H, Tamamori Y, Arimoto Y, Nishiguchi Y, Maeda K, Hirakawa K. Meta-analysis of the results of randomized controlled trials that compared laparoscopic and open surgery for acute appendicitis. J Gastrointest Surg. 2012;16:1929–39.
Sauerland S, Jaschinski T, Neugebauer EA. Laparoscopic versus open surgery for suspected appendicitis. Cochrane Database Syst Rev. 2010;10:CD001546. doi:10.1002/14651858.CD001546.pub3.
Glanville JM, Lefebvre C, Miles JNV, Camosso-Stefinovic J. How to identify randomized controlled trials in MEDLINE: ten years on. J Med Libr Assoc. 2006;94:130–6.
Wong SS, Wilczynski NL, Haynes RB. Developing optimal search strategies for detecting clinically sound treatment studies in EMBASE. J Med Libr Assoc. 2006;94(1):41–7.
Wong SS, Wilczynski NL, Haynes RB. Optimal CINAHL search strategies for identifying therapy studies and review articles. J Nurs Scholarsh. 2006;38(2):194–9.
DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–88.
Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.
Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med. 2004;23:1351–75.
Habre C, Tramèr MR, Pöpping DM, Elia N. Ability of a meta-analysis to prevent redundant research: systematic review of studies on pain from propofol injection. BMJ. 2014;348:g5219. doi:10.1136/bmj.g5219.
Penninga L, Gluud C, Wetterslev J. Meta-analysis of randomised trials on laparoscopic versus open surgery for acute appendicitis: has firm evidence been reached? J Gastrointest Surg. 2014;18(7):1383–4.
Ramsay CR, Grant AM, Wallace SA, Garthwaite PH, Monk AF, Russell IT. Assessment of the learning curve in health technologies. A systematic review. Health Technol Assess. 2001;5(12):1–79.
McCulloch P, Altman DG, Campbell WB, Flum DR, Glasziou P, Marshall JC, Nicholl J, Aronson JK, Barkun JS, Blazeby JM, et al. No surgical innovation without evaluation: the IDEAL recommendations. Lancet. 2009;374:1105–12.
Harrysson IJ, Cook J, Sirimanna P, Feldman LS, Darzi A, Aggarwal R. Systematic review of learning curves for minimally invasive abdominal surgery: a review of the methodology of data collection, depiction of outcomes, and statistical analysis. Ann Surg. 2014;260(1):37–45.
Chan A-W, Song F, Vickers A, Jefferson T, Dickersin K, Gøtzsche PC, Krumholz HM, Ghersi D, van der Worp HB. Increasing value and reducing waste: addressing inaccessible research. Lancet. 2014;383:257–66.
Pereira TV, Ioannidis JPA. Statistically significant meta-analyses of clinical trials have modest credibility and inflated effects. J Clin Epidemiol. 2011;64:1060–9.
Pereira TV, Horwitz RI, Ioannidis JP. Empirical evaluation of very large treatment effects of medical interventions. JAMA. 2012;308(16):1676–84.
Klein JB, Jacobs RH, Reinecke MA. Cognitive-behavioral therapy for adolescent depression: a meta-analytic investigation of changes in effect-size estimates. J Am Acad Child Adolesc Psychiatry. 2007;46(11):1403–13.
Clarke M, Brice A, Chalmers I. Accumulating research: a systematic account of how cumulative meta-analyses would have provided knowledge, improved health, reduced harm and saved resources. PLoS One. 2014;9:e102670.
Roberts I, Ker K, Edwards P, Beecher D, Manno D, Sydenham E. The knowledge system underpinning healthcare is not fit for purpose and must change. BMJ. 2015;350:h2463.
Wetterslev J, Thorlund K, Brok J, Gluud C. Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis. J Clin Epidemiol. 2008;61(1):64–75.
Keus F, Wetterslev J, Gluud C, Gooszen HG, van Laarhoven CJ. Robustness assessments are needed to reduce bias in meta-analyses that include zero-event randomized trials. Am J Gastroenterol. 2009;104(3):546–51. doi:10.1038/ajg.2008.22.
Barkun JS, Aronson JK, Feldman LS, Maddern GJ, Strasberg SM, Balliol Collaboration, Altman DG, Barkun JS, Blazeby JM, Boutron IC, et al. Evaluation and stages of surgical innovations. Lancet. 2009;374(9695):1089–96.
The authors declare that they have no competing interests.
TU performed the systematic literature searches, extracted data from the original trials, conducted the statistical analyses and prepared the manuscript. SS conceived the study idea, designed the study and corrected the draft of the paper. HT performed systematic literature searches and extracted data from original trials. LD performed systematic literature searches and critically revised the manuscript. YN was responsible for the statistical analyses and critical revision of the manuscript. TN designed the study and critically revised the manuscript. YT was a guarantor and critically revised the manuscript. All authors read and approved the final manuscript.
Pooled odds ratio for intra-abdominal abscess in trials comparing laparoscopic appendectomy and open appendectomy among studies with low risk of bias. Cumulative odds ratio for intra-abdominal abscess comparing laparoscopic appendectomy and open appendectomy among studies with low risk of bias. Pooled odds ratio for intra-abdominal abscess in trials comparing laparoscopic appendectomy and open appendectomy among studies with high risk of bias. Cumulative odds ratio for intra-abdominal abscess comparing laparoscopic appendectomy and open appendectomy among studies with high risk of bias. (ZIP 1902 kb)