Data extraction and criteria
Data on phase II/III gastric cancer trials were collected on December 1, 2021 by querying ClinicalTrials.gov. The search terms were “Stomach Cancer”, “Stomach Neoplasms”, “Gastric Cancer”, “Gastric Neoplasms”, “Gastric Carcinoma”, “Stomach Carcinoma” and “Gastroesophageal Junction Cancer”. Trials that evaluated gastric cancer together with other cancer types were also included in our dataset.
In line with the trial registration requirements of the United States Food and Drug Administration (FDA) and the International Committee of Medical Journal Editors (ICMJE), trials that started before January 1, 2007 were excluded. Trials that started after December 1, 2020 were also excluded, so that every included trial had at least 12 months to enroll participants before the analysis. Trials with a status of “not yet recruiting”, “suspended”, “withdrawn” or “unknown” were excluded because their actual termination status was unclear. Trials that did not provide an anticipated accrual number or a start date were also excluded. After applying these criteria, 567 trials were included in the dataset, and all trial information and characteristics were downloaded and extracted from ClinicalTrials.gov XML files (Fig. 1).
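The exclusion rules above can be expressed as a single eligibility check. The following is a minimal sketch, not the authors' actual pipeline: the dictionary keys (`start_date`, `status`, `anticipated_accrual`) and the function name `is_eligible` are hypothetical, and it assumes the registry record has already been parsed into Python values.

```python
from datetime import date

# Start-date window taken from the criteria in the text.
EARLIEST_START = date(2007, 1, 1)
LATEST_START = date(2020, 12, 1)

# Statuses excluded because the actual termination status is unclear.
EXCLUDED_STATUSES = {"not yet recruiting", "suspended", "withdrawn", "unknown"}


def is_eligible(trial):
    """Return True if a parsed trial record passes every inclusion rule.

    `trial` is a hypothetical dict with the keys 'start_date' (datetime.date),
    'status' (str) and 'anticipated_accrual' (int); these names are
    illustrative and are not ClinicalTrials.gov field names.
    """
    start = trial.get("start_date")
    accrual = trial.get("anticipated_accrual")
    # Trials missing a start date or an anticipated accrual are excluded.
    if start is None or accrual is None:
        return False
    # Trials starting before January 1, 2007 or after December 1, 2020 are excluded.
    if start < EARLIEST_START or start > LATEST_START:
        return False
    status = (trial.get("status") or "").strip().lower()
    return status not in EXCLUDED_STATUSES
```

Applied to a list of parsed records, `[t for t in trials if is_eligible(t)]` would yield the analysis dataset under these assumptions.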
Data cleaning and trial characteristic classification
Trials with a status of “recruiting”, “enrolling by invitation” or “active, not recruiting” were categorized as “active”. Since February 2007, ClinicalTrials.gov has offered an optional text box in which trial managers can describe the reason for trial termination. Based on this information, two authors (JY and YS) independently classified each terminated trial into one of the following nine reasons: safety reason, efficacy reason, ethical reason, trial no longer needed, business/sponsor reason, recruitment failure, logistic reason, PI left and no reason given. When the two authors assigned a trial to different categories, the primary reason was determined through discussion. Trials terminated for safety or efficacy reasons were considered terminated for good reasons, whereas those terminated for business/sponsor reason, ethical reason, trial no longer needed, recruitment failure, logistic reason, PI left or no reason given were considered terminated for bad reasons. Termination for good reasons was defined as a substantive outcome in the descriptive analysis of trial characteristics.
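The status grouping and the good/bad split of termination reasons amount to two lookups, sketched below. The reason labels are copied from the text; the function names and the lower-casing of status strings are assumptions made for illustration. The manual classification of free-text reasons by the two authors is not automated here; only the comparison of their labels is shown.

```python
ACTIVE_STATUSES = {"recruiting", "enrolling by invitation", "active, not recruiting"}

GOOD_REASONS = {"safety reason", "efficacy reason"}
BAD_REASONS = {
    "business/sponsor reason",
    "ethical reason",
    "trial no longer needed",
    "recruitment failure",
    "logistic reason",
    "PI left",
    "no reason given",
}


def collapse_status(status):
    """Map the three ongoing registry statuses to 'active'; keep others as-is."""
    normalized = status.strip().lower()
    return "active" if normalized in ACTIVE_STATUSES else normalized


def termination_quality(reason):
    """Return 'good' or 'bad' for one of the nine termination reasons."""
    if reason in GOOD_REASONS:
        return "good"
    if reason in BAD_REASONS:
        return "bad"
    raise ValueError(f"unrecognized termination reason: {reason!r}")


def needs_discussion(label_first_rater, label_second_rater):
    """Flag a trial whose two independent labels disagree."""
    return label_first_rater != label_second_rater
```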
A trial with more than one recruiting centre in its clinical site record was categorized as a multi-centre trial; otherwise, it was categorized as a single-centre trial. A trial whose recruiting centres were located in more than one country was labelled a multi-country trial; otherwise, a single-country trial. Trials listed as phase I/II were treated as phase II trials, and those listed as phase II/III were treated as phase III trials. The intervention type of some trials was mislabeled in the dataset; each trial was therefore reviewed by clinical experts and categorized according to its primary intervention type. Each trial’s anticipated accrual number roughly followed a normal distribution, but some outliers were observed: some phase II studies planned to enroll fewer than 10 participants, while some phase III studies planned to enroll more than 5,000. The top 1% and bottom 1% of the data were therefore winsorized. Because the distribution of the anticipated accrual number was highly skewed, this variable was log-transformed. Trial duration was calculated from the actual start date to the actual completion or termination date. For active studies, trial duration was calculated from the actual start date to the date on which the files were downloaded (December 1, 2021). Because extremely large phase III trials take considerably longer than other trials, the top 1% of the duration data were also winsorized.
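The derived variables described above can be sketched in a few helper functions. This is an illustration under stated assumptions: the text does not specify the percentile rule used for winsorizing, the base of the logarithm, or the unit of duration, so the sketch uses a simple order-statistic cut-off, the natural logarithm, and days. All function names are hypothetical.

```python
import math
from datetime import date

DOWNLOAD_DATE = date(2021, 12, 1)  # date the registry files were downloaded

# Mixed-phase labels are folded into the later phase, as described in the text.
PHASE_MAP = {"phase I/II": "phase II", "phase II/III": "phase III"}


def normalize_phase(phase):
    return PHASE_MAP.get(phase, phase)


def centre_type(n_centres):
    return "multi-centre" if n_centres > 1 else "single-centre"


def country_type(countries):
    """`countries` lists the country of each recruiting centre."""
    return "multi-country" if len(set(countries)) > 1 else "single-country"


def winsorize(values, lower=0.01, upper=0.01):
    """Clip values below/above order-statistic cut-offs (assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[int(lower * n)]
    high_cut = ordered[n - 1 - int(upper * n)]
    return [min(max(v, low_cut), high_cut) for v in values]


def log_accrual(accruals):
    """Winsorize both tails, then take the natural log (accruals must be > 0)."""
    return [math.log(v) for v in winsorize(accruals)]


def trial_duration_days(start, end=None):
    """Days from start to completion/termination, or to the download date
    when `end` is None (active trials)."""
    return ((end or DOWNLOAD_DATE) - start).days
```

For duration, only the upper tail is winsorized in the text, which corresponds to `winsorize(durations, lower=0.0, upper=0.01)` in this sketch.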
Statistical analysis
The median anticipated trial accrual and trial duration were compared across studies using Kruskal-Wallis tests, and categorical variables such as intervention type, phase and sponsor type were compared using chi-square tests.
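To make the two tests concrete, the sketch below computes the Kruskal-Wallis H statistic (with the standard tie correction) and the Pearson chi-square statistic for a contingency table, each with its degrees of freedom. It is a plain-Python illustration, not the Stata procedure used in the study; it stops at the test statistic, and the p-value would come from the chi-square distribution with the returned degrees of freedom. No continuity correction is applied, which is an assumption.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


def chi_square(table):
    """Return (Pearson chi-square, degrees of freedom) for a table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_totals) - 1) * (len(col_totals) - 1)
```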
Stata/SE version 15 (StataCorp, Texas, USA) was used to compare risk factors and the likelihood of clinical trial failure. Each trial’s duration was treated as the survival time in our model, and the characteristics examined were phase, start year, treatment, sponsor type, single-centre versus multi-centre, single-country versus multi-country, duration, anticipated accrual and status. Logistic regression models were used to estimate how trial characteristics were associated with clinical trial failure, and cumulative Kaplan-Meier survival curves were used to estimate the risk of trial failure over time.
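As an illustration of the survival component, the sketch below implements the Kaplan-Meier product-limit estimator in plain Python. It is not the Stata routine used in the study. It assumes that trial duration is the time variable and that a boolean flag marks whether the trial failed at that time; trials without the event are treated as censored. Which statuses count as the event is an assumption left to the caller.

```python
def kaplan_meier(durations, failed):
    """Return [(time, survival probability)] at each time with >= 1 failure.

    durations: time each trial was observed (any consistent unit).
    failed:    True if the trial failed at that time, False if censored.
    """
    pairs = sorted(zip(durations, failed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all trials that share the same observed time.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Failed and censored trials both leave the risk set after time t.
        at_risk -= leaving
    return curve


def cumulative_failure(curve):
    """Convert survival probabilities to cumulative failure risk (1 - S)."""
    return [(t, 1 - s) for t, s in curve]
```

For example, `kaplan_meier([1, 2, 3, 4], [True, False, True, False])` steps down at times 1 and 3 only, because the trials observed for 2 and 4 time units are censored.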