Validation of serrated polyps (SPs) in Swedish pathology registers

Background: Little is known about the natural history of serrated polyps (SPs), partly due to the lack of large-scale epidemiologic data. In this study, we examined the validity of SP identification according to SNOMED (Systematised Nomenclature of Medicine) codes and free text from colorectal histopathology reports.
Methods: Through the ESPRESSO (Epidemiology Strengthened by histoPathology Reports in Sweden) study, we retrieved data on SPs from all pathology departments in Sweden in 2015–2017 by using SNOMED codes and free-text search in colorectal histopathology reports. Randomly selected individuals with a histopathology report of SPs were validated against patient charts using a structured, retrospective review.
Results: SPs were confirmed in 101/106 individuals with a histopathology report of SPs, yielding a positive predictive value (PPV) of 95% (95%CI = 89–98%). By year of diagnosis, the PPV was 89% (95%CI = 69–97%), 96% (95%CI = 81–99%) and 97% (95%CI = 89–99%) for individuals diagnosed before 2001 (n = 19), between 2001 and 2010 (n = 26) and after 2010 (n = 61), respectively. According to search method, the PPV for individuals identified by SNOMED codes was 100% (95%CI = 93–100%), and 93% (95%CI = 86–97%) using free-text search. Recorded location (colon vs. rectum) was correct in 94% of all SP histopathology reports (95%CI = 84–98%) identified by SNOMED codes. Individuals with SPs were classified into hyperplastic polyps (n = 34; 32%), traditional serrated adenomas (n = 3; 3%), sessile serrated adenomas/polyps (SSA/Ps) (n = 70; 66%), unspecified SPs (n = 3; 3%), and false positive SPs (n = 5; 5%). For individuals identified by SNOMED codes, SSA/Ps were confirmed in 49/52 individuals, resulting in a PPV of 94% (95%CI = 84–98%). In total, 57% had ≥ 2 polyps (1: n = 44, 2–3: n = 33 and ≥ 4: n = 27). Some 46% of SPs (n = 71) originated from the proximal colon and 24% were ≥ 10 mm in size (n = 37). Heredity for colorectal cancer, intestinal polyposis syndromes, or both was reported in seven individuals (7%). Common comorbidities included diverticulosis (n = 45, 42%), colorectal cancer (n = 19, 18%), and inflammatory bowel disease (n = 10, 9%).
Conclusion: Colorectal histopathology reports are a reliable data source to identify individuals with SPs.

Some data suggest that cancers evolving through the serrated pathway may account for 15-30% of all CRC cases, and that they are significantly overrepresented in interval cancers [6], i.e. CRC occurring before the next recommended screening after an initially negative finding. Even though the adenoma-carcinoma pathway still accounts for the majority of the CRC burden, a recent study comparing the risk of CRC development found that the increased risk of CRC in individuals with SPs is similar to or higher than that seen in individuals with conventional adenomas [7].
Little is known about the natural history of SPs, which may in part be due to the lack of large-scale data. Through the ESPRESSO (Epidemiology Strengthened by histoPathology Reports in Sweden) study [8], we contacted all pathology departments (n = 28) in Sweden to construct a cohort of individuals with an SP diagnosis according to computerised histopathology reports. We then retrieved patient charts from 106 randomly selected individuals with a record of SP. The primary purpose of this study was to validate SP diagnosis according to computerised histopathology reports against patient chart data. A secondary aim was to describe the characteristics of individuals with SPs.

Methods
We validated SP diagnosis based on computerised histopathology reports in a random subset of individuals through a structured, retrospective review of histopathology reports and patient charts.

Study population
The ESPRESSO study consists of gastrointestinal histopathology reports from 2.2 million unique individuals with a total of 6.1 million separate data entries. Some 53.9% of individuals had been biopsied more than once. Data on gastrointestinal histopathology reports were collected between October 12, 2015 and April 15, 2017 from all pathology departments in Sweden (n = 28). Overall, we had data on 1,618,953 colon biopsies and 771,511 rectal biopsies [8]. Through the unique personal identity number [9] assigned to all Swedish residents, histopathology data were linked to the Swedish national health registers (Patient Register [10], Cause of Death Register [11], Cancer Register [12], Medical Birth Register [13], Prescribed Drug Register [14], the LISA database with socioeconomic data [15], and the Total Population Register [16]). Details about ESPRESSO and registry linkage have been described previously [8].
For the current study on SPs, we included individuals with a colorectal biopsy (topography codes: T67-68) with any of the following Systematised Nomenclature of Medicine (SNOMED) codes: M82160, M8216, M82130, M8213. We also included individuals with a colorectal biopsy whose histopathology report free text mentioned "serrated polyp" (Swedish "sågtand(ad)").
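The two inclusion routes described above can be sketched in a few lines of Python. This is only an illustration, not the ESPRESSO extraction code: the record layout (a dict with the keys `topography`, `morphology` and `free_text`), the function name `identify_sp` and the prefix matching of topography codes are all assumptions made for the example, and the free-text pattern covers only the Swedish stem "sågtand".

```python
import re

# SNOMED morphology codes listed in the text for serrated polyps.
SP_MORPHOLOGY_CODES = {"M82160", "M8216", "M82130", "M8213"}

# Colorectal topography codes T67-68, matched by prefix (an assumption
# about how sub-site digits are appended to the code).
COLORECTAL_PREFIXES = ("T67", "T68")

# "sågtand" also matches the inflected form "sågtandad".
FREE_TEXT_PATTERN = re.compile(r"sågtand", re.IGNORECASE)


def identify_sp(record):
    """Return the search methods ("snomed", "free_text") that flag a record.

    `record` is a hypothetical dict with the keys "topography",
    "morphology" and "free_text". An empty set means the record is not
    picked up by either search.
    """
    methods = set()
    # Only colorectal biopsies are eligible.
    if not record["topography"].startswith(COLORECTAL_PREFIXES):
        return methods
    if record["morphology"] in SP_MORPHOLOGY_CODES:
        methods.add("snomed")
    if FREE_TEXT_PATTERN.search(record["free_text"]):
        methods.add("free_text")
    return methods
```

Returning a set rather than a single label keeps track of individuals found by both routes, which matters later when results are stratified by search method.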

Study sample
Power calculation using EpiTools [17] indicated that a minimum of 139 individuals were needed to obtain a positive predictive value (PPV) for SP of 90% with a 95% confidence interval (95%CI) range of 85-95% (using an alpha of 0.05 and a beta of 0.20). For this validation, we requested patient charts from a random sample of 160 individuals with a histopathology report of SPs from five Swedish counties. We were able to retrieve patient chart data from 126 individuals, of whom 106 had sufficient information for our validation (Fig. 1).
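As a rough check on the figure of 139, the standard normal-approximation formula for estimating a proportion with a given precision, n = z² p(1 − p) / d², can be evaluated directly. Whether EpiTools applies exactly this formula is an assumption; the function name `sample_size_for_proportion` is illustrative, and beta does not enter this particular formula.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(expected, half_width, confidence=0.95):
    """Smallest n giving a normal-approximation CI of +/- half_width.

    expected   -- anticipated proportion (here the anticipated PPV)
    half_width -- desired half-width of the confidence interval
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * expected * (1 - expected) / half_width**2)


# Expected PPV of 90% with a 95%CI of 85-95% (half-width 5 percentage points).
print(sample_size_for_proportion(0.90, 0.05))  # 139
```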

Case definition
We defined a true SP as having a consistent histopathology report and a patient chart supporting an SP diagnosis. Individuals with an SP diagnosis could have one or multiple SPs. Histopathology reports and patient charts were assessed by the principal author (SRB). Uncertain cases were discussed with JFL and MS. If no consensus was reached, the case was considered inconsistent with SP.

Data elements
Data from patient charts were extracted using a standardised form, similar to the form used by Svensson et al. in their validation of microscopic colitis [18]. Data were extracted for the period from 2 years before the diagnosis until March 2018. The data from the patient charts mainly included patient history, laboratory data, referral letters, and endoscopy and histopathology reports. Individuals were excluded if the histopathology report was missing or the data were insufficient or incomplete.

Statistics
The main outcome of this study was the PPV for SP diagnosis in the 106 individuals with patient charts containing sufficient data. To identify any potential differences, results were stratified according to search method (SNOMED codes or free-text search). Given the changing nomenclature of SPs over time, we also analysed the data by year of diagnosis. For individuals identified by SNOMED codes, we validated the SP location by comparing the topography code with the patient chart. SNOMED codes were also used to identify SSA/Ps, for which a separate PPV was calculated. We estimated 95%CIs with the Wilson score interval [19] using EpiTools [20].
In addition to retrieving colonoscopy and histopathology reports, we collected data on sex, age, year of diagnosis, smoking, obesity, comorbidity, diagnostic tools and indication for endoscopy. For evaluation of anaemia, we used 132 g/L for men and 122 g/L for women as the lower limits of normal haemoglobin concentration as proposed by Beutler and Waalen [21]. Polyp size was dichotomised at 10 mm (smaller than 10 mm vs. 10 mm or larger), as this size has been proposed as the threshold for determining the future management of SPs [22]. Other aspects investigated were number of polyps (0, 1, 2-3 or ≥ 4), location (proximal, distal, rectal) and grade of dysplasia (none, low, high). The proximal colon was defined as extending from the ileocecal valve to the splenic flexure, and the distal colon as extending from the splenic flexure to the rectum, which was defined as the last 10 cm of the gastrointestinal tract.
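The categorical variables defined above translate into simple rules. The sketch below is illustrative only: the function names are hypothetical, and two boundary choices are assumptions, namely that a haemoglobin value exactly at the lower limit counts as normal and that a polyp of exactly 10 mm counts as large.

```python
# Lower limits of normal haemoglobin (g/L) as given in the text [21].
HB_LOWER_LIMIT_G_PER_L = {"male": 132, "female": 122}


def is_anaemic(sex, haemoglobin_g_per_l):
    """True if haemoglobin is below the sex-specific lower limit of normal."""
    return haemoglobin_g_per_l < HB_LOWER_LIMIT_G_PER_L[sex]


def is_large_polyp(size_mm):
    """True if the polyp reaches the 10 mm management threshold."""
    return size_mm >= 10


def polyp_count_category(n_polyps):
    """Map a polyp count to the categories 0, 1, 2-3 and >=4."""
    if n_polyps < 0:
        raise ValueError("polyp count cannot be negative")
    if n_polyps <= 1:
        return str(n_polyps)
    if n_polyps <= 3:
        return "2-3"
    return ">=4"
```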
For the descriptive analysis, we calculated the population and polyp characteristics according to SP subgroups. To reflect the previous version of the WHO recommendations on SP classification, SPs described as serrated adenomas (SAs) or mixed polyps with a serrated component were deemed consistent with SSA/P [23]. Nonetheless, data were also analysed separately for these polyp subgroups. Data on false positive SPs were also presented separately.

Data
The charts of 106 individuals were retrieved from pathology centres distributed in five counties in Sweden: Dalarna, Norrbotten, Skaraborg, Stockholm and Örebro.

Positive predictive value (PPV)
SPs were confirmed in 101/106 individuals, yielding a PPV of 95% (95%CI = 89-98%) (Table 1). Of the five individuals with false positive SPs, one had SP ruled out by the pathologist. The other four individuals had SPs mentioned in the histopathology report but sufficient evidence to confirm the diagnosis was lacking. No false positive case was found among individuals identified by SNOMED codes (n = 52), resulting in a PPV of 100% (95%CI = 93-100%). For individuals identified by free-text search of histopathology reports (n = 76), the PPV was 93% (95%CI = 86-97%). Of these, 22 individuals also had a SNOMED code.

Demographics and risk factors
Of the 106 validated individuals, 50 were female (47%) and the median age at diagnosis was 70 years (Table 2). Most SP cases were diagnosed through colonoscopy (n = 86, 81%), with smaller proportions diagnosed through partial lower endoscopy (sigmoidoscopy, rectoscopy or proctoscopy, n = 15, 14%) or hemicolectomy (n = 5, 5%). The data were stratified as follows: HP (n = 34, 32%), TSA (n = 3, 3%), SSA/P (n = 70, 66%), unspecified SP (n = 3, 3%), and false positive SP (n = 5, 5%). The SSA/P subgroup also included polyps described as serrated adenomas (n = 51) and mixed polyps (n = 12). Because some individuals had polyps of different subtypes (n = 9), the sum of individuals in the subgroups exceeds the total number of individuals reviewed. Notably, the HP subgroup was diagnosed earlier than the SPs overall (median year: 2003 vs. 2012) and there were no polyps specifically described as an SSA/P or TSA before 2011. Polyps described specifically as serrated adenomas were reported as early as 2002. Otherwise, population characteristics were similar in the different SP subgroups. At diagnosis, 16 (15%) individuals were current smokers, whereas 14 (13%) had a record of earlier smoking (Table 3). Obesity (body mass index, BMI ≥30 or indication of obesity in the patient chart) was seen in 12 individuals (11%). Heredity for CRC, intestinal polyposis syndromes, or both was reported in seven individuals (7%). Common comorbidities consisted of diverticulosis (n = 45, 42%), conventional adenomas (n = 33, 31%), CRC (n = 19, 18%) and inflammatory bowel disease (IBD) (n = 10, 9%). Comorbidities were defined as having a diagnosis prior to or in conjunction with a diagnosis of SP, except for conventional adenomas for which prior diagnoses were not considered.

Discussion
Our study found a high PPV (95%, 95%CI: 89-98%) for SPs according to colorectal histopathology reports based on SNOMED codes and free-text searches. The high PPV was similar over time. This finding suggests that histopathology reports are a reliable source to identify individuals with SPs. The PPV of this study is comparable with that of other gastrointestinal diagnoses based on histopathology: celiac disease (PPV 95%) and microscopic colitis (PPV 95%) [18,24]. The high specificity for SPs is not surprising given that the assignment of the SNOMED code and free-text diagnosis is already based on histopathological evaluation.
As to search method, the use of SNOMED codes to identify individuals with SPs had a higher specificity than the use of free-text search (PPV: 100% vs. 93%), but still the PPV using free text is consistent with the accuracy of having a physician-assigned diagnosis in the Swedish Patient Register (95%CI PPV = 85-95%) [10]. Furthermore, the high PPV of SSA/P among individuals identified through SNOMED codes (94%, 95%CI: 84-98%) indicates that an exclusive use of SNOMED codes can serve to target these polyps specifically. For individuals identified by SNOMED codes, the corresponding topography codes can also be used to determine the location of the SPs and SSA/Ps (PPV: 94%; 95%CI = 84-98%). The cases of incorrect topography codes exclusively concerned individuals with a rectal topography code (T68), which were classified as distal (sigmoidal) according to our validation. This discrepancy occurred because we mainly used endoscopy reports to determine the macroscopic location of the polyps, whereas topography codes are assigned by the pathologist and sometimes based on histological appearance.
Subsequent to the recognition of the different SP subgroups, several studies have investigated their respective prevalence. HPs have consistently been shown to be the most common subtype, representing 70-90% of all SPs [25-27]. Likewise, SSA/Ps have been shown to represent 10-25% of all SPs while TSAs represent about 1% [25-29]. In our study, we primarily targeted SSA/Ps. As such, we did not include SNOMED codes for HPs. Consequently, the proportion of HPs in our cohort does not reflect the overall proportion among SPs, as HPs are likely to have been included only when they were described as "serrated" in the histopathology report. As a result, most individuals with HPs were identified by free-text searches (n = 31, 91%).
Given the evolving nomenclature of SPs, a large number of polyps in our study were described following the previous version of the WHO classification of colorectal polyps published in 2000 [23]. This version recognised HPs separately and SAs as a subtype under adenomas. Within the SA subtype, there was no differentiation between SSA/Ps and TSAs. As such, polyps described as serrated adenomas can represent either of these two. However, given the predominance of SSA/Ps, it is reasonable to assume that the number of TSAs described as serrated adenomas is small. It is also reassuring to note that the specific SP descriptions correlated well with the publication year of the different WHO classifications, i.e. polyps described as serrated adenomas began to appear after 2000 and polyps described as SSA/Ps or TSAs were found only after 2010.
The individuals in our study were equally distributed in terms of sex (female: 47%). However, the mean age of the cohort was 70 (range: 35-93) years, which is slightly higher than that found in previous studies [3,26,28,30]. The age difference may, to some extent, be explained by the high proportion of dysplastic SPs in our study (n = 75, 48%). Heredity, smoking and obesity have all been established as risk factors for SP, with smoking being more strongly linked to SSA/Ps than to the other subgroups [31-34]. In this study, mention of a risk factor in the patient chart was regarded as indicative of that risk factor, whereas, for instance, an individual for whom smoking was not mentioned in the patient chart was regarded as a nonsmoker. Thus, the prevalence of some risk factors may have been underestimated. For instance, only 11% of our individuals had a record of obesity compared with 16% in the general Swedish population, despite evidence showing that obesity is a risk factor for SP [31,32,35].
Several studies have established low detection as a significant challenge in SP research, and endoscopy screening seems less effective for detecting proximal CRC, which is believed to originate predominantly from the serrated pathway [6,25,36-39]. Moreover, HPs are considered less likely to bleed compared with adenomas, and SSA/Ps lack some genetic markers currently used in DNA faecal tests, decreasing the sensitivity of faecal tests for SPs.
In our study 15 (14%) individuals had a positive FOBT prior to endoscopy and 58 (55%) unique individuals had at least one sign of gastrointestinal bleeding (FOBT, haematochezia/melena or anaemia). To some extent the high percentage of individuals with SPs and signs of gastrointestinal bleeding can be explained by the simultaneous presence of adenomas (n = 18, 31%), as well as the overrepresentation of SPs other than HPs. However, we cannot exclude that bleeding-prone SPs are overrepresented in our cohort.
Most individuals with SPs underwent endoscopy due to clinical symptoms (n = 64, 60%). In addition, regardless of endoscopy indication, we found that 78 individuals (74%) had at least one symptom (which includes positive FOBT) at the time of diagnosis, including one individual with a false positive SP. Of note, false positive cases more often presented with clinical symptoms as an indication for endoscopy (80% vs. 60%).
In a notable proportion of HPs (n = 43, 73%), grade of dysplasia was not specified. The reason for this is probably that HPs are normally defined as non-dysplastic. Thus, any specification of dysplasia by the pathologist would be redundant, as it is already implied by the HP diagnosis [40]. As such, the proportion of HPs without dysplasia should be interpreted as 97% (59/61) instead of 26% (16/61). Among the polyps classified as SAs, the vast majority exhibited low-grade dysplasia (n = 45, 80%) and there were only three polyps (5%) with no dysplasia. This observation reinforces the idea that polyps described as SAs are consistent with SSA/Ps, or possibly TSAs, as HPs are typically non-dysplastic [40]. More specifically, consistent with the literature on SSA/P location, we believe that proximal SAs will almost exclusively consist of SSA/Ps. However, SAs located in the rectum are likely to include a small number of TSAs.
The literature has shown that only about 15% of SSA/Ps have any dysplastic features, implying that SSA/Ps with dysplasia are overrepresented in our study [28]. We cannot rule out that a few SSA/Ps with no dysplasia may have been misclassified as HPs given the established difficulty of distinguishing SSA/Ps from large proximal HPs [41]. Yet, it is also possible that SSA/Ps without dysplasia may have been overlooked and left undetected to a larger extent than SSA/Ps with dysplasia.

Strengths and limitations
The main strength of our study is the random selection of individuals with SPs from a nationwide histopathology cohort. Using a standardised form, we were able to examine not only the PPV for a histopathology report with SP but also describe Swedish individuals with SPs for clinical characteristics and risk factors. Our results are consistent with similar studies for which the gold standard of diagnosis is biopsy, further reinforcing the reliability of the present results.
A limitation of our study is the lack of re-examination of actual biopsies. The ethics review board allowed us to collect digital data but not actual tissue samples. Instead, the validation was based on re-evaluation of patient charts that included, among other things, histopathology and endoscopy reports. The quality of the patient chart data varied, especially in the documentation of risk factors and symptoms. Still, given that SP is a strictly histopathological diagnosis, the difference in data availability among the individuals should not have affected the validation, because all individuals had to have the corresponding histopathology report available to be included in the study.
Earlier studies have shown inter-observer variability for classification of SPs among pathologists [42,43], and we cannot rule out some misclassification, especially for the subgroup classification. This could potentially affect the validity of the SSA/P diagnosis since some of the SSA/Ps may have been misdiagnosed as HPs, and vice versa [43]. The diversity of pathologists in this study, some of whom may not specialise in SPs, may have decreased the accuracy of polyp classification.

Conclusion
In conclusion, this study suggests that colorectal histopathology reports are a reliable data source to identify individuals with SPs.