Modeling the invisible stages of colorectal cancer—from theory to clinical strategy
Colorectal cancer (CRC) remains the third most common cancer worldwide and a leading cause of cancer-related death (1). Although existing population screening programmes, primarily based on faecal immunochemical testing (FIT) and colonoscopy, reduce incidence and mortality, there are still opportunities to personalise, optimise and innovate (2,3).
Advances in mathematical modelling, molecular understanding and surveillance analytics are enabling a shift from one-size-fits-all screening to precision prevention, whereby screening intensity, modality and intervals are tailored to individual risk profiles while maintaining equity and cost-effectiveness.
The process by which a single mutated cell develops into invasive CRC is long, complex and, until recently, largely unobservable (4,5). This editorial focuses on three recent studies that approach this problem from different angles, combining advanced modelling techniques, biological realism, and practical surveillance needs. Together, they provide a blueprint for incorporating mechanistic models into clinical and public health decision-making processes.
Modelling techniques: from abstract stages to mechanistic pathways
Paterson et al. (6) deviate from traditional multistage models by constructing a stochastic, genomically informed framework for CRC initiation. Their simulation models CRC initiation via mutations in three driver genes: adenomatous polyposis coli (APC), tumor protein p53 (TP53) and Kirsten rat sarcoma viral oncogene homolog (KRAS). The authors integrate genomic data and experimentally measured mutation rates to simulate various evolutionary trajectories, thus avoiding the need to assume a strict mutation order. Their key finding is that mutation order is determined more by the fitness advantages conferred than by mutation probabilities. When calibrated with realistic proliferation rates and crypt dynamics, the model recapitulates observed CRC incidence rates. It also quantifies the probability of malignancy, the size of premalignant lesions and the timing of driver acquisition, highlighting the limited immune suppression of untreated lesions. This work offers a quantitative evolutionary framework for early CRC development.
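To make the idea of an unconstrained mutation order concrete, the following toy sketch lets three drivers compete as independent exponential clocks. It is not the model of Paterson et al. (6): the mutation rates and fixation probabilities are hypothetical placeholders, and the fitness effect of each driver is assumed not to depend on the drivers already present. Under these assumptions, the effective rate of each driver is its mutation rate multiplied by the probability that the mutant takes over the crypt.

```python
import random
from collections import Counter

# Hypothetical per-year mutation rates and crypt fixation probabilities
# (illustrative placeholders, not values taken from reference 6).
MUTATION_RATE = {"APC": 1e-5, "KRAS": 3e-5, "TP53": 2e-5}
FIXATION_PROB = {"APC": 0.6, "KRAS": 0.1, "TP53": 0.05}


def simulate_order(rng):
    """Return the three drivers in the order they fix in one crypt; no order is imposed."""
    remaining = set(MUTATION_RATE)
    order = []
    while remaining:
        genes = sorted(remaining)
        # Effective rate = mutation supply x chance the mutant takes over the crypt.
        rates = [MUTATION_RATE[g] * FIXATION_PROB[g] for g in genes]
        # In a race of exponential clocks, gene g wins with probability rate_g / sum(rates).
        winner = rng.choices(genes, weights=rates, k=1)[0]
        order.append(winner)
        remaining.remove(winner)
    return order


def first_driver_frequencies(n_crypts=20000, seed=1):
    """Estimate how often each driver is the first to fix across simulated crypts."""
    rng = random.Random(seed)
    counts = Counter(simulate_order(rng)[0] for _ in range(n_crypts))
    return {gene: counts[gene] / n_crypts for gene in MUTATION_RATE}
```

With these placeholder values, KRAS has the highest mutation rate, yet APC is expected to fix first in most simulated crypts because its assumed fixation probability is much larger. This mirrors, in a highly simplified way, the statement that fitness advantage rather than mutation probability shapes the order.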
Simonetto et al. (7) take a different approach and apply shape-specific multistage clonal expansion (MSCE) models to large-scale colonoscopy data. Their key innovation is the incorporation of tumour morphology—sessile, pedunculated or flat—into growth dynamics, supported by cell-based stochastic models. This adds a new morphological parameter to risk estimation, bridging the gap between abstract statistical compartments and observable lesion features. Based on over 50,000 screening colonoscopy records from Bavaria, the authors demonstrate that adenoma shape significantly influences growth dynamics and cancer transition risk. For example, sessile adenomas have twice the malignancy risk at the same size compared to flat or pedunculated types. A 1 cm sessile lesion may require more aggressive follow-up than a larger pedunculated polyp. Their MSCE model also indicates that growth primarily occurs within two-dimensional (2D) crypt-like structures and that mutation/division rates decline with age. A simplified hazard function for interval cancers offers mechanistic insight into detection efficacy. The study highlights the importance of incorporating shape into risk stratification and screening protocols.
Akwiwu et al. (8) address a long-standing statistical problem: how to estimate unobservable transitions [non-advanced adenoma → advanced adenoma (AA) → CRC] when precursor lesions are routinely removed. Their progressive three-state model uses a weighted likelihood to correct for outcome-dependent sampling and interval censoring. By explicitly modelling both the natural history process and the observation process, they produce unbiased, efficient parameter estimates that cannot be matched by standard approaches. When applied to a Norwegian cohort of 1,495 individuals, the model estimated 5-year CRC risks post-AA onset at ~17%, with these risks increasing with age. Simulation studies demonstrate improved efficiency and bias reduction compared to naïve or unweighted methods. This framework is pivotal for optimizing surveillance intensity and resource allocation in CRC prevention.
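The structure of such a likelihood can be sketched in a few lines. The example below is a simplified stand-in, not the estimator of Akwiwu et al. (8): it assumes constant transition hazards `a` (state 0 to 1) and `b` (state 1 to 2), observations that record only the state at two examinations, and sampling weights that are already known. The record layout and all numbers are hypothetical.

```python
import math


def transition_probs(a, b, t):
    """Transition matrix over t years for a progressive 0 -> 1 -> 2 model.

    Assumes constant hazards a (0 -> 1) and b (1 -> 2) with a != b.
    """
    p00 = math.exp(-a * t)
    p01 = a / (b - a) * (math.exp(-a * t) - math.exp(-b * t))
    p11 = math.exp(-b * t)
    return [
        [p00, p01, 1.0 - p00 - p01],
        [0.0, p11, 1.0 - p11],
        [0.0, 0.0, 1.0],
    ]


def weighted_loglik(a, b, records):
    """Weighted log-likelihood for interval-censored panel observations.

    Each record is (start_state, end_state, years_between_exams, weight).
    Only the states at the two examinations are known, not the transition time.
    Records must describe feasible (non-regressing) transitions.
    """
    total = 0.0
    for start, end, gap, weight in records:
        total += weight * math.log(transition_probs(a, b, gap)[start][end])
    return total


# Hypothetical records; the weights stand in for inverse sampling probabilities.
example_records = [(0, 0, 3.0, 1.0), (0, 1, 5.0, 2.5), (1, 2, 4.0, 0.8)]
example_value = weighted_loglik(0.05, 0.04, example_records)
```

Maximising `weighted_loglik` over `a` and `b` with a general-purpose optimiser would give weighted estimates under these assumptions. The weights are what distinguish this from a naïve likelihood, in which every record counts equally regardless of how it entered the sample.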
The MSCE models employed by Simonetto et al. (7) and Paterson et al. (6) are probabilistic frameworks that depict carcinogenesis as a series of mutations alongside the stochastic birth-death dynamics of premalignant clones. These models are central tools in mathematical oncology and quantitative cancer risk assessment. They bridge the gap between biology (e.g., mutations and cell kinetics) and epidemiology (e.g., incidence data), and they explain the age-dependence of cancer risk better than purely statistical models. They can also incorporate exposure timing (e.g., smoking duration versus intensity). However, parameters may not be uniquely identifiable from incidence data alone, and real biology may not follow neat sequential steps—parallel and branched pathways exist. MSCE models require strong assumptions about clonal independence and constant rates (9).
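The birth-death component can be illustrated with a minimal simulation. The sketch below follows a single premalignant clone from one cell, assuming constant division and death rates (exactly the kind of strong assumption noted above). The rate values and the escape size of 100 cells are hypothetical. Only the order of events is tracked, not their timing, which is sufficient to estimate the extinction probability.

```python
import random


def clone_goes_extinct(birth_rate, death_rate, rng, escape_size=100):
    """Follow one clone from a single cell until it dies out or reaches escape_size."""
    cells = 1
    p_birth = birth_rate / (birth_rate + death_rate)
    while 0 < cells < escape_size:
        # Each event is a division with probability p_birth, otherwise a cell death.
        cells += 1 if rng.random() < p_birth else -1
    return cells == 0


def extinction_fraction(birth_rate, death_rate, n_clones=2000, seed=7):
    """Fraction of simulated clones that die out before reaching the escape size."""
    rng = random.Random(seed)
    extinct = sum(clone_goes_extinct(birth_rate, death_rate, rng) for _ in range(n_clones))
    return extinct / n_clones
```

For a supercritical clone (birth rate above death rate), standard birth-death theory gives an extinction probability equal to the ratio of death rate to birth rate, so most initiated clones are expected to vanish even when net growth is positive. This is one reason why incidence data alone constrain the individual rates only weakly.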
Clinical implications: from mutations to morphology
Mechanistic models offer more than theoretical elegance—they transform how we evaluate individual risk.
Paterson et al. (6) calculate the probability of a colorectal crypt collecting three driver events (APC, KRAS and TP53). Their insight supports the targeted monitoring of lesions carrying specific advantageous mutations, regardless of sequence.
Simonetto et al. (7) demonstrate that sessile adenomas possess approximately twice the malignant potential of similarly sized flat or pedunculated lesions. This changes the way we assess lesions clinically: morphology is not just descriptive, but also prognostic.
Akwiwu et al. (8) provide quantified transition probabilities—a 17% 5-year CRC risk post-AA onset—that can be stratified by age. These probabilities enable data-driven scheduling of colonoscopy intervals and help clinicians balance risk against the burden of the procedure. They found 5-year risks of around 13% and 34% for individuals who had non-AA or AA removed, respectively. The 10-year CRC hazard after AA onset was estimated at 1.77. Their results support current recommendations for post-polypectomy colonoscopy surveillance.
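As a purely arithmetic illustration of how a reported cumulative risk could feed a scheduling calculation, the sketch below converts the 17% 5-year risk into an equivalent constant yearly hazard and back. The constant-hazard assumption is introduced here for simplicity and is not taken from Akwiwu et al. (8), whose estimated risks increase with age; the function names are hypothetical.

```python
import math


def constant_hazard_from_risk(risk, years):
    """Yearly hazard that reproduces a cumulative risk over the given horizon.

    Assumes the hazard is constant over time (a simplification).
    """
    return -math.log(1.0 - risk) / years


def risk_at(hazard, years):
    """Cumulative risk by the given horizon under a constant hazard."""
    return 1.0 - math.exp(-hazard * years)


# 17% 5-year risk after AA onset, as quoted in the text above.
hazard = constant_hazard_from_risk(0.17, 5.0)
risk_three_years = risk_at(hazard, 3.0)
```

Evaluating `risk_at` at candidate surveillance intervals shows how much risk accrues between examinations under this simplified assumption. An age-dependent hazard would replace the single constant in a more realistic calculation.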
Surveillance strategies: towards precision prevention
The methodological advances across these studies converge on a single aim: precision in surveillance. Paterson et al.’s (6) initiation model can serve as a baseline risk generator, predicting incidence curves under varying biological assumptions. When coupled with the weighted-likelihood progression model of Akwiwu et al. (8), we can build end-to-end simulations of the progression from normal tissue to adenoma to carcinoma, incorporating both biological factors and observational bias. Simonetto et al. (7) introduce the morphological dimension, enabling surveillance programs to stratify patients by size-shape-risk profiles rather than size alone. This could optimize the allocation of colonoscopies: high-risk morphologies could be recalled sooner, while low-risk shapes could follow extended intervals.
More broadly, there is a shift from one-size-fits-all screening to dynamically adjusted surveillance. Mechanistic models could be integrated into electronic health systems to update patient risk in real time as new endoscopic, genetic or histological data become available. The result would be CRC prevention driven by data.
The challenge ahead lies in translating these frameworks into operational tools that clinicians and patients can trust and use, ultimately closing the gap between invisible disease evolution and visible intervention and resulting in a proposal for improved CRC screening.
A second challenge for the three models is how to incorporate the expanding range of clinically available and emerging tools (cell-free DNA testing, proteomic-based assays, or other improvements that broaden the screening landscape). This editorial is too short to answer these challenges fully. One proposal, however, is to treat the biomarker as an observation on a latent natural-history process. This turns the natural-history simulator into a hidden Markov model (HMM)/state-space model layered on the MSCE generator, in which FIT or colonoscopy results enter as additional “competing” observation channels. The MSCE model would then be modified with respect to initiation rates, clonal net growth, malignant conversion and other relevant parameters, possibly modulated by polygenic risk scores, lifestyle and family history.
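The HMM proposal can be sketched as a simple forward filter. In the example below, the latent states, the yearly transition probabilities and the probability of a positive FIT in each state are all hypothetical placeholders. In a full implementation, the transition matrix would be generated by the natural-history model rather than fixed by hand.

```python
STATES = ["normal", "adenoma", "advanced_adenoma", "crc"]

# Hypothetical yearly transition probabilities (rows sum to 1; progressive, no regression).
TRANSITION = [
    [0.97, 0.03, 0.00, 0.00],
    [0.00, 0.95, 0.05, 0.00],
    [0.00, 0.00, 0.96, 0.04],
    [0.00, 0.00, 0.00, 1.00],
]

# Hypothetical probability of a positive FIT in each latent state.
P_FIT_POSITIVE = [0.05, 0.10, 0.30, 0.75]


def predict(belief):
    """Advance the state distribution by one year of unobserved natural history."""
    n = len(belief)
    return [sum(belief[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]


def update(belief, fit_positive):
    """Condition the state distribution on one FIT result (Bayes' rule)."""
    likelihood = [p if fit_positive else 1.0 - p for p in P_FIT_POSITIVE]
    unnormalised = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def filter_history(prior, fit_results):
    """Run the forward filter over a sequence of yearly FIT results."""
    belief = list(prior)
    for result in fit_results:
        belief = update(predict(belief), result)
    return belief
```

A further observation channel, such as a blood-based test, would enter as one more likelihood vector multiplied in at the update step, provided the tests are assumed conditionally independent given the latent state.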
Potential to improve CRC screening
The core innovations in CRC screening include risk stratification, advanced modelling to inform policy and the integration of cost-effective innovative technologies.
Risk stratification considers demographics, genetics, family history, lifestyle, comorbidities, and adenoma characteristics (size, histology, and shape), as well as quantitative FIT values. Additionally, dynamically adjusting screening intervals would be helpful. All three studies provide innovative input. Simonetto et al. (7) extend the risk dimensions to include adenoma shape, while Akwiwu et al. (8) support the dynamic adjustment of intervals.
The three studies also provide advanced modelling to inform policy. Paterson et al. (6) and Akwiwu et al. (8) improve natural history and progression models. They offer multistate frameworks and handle interval censoring and outcome-dependent sampling. Simonetto et al. (7) provide a biology-informed stochastic model that incorporates adenoma growth patterns and shape-specific cancer risks. All three studies introduce innovative simulation engines to test screening thresholds, colonoscopy capacity and cost-effectiveness prior to implementation.
Clinical pathway design: the results of these studies inform post-polypectomy schedules by examining size, shape, histology and age profiles and by defining early recall for high-risk morphologies (e.g., 1 cm sessile lesions). They support dynamic surveillance schedules, in particular by allowing extended intervals for low-risk profiles. Future versions of these models may incorporate additional genetic, molecular, environmental and family-based information. Although the studies do not contribute to initial testing, they provide important input for health economic models of CRC surveillance, facilitating research into optimizing surveillance schedules.
There are complementary models informing when to start CRC screening. They consider family history, lifestyle, high-penetrance genetic variants, or low-penetrance polygenic profiles to identify high risk groups for early CRC, motivating an early start of the screening and potentially reducing CRC incidence and mortality through early detection (10,11).
Consequences for data infrastructure, operational and workflow readiness, ethical, legal, and social issues (ELSI), quality assurance and monitoring
To improve model-based predictions, individual-level longitudinal tracking is required from invitation to outcome data. This requires privacy-by-design registries with interoperable data formats [Fast Healthcare Interoperability Resources (FHIR), Systematized Nomenclature of Medicine (SNOMED), International Classification of Diseases for Oncology (ICD-O)], which are challenging to establish. A secure data lake or curated datasets could supply an analytics platform with embedded statistical models for real-time risk recalculation. Such a system requires governance through dedicated data stewardship and audit mechanisms. It would be operated by trained endoscopists, nurse endoscopists, pathology staff, and data scientists who specialize in recognizing sessile/flat lesions, performing the less risky cold snare technique, and interpreting data with the help of artificial intelligence (AI). Standardization would benefit this system: uniform reporting, histology classification, and pathology turnaround times. Appropriate informed consent procedures should be in place, with a focus on underserved groups. The data collected while the system is running should be well protected by role-based access, data-sharing agreements and privacy impact assessments. The system should be run under a well-designed quality management system that monitors uptake and coverage, test performance, endoscopy quality, appropriate outcome metrics and resource efficiency. Such national data infrastructures are not yet available.
Akwiwu et al. (8) conducted research on a Norwegian adenoma cohort (12). Simonetto et al. (7) used data from a quality study performed in Bavaria, Germany, over a limited period [2006–2009] (13). Paterson et al. (6) parameterized the model using experimentally measured parameters.
Implementing a roadmap for innovative CRC screening and its strategic benefits
The implementation could have three phases. Phase 1 concerns design and governance: leadership should be appointed, governance boards established and KPIs defined. During this phase, the registry and modelling infrastructure would be established. Phase 2 would involve establishing a pilot and defining an adaptive rollout. Stepped-wedge or cluster-randomized designs could be used to compare standard versus risk-stratified strategies. Furthermore, pre-specification of adaptation rules would take place (e.g., adjusting the FIT threshold if positivity exceeds capacity). In Phase 3, the system could be scaled up and optimized. A national or regional scale-up could be installed with continuous quality improvement. Emerging tests and AI capabilities could be integrated as validated.
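A pre-specified adaptation rule of the kind mentioned for Phase 2 could look like the following sketch. The candidate thresholds, the positivity rate at each threshold, the number of participants and the colonoscopy capacity are hypothetical inputs that a programme would supply from its own pilot data.

```python
def choose_fit_threshold(positivity_by_threshold, n_participants, colonoscopy_capacity):
    """Pick the lowest (most sensitive) FIT threshold whose expected referrals fit capacity.

    positivity_by_threshold maps a threshold (e.g., in µg Hb/g faeces) to the
    expected fraction of participants testing positive at that threshold.
    Returns None if no candidate threshold fits the available capacity.
    """
    for threshold in sorted(positivity_by_threshold):
        expected_referrals = positivity_by_threshold[threshold] * n_participants
        if expected_referrals <= colonoscopy_capacity:
            return threshold
    return None


# Hypothetical positivity rates observed in a pilot at four candidate thresholds.
pilot_positivity = {10: 0.08, 20: 0.06, 40: 0.04, 80: 0.025}
selected = choose_fit_threshold(pilot_positivity, n_participants=100_000,
                                colonoscopy_capacity=5_000)
```

Fixing such a rule before the rollout keeps threshold changes transparent and auditable. A returned value of `None` signals that capacity or invitation volume, rather than the threshold, must change.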
Such a program offers many strategic benefits. For example, it could improve the detection rate, enabling the earlier identification of high-risk lesions, especially sessile serrated adenomas. It could optimize the use of resources by allocating colonoscopies more effectively and reducing unnecessary procedures. It could also offer equity gains, with tailored outreach reducing disparities in CRC outcomes. It could also provide scientific leadership, with a data-rich environment supporting ongoing research, AI validation and public health innovation.
Current CRC screening guidelines differ globally in scope and implementation. The United States promotes screening from age 45–75 years with multiple options [FIT, stool DNA, colonoscopy, computed tomography (CT) colonography] and allows rapid integration of Food and Drug Administration (FDA)-approved innovations such as blood-based and stool RNA tests. In contrast, Europe, the UK, Australia, and Canada operate organized, population-based programs centered on biennial FIT (ages 50–74 years) with colonoscopy follow-up, emphasizing equity, quality assurance, and cost-effectiveness. Innovation in these systems requires pilot evaluation and health technology assessment (HTA) before adoption.
Translating novel CRC screening approaches into clinical and policy frameworks therefore depends on system architecture: the U.S. favors market-driven diversification, while other regions prioritize public-health feasibility and program performance. Effective roadmaps must align innovation with infrastructure, participation targets, and capacity planning.
Conclusions
The three studies discussed could influence the future of CRC prevention, where mathematical rigour meets biological realism and clinical pragmatism. They demonstrate that improved models are not merely academic exercises, but can directly inform our screening decisions, frequency, and urgency. The challenge ahead is to translate these frameworks into operational tools that clinicians can trust and use, ultimately closing the gap between invisible disease evolution and visible intervention.
Implementing an innovative CRC screening program requires synergy between advanced modelling, high-quality clinical pathways, robust data systems and equity-focused operations. The three research approaches—genomic-evolutionary modelling, bias-adjusted surveillance analytics and morphology-based risk assessment—provide the methodological backbone for precision prevention strategies that can be implemented today.
In recent years, the convergence of genomic data, epidemiological surveillance and advanced mathematical modelling has shed light on the hidden stages of CRC development and has the potential to transform CRC screening. Taken together, the three studies offer a compelling vision of how we might better anticipate, monitor and intercept this common malignancy.
Together, these papers represent a shift from descriptive to mechanistic modelling of colorectal tumorigenesis. They do not merely fit data—they explain it. This progression is not just academic; it paves the way for predictive models that are biologically grounded, epidemiologically validated and clinically actionable.
As we advance towards precision prevention in CRC, integrating genomic, morphological, and statistical insights is essential, not optional. These studies exemplify how such integration can shed light on the invisible stages of cancer development, offering new hope for timely, targeted and efficient intervention.
Acknowledgments
None.
Footnote
Provenance and Peer Review: This article was commissioned by the Editorial Office, Annals of Cancer Epidemiology. The article has undergone external peer review.
Peer Review File: Available at https://ace.amegroups.com/article/view/10.21037/ace-2025-9/prf
Funding: None.
Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://ace.amegroups.com/article/view/10.21037/ace-2025-9/coif). The author has no conflicts of interest to declare.
Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Morgan E, Arnold M, Gini A, et al. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut 2023;72:338-44.
2. Montminy EM, Jang A, Conner M, et al. Screening for Colorectal Cancer. Med Clin North Am 2020;104:1023-36.
3. Winawer SJ, Fletcher RH, Miller L, et al. Colorectal cancer screening: clinical guidelines and rationale. Gastroenterology 1997;112:594-642.
4. Leslie A, Carey FA, Pratt NR, et al. The colorectal adenoma-carcinoma sequence. Br J Surg 2002;89:845-60.
5. Risio M. The natural history of adenomas. Best Pract Res Clin Gastroenterol 2010;24:271-80.
6. Paterson C, Clevers H, Bozic I. Mathematical model of colorectal cancer initiation. Proc Natl Acad Sci U S A 2020;117:20681-8.
7. Simonetto C, Mansmann U, Kaiser JC. Shape-specific characterization of colorectal adenoma growth and transition to cancer with stochastic cell-based models. PLoS Comput Biol 2023;19:e1010831.
8. Akwiwu EU, Klausch T, Jodal HC, et al. Fitting a progressive 3-state colorectal cancer model to interval-censored surveillance data under outcome-dependent sampling using a weighted likelihood approach. Am J Epidemiol 2025;194:1764-75.
9. Brouwer AF, Meza R, Eisenberg MC. Parameter estimation for multistage clonal expansion models from cancer incidence data: A practical identifiability analysis. PLoS Comput Biol 2017;13:e1005431.
10. Zheng Y, Hua X, Win AK, et al. A New Comprehensive Colorectal Cancer Risk Prediction Model Incorporating Family History, Personal Characteristics, and Environmental Factors. Cancer Epidemiol Biomarkers Prev 2020;29:549-57.
11. Chen X, Li H, Guo F, et al. Alcohol consumption, polygenic risk score, and early- and late-onset colorectal cancer risk. EClinicalMedicine 2022;49:101460.
12. Løberg M, Kalager M, Holme Ø, et al. Long-term colorectal-cancer mortality after adenoma removal. N Engl J Med 2014;371:799-807.
13. Mansmann U, Crispin A, Henschel V, et al. Epidemiology and quality control of 245 000 outpatient colonoscopies. Dtsch Arztebl Int 2008;105:434-40.
Cite this article as: Mansmann U. Modeling the invisible stages of colorectal cancer—from theory to clinical strategy. Ann Cancer Epidemiol 2025;9:5.

