Implications of outcome misclassification in risk effect modeling in cancer population studies
Highlight box
Key findings
• Under non-differential misclassification, bias in the cancer risk model is worse at lower cancer prevalence (1.95%). At lower prevalence, bias is driven mainly by specificity, whereas at higher prevalence it is influenced by both specificity and sensitivity. With differential misclassification, bias can either underestimate or overestimate the effect of a binary covariate.
What is known and what is new?
• It is important to minimize misclassification rates when examining general disease outcomes.
• This study quantifies the potentially large effects of outcome misclassification on cancer risk modeling.
What is the implication, and what should change now?
• Validating binary cancer outcomes is necessary to minimize misclassification rates. Electronic health records may be beneficial in validating these outcomes.
Introduction
Accurate ascertainment of cancer outcomes is essential for developing reliable cancer risk models. To achieve this, it is important to minimize cancer outcome misclassification, which can occur in two ways: false negatives and false positives. The false negative rate [1 − sensitivity (Se)] is the probability that an individual with cancer is incorrectly classified as cancer-free. Similarly, the false positive rate [1 − specificity (Sp)] is the probability that a cancer-free individual is incorrectly classified as having cancer.
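To make these definitions concrete, Se, Sp, and the two error rates can be computed directly from a 2×2 confusion table. A minimal Python sketch (the function name and the illustrative counts are ours, not from the study):

```python
def classification_rates(tp, fn, fp, tn):
    """Sensitivity, specificity, and error rates from a 2x2 confusion table.

    tp: true cases classified as cases      fn: true cases missed
    fp: non-cases classified as cases       tn: non-cases correctly cleared
    """
    se = tp / (tp + fn)  # sensitivity: P(classified case | true case)
    sp = tn / (tn + fp)  # specificity: P(classified non-case | true non-case)
    return {
        "Se": se,
        "Sp": sp,
        "false_negative_rate": 1 - se,  # 1 - Se
        "false_positive_rate": 1 - sp,  # 1 - Sp
    }

# Hypothetical validation sample: 100 true cases, 1,000 true non-cases.
rates = classification_rates(tp=90, fn=10, fp=5, tn=995)
```

Here `rates["Se"]` is 0.90 and `rates["Sp"]` is 0.995, i.e., a 10% false negative rate and a 0.5% false positive rate.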
There are two common methods for cancer ascertainment: self-reporting through questionnaires and linkage to cancer registries. Self-reporting often leads to both false positives and false negatives due to inaccuracies in patient-reported data. In contrast, cancer registry linkage is generally considered to be a more accurate and cost-effective approach (1). However, this method is not without its limitations. For instance, missing personal identification information, such as a patient’s social security number, can result in a false positive linkage (2). In addition, cancer registries in the U.S. are state-specific with no routine data exchange mechanisms across states, and investigators may need to link their study population to multiple cancer registries. If a study participant moves outside the registry coverage area, any incident cancers afterwards may be missed, leading to false negative cases (3). To mitigate these issues, medical record validation can be employed to confirm the cancer cases identified by either ascertainment method, significantly reducing the false positive rate. However, since medical record validation is rarely performed for individuals identified as cancer-free, it does not effectively reduce the false negative rate.
When the Se and Sp depend on a covariate in the risk model, the misclassification is called “differential”; otherwise, it is called “non-differential”. A large body of literature addresses differential outcome misclassification (4,5) and non-differential outcome misclassification (6,7) from a general disease perspective. However, few have focused on the context of cancer outcomes, which has several unique features. First, most cancer outcomes are rare, and the impact of misclassifying rare outcomes may differ substantially from diseases with more frequent outcomes (8). Second, cancer outcome ascertainment methods typically have high Sp but relatively low Se. While it is uncommon for these methods to find non-existent cancer cases, it is common for them to miss actual cancer cases, for example due to incomplete registry coverage. Third, cancer outcome misclassifications are often associated with key demographic variables, such as race, ethnicity, age, and gender, which poses a threat to the validity of cancer disparity research. For example, Raza et al. showed that cancer diagnoses in women were under-reported more frequently in cancer registries compared to men, suggesting that the Se of cancer outcome assessment is related to gender (9). In another example, Randall et al. determined that linkage errors may be more common for younger study participants or those living in remote locations, and thus misclassification may be related to age or residential area (10). These studies suggest misclassification may be differential with respect to certain sociodemographic variables.
The purpose of this article is to highlight the effects of outcome misclassification on risk modeling for practicing cancer epidemiologists, and to recommend strategies to minimize the misclassification in data collection. Specifically, we aim to explore the ramifications of false-positive and false-negative rates in cancer ascertainment on relative-risk estimation. We focus on logistic regression modeling within both cohort and case-control study designs. Through extensive simulation studies and analytic calculations, we examine the potential differences in bias when misclassification is differential and non-differential.
Methods
We designed simulation frameworks to evaluate the performance of a risk model when the binary cancer outcome is misclassified. Two independent covariates were simulated: X1 from a Bernoulli distribution with probability 0.5 and X2 from a standard normal distribution. For example, X1 could represent a sociodemographic variable such as socioeconomic status, dichotomized into two categories: 1 denoting the most disadvantaged and 0 denoting the remaining population (10). Additionally, X2 could represent standardized age. The true outcome Y (e.g., cancer diagnosis within the first five years of the study) was generated from a logistic regression:
logit{Pr(Y = 1 | X1, X2)} = β0 + β1X1 + β2X2,

where β1 = β2 = 1 and β0 takes values from −5 to −3.5. The intercept parameter was varied to control the cancer prevalence between 1.95% and 7.5%. These prevalences represent typical cancer outcome frequencies in epidemiologic studies: the low prevalence of 1.95% mimics rarer cancers over 5–10 years of follow-up, while the high prevalence of 7.5% captures moderately common cancers over longer follow-up. This range allows us to examine how misclassification impacts bias across realistic scenarios. We considered simulations under two scenarios: one where the Se and Sp did not depend on the covariates (non-differential misclassification), and another where these parameters differed by the binary covariate X1 (differential misclassification).
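As a rough illustration of this setup, the following pure-Python sketch generates a cohort from the logistic model above. The paper's simulations were run in R; this translation, the function names, and the slope values b1 = b2 = 1 (consistent with the point estimates near 1 reported in Tables 3 and 4) are our assumptions:

```python
import math
import random

def simulate_cohort(n, b0, b1=1.0, b2=1.0, seed=2024):
    """Generate (x1, x2, y) rows from the logistic risk model in the Methods.

    Slopes b1 = b2 = 1 are assumed for illustration; the intercept b0
    controls the outcome prevalence.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)   # binary covariate, P(X1 = 1) = 0.5
        x2 = rng.gauss(0.0, 1.0)       # standard normal covariate
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
        y = int(rng.random() < p)      # true cancer outcome
        rows.append((x1, x2, y))
    return rows

cohort = simulate_cohort(50_000, b0=-5.0)
prevalence = sum(y for _, _, y in cohort) / len(cohort)  # roughly 2%
```

With `b0 = -5.0` the realized prevalence comes out near the paper's low-prevalence setting, and moving the intercept to −3.5 pushes it toward the high-prevalence setting of about 7.5%.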
In the first scenario of non-differential misclassification, the observed outcome Y* was generated as a misclassified version of Y, with Se ranging from 80% to 100% and Sp ranging from 98% to 100%. For example, with a sample size of 50,000 and a prevalence of 1.95%, an 80% Se roughly corresponds to 200 false negatives, and a 98% Sp corresponds to about 1,000 false positives.
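The back-of-envelope counts above can be reproduced with a short sketch. Again this is an illustrative Python translation, not the authors' R code:

```python
import random

def misclassify(y, se, sp, rng):
    """Return the observed outcome Y* given the true outcome y."""
    if y == 1:
        return int(rng.random() < se)   # true case kept with probability se
    return int(rng.random() >= sp)      # false positive with probability 1 - sp

# Apply 80% Se / 98% Sp to a synthetic outcome vector with ~2% prevalence.
rng = random.Random(3)
truth = [1] * 1000 + [0] * 49_000
observed = [misclassify(y, se=0.80, sp=0.98, rng=rng) for y in truth]
false_negatives = sum(1 for y, s in zip(truth, observed) if y == 1 and s == 0)
false_positives = sum(1 for y, s in zip(truth, observed) if y == 0 and s == 1)
```

On average this yields about 200 false negatives (20% of the 1,000 true cases) and about 980 false positives (2% of the 49,000 true non-cases), matching the counts quoted in the text.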
In the second scenario of differential misclassification, we considered four cases where Se/Sp depend on the binary covariate X1:
- Se is 75.7% for the subjects with X1=0 and 90% with X1=1. Sp is 100% for all the subjects.
- Se is 90% for the subjects with X1=0 and 75.7% with X1=1. Sp is 100% for all the subjects.
- Sp is 99% for the subjects with X1=0 and 100% with X1=1. Se is 100% for all the subjects.
- Sp is 100% for the subjects with X1=0 and 99% with X1=1. Se is 100% for all the subjects.
The misclassification parameters were chosen to facilitate comparisons between the differential and non-differential cases. For example, the parameters in the first two cases were chosen so that the overall Se for the differential case was 80%, aligning with a setting presented for the non-differential scenario. Similarly, the parameters in the last two cases correspond to the non-differential scenario of a Sp equal to 99.5% and a Se equal to 100%.
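Implementing differential misclassification only requires letting Se and Sp depend on X1. A hypothetical sketch of the first case in the list above (Python, with names of our choosing):

```python
import random

def misclassify_differential(y, x1, rng, se_by_x1, sp_by_x1):
    """Misclassify y with Se/Sp that depend on the binary covariate x1."""
    se, sp = se_by_x1[x1], sp_by_x1[x1]
    if y == 1:
        return int(rng.random() < se)   # kept with group-specific sensitivity
    return int(rng.random() >= sp)      # flipped with group-specific 1 - sp

# First differential case: Se 75.7% when X1 = 0, 90% when X1 = 1,
# perfect Sp for all subjects.
rng = random.Random(7)
se_by_x1 = {0: 0.757, 1: 0.90}
sp_by_x1 = {0: 1.00, 1: 1.00}
ystar = misclassify_differential(1, 0, rng, se_by_x1, sp_by_x1)
```

The other three cases correspond to swapping the dictionary entries, or to holding Se at 100% while letting Sp differ by X1.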
Our simulations focus on a prospective cohort design, where a binary outcome is generated for every participant. Given that cancer outcomes are often rare, a case-control study is a practical design for studying etiological effects on cancer risk. Findings for the case-control design are presented in the supplementary materials (see Tables S1-S6 for misclassification results and Figures S7-S10 for coverage rates).
Each simulation generated 50,000 individuals, representing a cohort study dataset. The large sample size was used to illustrate bias patterns driven by Se and Sp rather than by finite sample bias (i.e., logistic regression risk estimates are biased for rare outcomes at low prevalence). The case-control design then drew a sample of 1,000 individuals based on Y*, including 500 individuals each with Y*=0 and Y*=1. Logistic regression models were then fitted to the full cohort of 50,000 individuals and the case-control samples of 1,000 individuals, under different settings of disease prevalence and the Se and Sp of Y*. This process was repeated 500 times, and we reported the bias, the standard deviation of the point estimates, the mean of the standard errors, and confidence interval (CI) coverage rates for the estimated parameters (β1, β2). The coverage rates were calculated as the proportion of simulations in which a 95% CI covers the true parameter value; with valid interval inference, the CI coverage rate should be close to 95%. All simulations were conducted using R version 4.3.1.
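One replicate of this pipeline can be sketched end to end in pure Python: simulate a cohort, misclassify the outcome, and refit the risk model by Newton-Raphson. This is a condensed illustration under our assumptions, not the authors' R code; the sample size is reduced to 10,000 for speed, the slopes are assumed to be 1, and only a single replicate (rather than 500) is shown:

```python
import math
import random

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x

def fit_logistic(rows, iters=8):
    """Newton-Raphson fit of logit P(Y=1) = b0 + b1*x1 + b2*x2."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x1, x2, y in rows:
            x = (1.0, float(x1), x2)
            eta = max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, x))))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(3):
                grad[i] += (y - p) * x[i]
                for j in range(3):
                    hess[i][j] += w * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve3(hess, grad))]
    return beta

def one_replicate(n, b0, se, sp, seed):
    """Simulate a cohort, misclassify the outcome, and refit the risk model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)
        x2 = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(b0 + x1 + x2)))  # true slopes set to 1
        y = int(rng.random() < p)
        # observed outcome Y*: false negative with prob 1-se, false positive with prob 1-sp
        ystar = int(rng.random() < se) if y == 1 else int(rng.random() >= sp)
        rows.append((x1, x2, ystar))
    return fit_logistic(rows)

# One replicate at the ~7.5% prevalence setting, with and without misclassification.
b_clean = one_replicate(10_000, b0=-3.5, se=1.0, sp=1.0, seed=11)
b_mis = one_replicate(10_000, b0=-3.5, se=1.0, sp=0.98, seed=11)
# b_mis[1] (the X1 slope) is attenuated toward zero relative to b_clean[1].
```

Repeating `one_replicate` over many seeds and averaging the estimates reproduces the bias columns of Tables 1 and 2; the paper additionally records Monte Carlo standard deviations, mean standard errors, and CI coverage.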
In the supplementary material, we computed theoretical calculations of bias under various settings of non-differential and differential outcome misclassifications. These calculations were simplified by considering only one binary covariate (as opposed to the two covariates used in the simulation settings), allowing us to examine the effects of outcome misclassification in 2×2 tables (see Tables S7,S8). Unlike the simulation results, these calculations enable the exploration of a wider range of Se and Sp grid points, providing a more comprehensive assessment of the bias patterns. These results are shown in contour plots presented in the supplementary materials (see Figures S11-S13). Nonetheless, we emphasize the importance of the simulation studies, as they examine estimation performance in practical settings.
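The single-covariate calculation works as follows: in each stratum of the binary covariate, the observed case probability is q = Se·p + (1 − Sp)·(1 − p), and the observed log odds ratio is then computed from q0 and q1. A sketch with illustrative stratum probabilities (our choice, giving a true log odds ratio of exactly 1 at roughly 2% prevalence):

```python
import math

def observed_log_or(p0, p1, se, sp):
    """Log odds ratio computed from misclassified outcome probabilities.

    p0, p1: true P(Y=1) in the X=0 and X=1 strata. The observed stratum
    probabilities mix retained true positives and false positives.
    """
    q0 = se * p0 + (1.0 - sp) * (1.0 - p0)
    q1 = se * p1 + (1.0 - sp) * (1.0 - p1)
    return math.log(q1 / (1.0 - q1)) - math.log(q0 / (1.0 - q0))

# Illustrative strata: logit(p0) = -4.3, logit(p1) = -3.3, so the true
# log odds ratio is exactly 1 and overall prevalence is near 2%.
p0 = 1.0 / (1.0 + math.exp(4.3))
p1 = 1.0 / (1.0 + math.exp(3.3))
attenuated = observed_log_or(p0, p1, se=1.0, sp=0.995)  # well below 1
```

With perfect Se and Sp the function returns the true log odds ratio of 1; dropping Sp to 99.5% attenuates it markedly (to about 0.81 here), while dropping Se to 80% with perfect Sp barely moves it, mirroring the patterns in the contour plots.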
Results
Non-differential misclassification
Table 1 shows the bias of estimating β1 using the full cohort data at 1.95% disease prevalence, with Sp ranging from 98% to 100% and Se from 80% to 100%. The reported negative biases indicate underestimation of the associations between the risk factors and the cancer outcome. We observe that bias is highly influenced by Sp. For example, with a very high Sp of 99.5% and perfect Se, the bias in estimating β1 was 22.9%, with a poor CI coverage rate of 4%. The bias increased substantially to 53.7% when Sp dropped to 98%. In contrast, Se has a much smaller impact on bias: with a Se of 80% and perfect Sp, the bias in estimating β1 was less than 1%. For cancer outcome ascertainment with a prevalence around 2%, these findings suggest that controlling the false positives (i.e., achieving near-perfect Sp) is much more important than controlling the false negatives (1 − Se).
Table 1
| Specificity (%) | Sensitivity (%) | Bias | Standard deviation | Mean of standard error | Coverage (%) |
|---|---|---|---|---|---|
| 100 | 100 | 0.010 | 0.071 | 0.073 | 94.8 |
| 100 | 90 | 0.000 | 0.076 | 0.077 | 96.2 |
| 100 | 80 | −0.006 | 0.080 | 0.082 | 95.6 |
| 99.9 | 100 | −0.051 | 0.068 | 0.071 | 88.8 |
| 99.9 | 90 | −0.059 | 0.072 | 0.074 | 87.6 |
| 99.9 | 80 | −0.079 | 0.076 | 0.078 | 84.6 |
| 99.5 | 100 | −0.229 | 0.063 | 0.063 | 4.0 |
| 99.5 | 90 | −0.249 | 0.063 | 0.065 | 2.8 |
| 99.5 | 80 | −0.273 | 0.071 | 0.068 | 2.2 |
| 99 | 100 | −0.370 | 0.055 | 0.056 | 0.0 |
| 99 | 90 | −0.392 | 0.059 | 0.058 | 0.0 |
| 99 | 80 | −0.424 | 0.058 | 0.059 | 0.0 |
| 98.5 | 100 | −0.473 | 0.049 | 0.051 | 0.0 |
| 98.5 | 90 | −0.498 | 0.053 | 0.052 | 0.0 |
| 98.5 | 80 | −0.521 | 0.051 | 0.054 | 0.0 |
| 98 | 100 | −0.537 | 0.049 | 0.048 | 0.0 |
| 98 | 90 | −0.563 | 0.050 | 0.049 | 0.0 |
| 98 | 80 | −0.591 | 0.050 | 0.050 | 0.0 |
Table 2 examines the same scenarios as Table 1, but with a higher prevalence of 7.5% instead of 1.95%. Notably, at low prevalence, bias is primarily driven by imperfect Sp, whereas at higher prevalence, Se starts to play a more important role. For example, in the scenario with Sp of 99.5% and Se of 100%, the bias was 22.9% under low prevalence but decreased to 6.9% under high prevalence, suggesting that imperfect Sp has a greater impact on bias when the disease prevalence is low. Conversely, for the scenario with Se of 80% and Sp of 100%, the bias was 0.6% for low prevalence and 3.0% for high prevalence. While Se is not as strong a driver of bias as Sp, its impact grows with increasing prevalence. The results for estimating β2 (corresponding to the continuous covariate) are presented in Tables S9,S10, showing similar patterns of bias.
Table 2
| Specificity (%) | Sensitivity (%) | Bias | Standard deviation | Mean of standard error | Coverage (%) |
|---|---|---|---|---|---|
| 100 | 100 | 0.000 | 0.037 | 0.039 | 96.0 |
| 100 | 90 | −0.015 | 0.039 | 0.040 | 93.2 |
| 100 | 80 | −0.030 | 0.042 | 0.042 | 88.8 |
| 99.9 | 100 | −0.013 | 0.038 | 0.038 | 93.4 |
| 99.9 | 90 | −0.032 | 0.040 | 0.040 | 87.6 |
| 99.9 | 80 | −0.048 | 0.043 | 0.042 | 76.8 |
| 99.5 | 100 | −0.069 | 0.037 | 0.037 | 55.0 |
| 99.5 | 90 | −0.095 | 0.040 | 0.038 | 31.6 |
| 99.5 | 80 | −0.116 | 0.038 | 0.040 | 16.2 |
| 99 | 100 | −0.133 | 0.036 | 0.035 | 3.8 |
| 99 | 90 | −0.159 | 0.038 | 0.037 | 1.8 |
| 99 | 80 | −0.187 | 0.039 | 0.038 | 0.2 |
| 98.5 | 100 | −0.190 | 0.035 | 0.034 | 0.0 |
| 98.5 | 90 | −0.214 | 0.036 | 0.035 | 0.0 |
| 98.5 | 80 | −0.243 | 0.036 | 0.037 | 0.0 |
| 98 | 100 | −0.233 | 0.033 | 0.033 | 0.0 |
| 98 | 90 | −0.262 | 0.033 | 0.034 | 0.0 |
| 98 | 80 | −0.298 | 0.037 | 0.036 | 0.0 |
In Tables 1,2, the CI coverage rates of β1 are poor once the bias exceeds 5% (see Tables S9,S10 for the CI coverage rates of β2). The average of the estimated standard errors is close to the Monte Carlo standard deviations, suggesting that variance estimation remains accurate under outcome misclassification. However, these coverage results pertain to the sample size considered (50,000). With such a large sample, the coverage rate depends on both bias and sample size and is heavily influenced by the bias. For cohorts with smaller sample sizes, we expect the coverage rates to be less influenced by bias and closer to the nominal level. Figure S5 shows this by comparing the coverage rates of β1 for cohorts with 5,000 and 50,000 individuals (see Figure S6 for β2).
Differential misclassification
Table 3 shows the simulation results with differential Se (equal to 90% or 75.7% depending on the binary covariate X1, with an overall Se of 80%). We also show results for the analogous non-differential case with the same overall Se of 80%. Unlike the case of non-differential Se, where estimation is only slightly attenuated, differential Se results in substantial bias in estimating β1 that can be in either direction. Conceptually, if X1 represents socioeconomic status (1 denoting the most disadvantaged and 0 denoting the remaining population) and the Se is 75.7% for individuals with the most disadvantaged socioeconomic status and 90% for the remaining population, then this misclassification results in a 22.4% underestimation of the effect of X1. Conversely, when the sensitivities for the two groups are reversed, there is a 16.9% overestimation.
Table 3
| | Non-differential: Se0 = Se1 = 0.8 | | Differential: Se0 = 0.9; Se1 = 0.757 | | Differential: Se0 = 0.757; Se1 = 0.9 | |
|---|---|---|---|---|---|---|
| | β1 | β2 | β1 | β2 | β1 | β2 |
| Point estimate | 0.970 | 0.965 | 0.777 | 0.963 | 1.169 | 0.977 |
| Bias | −0.030 | −0.035 | −0.224 | −0.037 | 0.169 | −0.023 |
| Coverage (%) | 88.80 | 63.20 | 0.00 | 55.60 | 2.00 | 78.40 |
| Standard deviation | 0.042 | 0.021 | 0.042 | 0.021 | 0.042 | 0.021 |
| Mean of standard errors | 0.042 | 0.021 | 0.041 | 0.021 | 0.042 | 0.021 |
Similarly, with differential Sp, the bias can also be in either direction (Table 4), in contrast to bias towards the null with non-differential Sp. It is also worth noting that, as Se and Sp are differential with respect to X1, but not X2, the bias in estimating β2 is comparable between differential and non-differential misclassification (Tables 3,4).
Table 4
| | Non-differential: Sp0 = Sp1 = 0.995 | | Differential: Sp0 = 1.00; Sp1 = 0.99 | | Differential: Sp0 = 0.99; Sp1 = 1.00 | |
|---|---|---|---|---|---|---|
| | β1 | β2 | β1 | β2 | β1 | β2 |
| Point estimate | 0.928 | 0.933 | 1.090 | 0.937 | 0.768 | 0.925 |
| Bias | −0.072 | −0.067 | 0.090 | −0.063 | −0.233 | −0.075 |
| Coverage (%) | 50.40 | 4.40 | 30.00 | 14.00 | 0.00 | 3.20 |
| Standard deviation | 0.036 | 0.018 | 0.038 | 0.019 | 0.033 | 0.020 |
| Mean of standard errors | 0.037 | 0.019 | 0.038 | 0.019 | 0.036 | 0.019 |
Discussion
In this paper, we examined the impact of cancer outcome misclassification through extensive simulations and theoretical bias calculations. We considered both cohort and case-control designs, but focused primarily on cohort studies as the results were similar between the two designs.
When misclassification is non-differential, we found that imperfect Sp introduces a substantial underestimation of relative risks, particularly at low prevalence. If cancer outcome ascertainment is based on self-report, lower Sp often results from misreporting cancer types (e.g., reporting a carcinoma in situ as cancer). While registry linkage is generally more accurate than self-report, false positives are still possible. For example, with incomplete social security information, an individual with a common name may be mistakenly identified as having cancer due to a false match in the registry linkage (2).
We also found that non-differential Se resulted in only a small bias in estimating relative risks. In practice, lower Se is quite common in cancer outcome ascertainment. False negatives in self-reported cancer outcomes can result from an individual failing to recall a cancer diagnosis (3). For cancer ascertainment through registries, false negatives can occur in several ways, including non-coverage in certain regions, under-reporting of cancers to registries, or individuals moving out of registry coverage areas (3). Our findings suggest that validating the non-cases is unnecessary when the Se is non-differential.
In contrast, when the misclassification is differential, the bias patterns are strikingly different from the non-differential case: bias can be substantial and occur in either direction. Unlike non-differential misclassification, where false negatives do not result in substantial bias, relative risk estimation can be seriously biased when the Se varies with a covariate in the model. For example, the effects of race-ethnicity on cancer risk may be poorly estimated in logistic regression when the Se or Sp of cancer ascertainment also varies by race-ethnicity. This finding emphasizes the value of understanding differences in the cancer ascertainment process by race-ethnicity in health disparities research, and recognizing situations where misclassification is linked to a key exposure.
Linet et al. [2020] provided a comprehensive review of 26 radiation cohort studies highlighting challenges in cancer outcome ascertainment and implications for risk estimation (11). They noted that when misclassification is non-differential to the key exposure (radiation dose levels), relative risk estimates tend to be biased towards the null, consistent with our simulation findings. They also identified four studies where the outcome misclassification appeared to vary by radiation level, though the impact of such misclassification was not quantified. In contrast, our study performed extensive simulation studies to quantitatively evaluate the bias in risk estimation under both differential and non-differential outcome misclassification. More recently, Liu et al. [2025] directly compared cancer risk estimates based on self-reported diagnoses versus registry linkage in a large U.S. cohort, demonstrating that risk estimates varied depending on the outcome ascertainment method, with self-report introducing attenuation likely due to lower Sp (12). These empirical findings align with our simulation results that reductions in Sp can produce substantial underestimation of relative risk, particularly when disease prevalence is low. In addition, recent work in the context of electronic health records (EHRs) has shown similar patterns. For example, Zhang et al. [2024] demonstrated through simulations that misclassification of EHR-derived cancer outcomes can introduce meaningful bias in effect estimates (13). Together, these studies reinforce our key conclusion: the accuracy of cancer outcome ascertainment, particularly Sp, plays a critical role in the validity of cancer risk models.
While our simulations cover a wide range of realistic scenarios, they cannot exhaust all possible combinations of prevalence, sample sizes, effect sizes, and misclassification mechanisms. We provided the simulation code in the Supplementary Materials and recommend that practitioners tailor the simulations to their own context to assess how outcome misclassification may influence their relative risk estimates.
We acknowledge that our study has several limitations. First, we focus on binary cancer outcomes, not time to cancer diagnosis. Time-to-event outcomes introduce additional complications, as the timing of the event and censoring may be subject to reporting errors, potentially leading to bias in hazard ratio estimates. While our conclusions likely extend to time-to-event analysis in terms of the direction of bias, the magnitude and pattern may differ, warranting further investigation. Second, our simulations assume that Se/Sp are known. In practice, when validation data are limited or unavailable, reliable estimation of Se/Sp may not be feasible. In such cases, we recommend that researchers conduct sensitivity analyses or simulation studies across plausible ranges of Se/Sp to assess the robustness of their findings.
Our study highlights the importance of validating cancer ascertainment methods against a gold standard (e.g., medical records). For non-differential misclassification, this is an easier problem, since we only need to validate those identified as cancer cases to mitigate false positives. Validation becomes more challenging for differential outcome misclassification because both false positives and false negatives can introduce severe bias. The difficulty in this case is that correcting false negatives requires validating a large fraction of the study population. We recommend that several sources of cancer outcome data be obtained and used to conduct sensitivity analyses for risk model estimation. For example, if self-report or cancer registry linkage is used as the primary outcome assessment method, EHRs may be a good source for identifying missed cases while also confirming identified cases (14).
Conclusions
Our study quantified the potentially large effects of a differentially or non-differentially misclassified binary cancer outcome in a cancer risk model. Our findings highlight the importance of accurate cancer outcome ascertainment. We found that bias may be in either direction under differential misclassification, whereas under non-differential misclassification the bias attenuated the cancer risk estimates toward the null. Researchers may need to collect several data sources to validate cancer outcomes, and to recognize situations where outcome misclassification is linked to a key covariate.
Acknowledgments
None.
Footnote
Peer Review File: Available at https://ace.amegroups.com/article/view/10.21037/ace-2025-3/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://ace.amegroups.com/article/view/10.21037/ace-2025-3/coif). D.L. received support from National Cancer Institute (the author is an employee of this institute). The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Pinsky PF, Yu K, Black A, et al. Active follow-up versus passive linkage with cancer registries for case ascertainment in a cohort. Cancer Epidemiol 2016;45:26-31. [Crossref] [PubMed]
- Bond B, Brown JD, Luque A, et al. The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey. United States Census Bureau; 2014.
- Liu D, Linet MS, Albert PS, et al. Ascertainment of Incident Cancer by US Population-Based Cancer Registries Versus Self-Reports and Death Certificates in a Nationwide Cohort Study, the US Radiologic Technologists Study. Am J Epidemiol 2022;191:2075-83. [Crossref] [PubMed]
- Chen Q, Galfalvy H, Duan N. Effects of disease misclassification on exposure-disease association. Am J Public Health 2013;103:e67-73. [Crossref] [PubMed]
- Chyou PH. Patterns of bias due to differential misclassification by case-control status in a case-control study. Eur J Epidemiol 2007;22:7-17. [Crossref] [PubMed]
- Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol 1990;132:746-8. [Crossref] [PubMed]
- Weinberg CR, Umbach DM, Greenland S. When will nondifferential misclassification of an exposure preserve the direction of a trend? Am J Epidemiol 1994;140:565-71. [Crossref] [PubMed]
- Mullins MA, Kler JS, Eastman MR, et al. Validation of Self-reported Cancer Diagnoses Using Medicare Diagnostic Claims in the US Health and Retirement Study, 2000-2016. Cancer Epidemiol Biomarkers Prev 2022;31:287-92. [Crossref] [PubMed]
- Raza SA, Jawed I, Zoorob RJ, et al. Completeness of Cancer Case Ascertainment in International Cancer Registries: Exploring the Issue of Gender Disparities. Front Oncol 2020;10:1148. [Crossref] [PubMed]
- Randall S, Brown A, Boyd J, et al. Sociodemographic differences in linkage error: an examination of four large-scale datasets. BMC Health Serv Res 2018;18:678. [Crossref] [PubMed]
- Linet MS, Schubauer-Berigan MK, Berrington de González A. Outcome Assessment in Epidemiological Studies of Low-Dose Radiation Exposure and Cancer Risks: Sources, Level of Ascertainment, and Misclassification. J Natl Cancer Inst Monogr 2020;2020:154-75. [Crossref] [PubMed]
- Liu D, Linet MS, Albert PS, et al. Examining bias due to method of follow-up for cancer incidence in a large U.S. cohort: Self-report versus registry linkage. Ann Epidemiol 2025;107:44-50.
- Zhang H, Clark AS, Hubbard RA. A Quantitative Bias Analysis Approach to Informative Presence Bias in Electronic Health Records. Epidemiology 2024;35:349-58. [Crossref] [PubMed]
- Leggat-Barr K, Ryu R, Hogarth M, et al. Early Ascertainment of Breast Cancer Diagnoses Comparing Self-Reported Questionnaires and Electronic Health Record Data Warehouse: The WISDOM Study. JCO Clin Cancer Inform 2023;7:e2300019. [Crossref] [PubMed]
Cite this article as: Hill LA, Albert PS, Figueroa JD, Liu D. Implications of outcome misclassification in risk effect modeling in cancer population studies. Ann Cancer Epidemiol 2026;10:3.

