ARCHIVED - Chronic Diseases in Canada


Volume 30, no. 1, December 2009

Using cancer registry data: agreement in cause-of-death data between the Ontario Cancer Registry and a longitudinal study of breast cancer patients

D. R. Brenner, MSc (1,2); M. C. Tammemägi, PhD (3); S. B. Bull, PhD (1,2); D. Pinnaduwaje, PhD (1); I. L. Andrulis, PhD (1,4)


Author References

  1. Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto (Ontario)
  2. Dalla Lana School of Public Health, University of Toronto, Toronto (Ontario)
  3. Department of Community Health Sciences, Brock University, St. Catharines (Ontario)
  4. Department of Molecular Genetics, University of Toronto, Toronto (Ontario)

Correspondence: Darren Brenner, The Prosserman Centre for Health Research, Samuel Lunenfeld Research Institute, 60 Murray St., Toronto, ON  M5T 1X5, Tel.: 416-586-4800 ext. 8110, Fax: 416-586-5800, Email:



Data from the Ontario Cancer Registry (OCR) were compared with data from a multi-centred prospective cohort of 1655 node-negative breast cancer patients with intensive clinical follow-up. Agreement in cause of death was evaluated using kappa statistics. The accuracy of OCR classification was evaluated against the Mount Sinai Hospital (MSH) study oncologist’s interpretation of intensely followed, cohort-collected data as the reference standard. The two sources showed a high level of agreement (kappa statistic [k] = 0.88; 95% confidence interval [CI]: 0.86, 0.90) in vital status and cause of death. Among those cases where both sources reported a death, the OCR had a sensitivity of 95% (95% CI: 90.5, 98.8) and a specificity of 86% (95% CI: 79.6, 92.4). The OCR is a valuable tool for epidemiologic studies of breast cancer to acquire adequate and easily attainable cause-of-death information.

Keywords: epidemiology methods, data collection, data registries, vital statistics, breast neoplasms, cause of death, Ontario Cancer Registry


The use of cancer registry mortality and follow-up data in epidemiologic studies is common.1 However, it is unclear to what extent bias may be introduced because of incomplete or inaccurate cause-of-death data in the registries.2,3

The Ontario Cancer Registry (OCR), maintained and operated by Cancer Care Ontario since 1964, collects vital information on all new cases of cancer in the province except for non-melanoma skin cancers. Validation studies have shown the registry to be effective in ascertaining cancer cases in the province (98% sensitivity).4 The registry collects data from pathology reports, patient records, hospital discharge records and death certificates from the Registrar General of Ontario. Probabilistic linkage is then used to reconcile the data sources into a central database.5 The registry performs regular internal data quality evaluations; however, registry data are rarely compared to actual detailed medical records and data collected from additional external sources. One comparative study of head-and-neck cancer outcomes reported that the OCR had excellent agreement in index tumour site assignment, vital status and date of death; however, there was a 31% error rate in cause of death (cancer vs. noncancer).6 To our knowledge, no studies have examined agreement of cause-of-death data among breast cancer patients in the OCR with cause of death determined in an independent study with rigorous follow-up.

A multi-centred prospective cohort study based at Mount Sinai Hospital (MSH) in Toronto commenced patient enrolment in 1987. The MSH study collected incident cases of pathologically confirmed node-negative breast cancer from eight participating sites in the greater Toronto area. The study aim was to evaluate the associations between genetic and molecular tumour alterations and recurrence of disease and death due to breast cancer.7 Study managers systematically collected data from hospital and medical records, patient interviews, pathology reports, patient charts, coroner reports and death certificates. The study oncologist, a specialist in breast cancer, made the final determination as to the classification of cause of death after examining the collected information.

The aim of our study was to evaluate the agreement between cause-of-death data from the Ontario Cancer Registry and the MSH study, which has regular and systematic patient monitoring and follow-up, and specialist-determined outcome based, for the most part, on relatively complete and accurate data. Other studies have found that specialist classification of cancer outcomes is more accurate than registry classification, possibly due to more extensive data availability or experience or both.8 For these reasons, our study also evaluates the accuracy of OCR cause-of-death data using the MSH study data as the reference standard.


OCR data were linked to MSH study patients according to OCR standard procedures based on probabilistic linkage using personal identifiers in the MSH study database. ICD-9* and ICD-10† codes described causes of death, and these were then classified as 1) due to breast cancer or 2) due to other/competing causes. Those individuals without cause-of-death information in both the MSH study and the OCR data were considered to be alive. The MSH study followed patients from the time of diagnosis and enrolment, from 1987 until the spring of 2005. Data from the OCR contained events that occurred up to 2006; however, data quality was only verified until the end of 2004. This created some discrepancy in the duration of follow-up. Any discrepant deaths were examined for date of death in order to address the discrepancy.
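The classification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not list the exact codes used, so the prefixes below (ICD-9 174 and 175, ICD-10 C50, the standard malignant breast neoplasm categories) and the names `classify_cause`, `combined_status` and `BREAST_PREFIXES` are hypothetical.

```python
from typing import Optional, Tuple

BREAST_CANCER = "breast cancer"
COMPETING = "competing cause"
NO_DEATH_INFO = "no death info"

# Assumed prefixes (not taken from the paper): ICD-9 174/175 and ICD-10 C50.
BREAST_PREFIXES = ("174", "175", "C50")


def classify_cause(code: Optional[str]) -> str:
    """Map one source's cause-of-death code to a study category."""
    if code is None or not code.strip():
        return NO_DEATH_INFO
    normalized = code.strip().upper()
    if normalized.startswith(BREAST_PREFIXES):
        return BREAST_CANCER
    return COMPETING


def combined_status(msh_code: Optional[str], ocr_code: Optional[str]) -> Tuple[str, str, bool]:
    """Return (MSH category, OCR category, considered_alive).

    A patient is considered alive only when neither source has a cause of death.
    """
    msh = classify_cause(msh_code)
    ocr = classify_cause(ocr_code)
    considered_alive = msh == NO_DEATH_INFO and ocr == NO_DEATH_INFO
    return msh, ocr, considered_alive
```

For example, `combined_status("174.9", "C50.9")` places the patient in the breast cancer category in both sources, while `combined_status(None, None)` marks the patient as alive.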

Kappa statistics were calculated to determine the agreement in cause-of-death classification between the two sources.9 We calculated a weighted kappa with the rationale that a missed cancer-related death is of great importance to the MSH study and the OCR. Statistics were calculated using SAS® (SAS V9.1; SAS Institute Inc., Cary, NC) and all 95% confidence intervals (CI) were two-sided. Weighted kappa was determined using the default weighting scheme in SAS, based on the classification order displayed in Table 1. OCR classification accuracy was evaluated by estimating sensitivity and specificity using the MSH classification as the reference standard. The study was approved by the Mount Sinai Hospital Research Ethics Board.


The study population consisted of the 1655 patients in the hospital-based study with no axillary nodal involvement at diagnosis (stage I [72%] and II [28%]). Of these patients, one did not have a record linkage match in the OCR (i.e. there was no information in the OCR), and six were marked as deceased in the OCR with no cause of death provided. These patients were excluded from analysis as the data provided no potential for comparison. Sensitivity analyses showed that misclassification of these deaths had little impact on the results: kappa statistics changed from 0.87 to 0.86 when all these patients were treated as still alive, and the percent agreement changed from 90.0% to 87.6% when all seven deaths were considered as errors.


Table 1
Agreement between cause of death and vital status from cohort study and the Ontario Cancer Registry

                            MSH study
Ontario Cancer Registry     No death info. or LTFU*   Competing cause of death   Breast cancer   Total
No death info.              1331                      6                          6               1343
Competing cause of death    19                        129                        7               155
Breast cancer death         5                         21                         124             150
Total                       1355                      156                        137             1648

Kappa
Simple: 0.87 (0.85, 0.90)
Weighted: 0.90 (0.88, 0.92)

Additional deaths provided by OCR

* LTFU – Lost to follow-up
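As a rough check on the kappa values reported with Table 1, the sketch below recomputes simple and weighted kappa from the cell counts. It assumes the SAS default weighting is the linear (Cicchetti-Allison) scheme, w = 1 - |i - j|/(k - 1), applied in the category order shown in the table; the function name `cohen_kappa` is illustrative, and confidence intervals are not computed.

```python
def cohen_kappa(table, weighted=False):
    """Cohen's kappa for a square agreement table (rows: OCR, columns: MSH study).

    With weighted=True, linear (Cicchetti-Allison) agreement weights are used.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells) / n ** 2
    return (observed - expected) / (1 - expected)


# Cell counts from Table 1, in the order: no death info., competing cause, breast cancer.
TABLE_1 = [
    [1331, 6, 6],
    [19, 129, 7],
    [5, 21, 124],
]

simple_kappa = cohen_kappa(TABLE_1)
weighted_kappa = cohen_kappa(TABLE_1, weighted=True)
print(f"simple kappa:   {simple_kappa:.3f}")
print(f"weighted kappa: {weighted_kappa:.3f}")
```

Under these assumptions the simple kappa comes out at about 0.876, between the 0.87 listed with Table 1 and the 0.88 quoted in the text, and the weighted kappa at about 0.90.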


Table 2
Comparison of cause of death between a cohort study with intensive follow-up and the Ontario Cancer Registry (MSH study cause of death assumed reference standard)

                            MSH study (assumed reference standard)
Ontario Cancer Registry     Breast cancer death   Competing cause   Total
Breast cancer death         124                   21                145
Competing cause             7                     129               136
Total                       131                   150               281

Sensitivity 95% (90.5-98.8)

Specificity 86% (79.6-92.4)


The two sources showed a high level of agreement, with a simple kappa of 0.88 (95% CI: 0.86, 0.90) and a weighted kappa of 0.90 (95% CI: 0.88, 0.92). Of the 87 patients lost to follow-up by the MSH study, 11 had died and were located via the OCR. An additional 13 deaths not originally recorded by the MSH study were identified by the OCR, while 12 deaths recorded by the MSH study were not identified by the OCR. These 12 discrepant deaths were checked for date of death; in three of the cases the deaths occurred in early 2005, but in the remaining nine cases the deaths occurred before 2000 and were yet to be picked up by the OCR. Table 1 shows the distribution of the causes of death provided by the study and the OCR.

Comparing the cases where both the OCR and the MSH study reported a death, the percent agreement on classification of death was 90.0% (Table 2: [124 + 129]/281). Using the MSH study data as the reference standard, the OCR had a sensitivity of 95% (124/131; 95% CI: 90.5, 98.8) and a specificity of 86% (129/150; 95% CI: 79.6, 92.4).
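The point estimates above follow directly from the Table 2 counts, as the short sketch below shows. The interval it prints is a simple Wald (normal approximation) interval, which is an assumption: the paper does not state which interval method was used, so the bounds need not match the published ones. The name `proportion_with_wald_ci` is illustrative.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) using a Wald interval clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Table 2 counts, with the MSH study classification as the reference standard.
TRUE_POS = 124   # breast cancer death in both sources
FALSE_POS = 21   # OCR: breast cancer death; MSH: competing cause
FALSE_NEG = 7    # OCR: competing cause; MSH: breast cancer death
TRUE_NEG = 129   # competing cause in both sources

sensitivity = proportion_with_wald_ci(TRUE_POS, TRUE_POS + FALSE_NEG)
specificity = proportion_with_wald_ci(TRUE_NEG, TRUE_NEG + FALSE_POS)
agreement = (TRUE_POS + TRUE_NEG) / (TRUE_POS + FALSE_POS + FALSE_NEG + TRUE_NEG)

for label, (estimate, lower, upper) in (("sensitivity", sensitivity), ("specificity", specificity)):
    print(f"{label}: {estimate:.1%} (Wald 95% CI: {lower:.1%}, {upper:.1%})")
print(f"percent agreement: {agreement:.1%}")
```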


These findings present several important points. First, the cause of death from the OCR abstraction and collection system strongly agreed with those from an intensively followed cohort study where cause of death was determined by a medical oncologist. This indicates that OCR data may be useful in studies where patient follow-up is incomplete or not available; it also highlights the utility of the OCR for epidemiologic studies that are unable to acquire adequate clinician expertise for interpretation of cause of death. In these instances, particularly for studies of breast cancer, the OCR may be used as a relatively accurate and easily attainable source of cause of death. Second, there were several deaths that were missed by the MSH study, as patients were lost to follow-up. In these instances, the OCR collected deaths that, due to the province-wide coverage, enhanced the follow-up data of the MSH study. Third, our study found a high level of accuracy in the abstraction techniques of the OCR: we observed high sensitivity and specificity when the results were compared with those of an experienced medical oncologist making informed decisions from extensive cohort data.

Our study showed a much lower error rate than the previous analysis of cause of death stored in the OCR.6 This difference may, however, be due to the different tumour sites being compared. Thus, our results may not be generalizable to all cancer sites. We reason, however, that misclassification of cause of death is greatest in those cancers, such as breast or prostate cancers, that have favourable prognoses, because the probabilities of death due to cancer and to competing causes approach one another much more than in highly aggressive cancers, such as lung or pancreatic cancers. Also, with aggressive cancers the course of illness is usually dramatic and clinically more clear-cut, and thus classification of death should be more accurate. If our reasoning is correct, the results of our study might be reassuring to researchers investigating other cancers or advanced breast cancer.

Our results may not be generalizable to all cancer registries. Due to the centralized nature of the Ontario health care system, the OCR is able to obtain all the pertinent medical documentation in order to provide the epidemiologic data for this type of study. Other registries may not have the infrastructure or ability to be as complete and inclusive as the OCR. However, where registries are population-inclusive and verified, our results may be applicable.

There are a few methodological issues in this study that need to be addressed. Seven patients were removed from the analysis as they were recorded in the OCR as deceased with no cause of death. Sensitivity analyses showed, however, that misclassification of these deaths in either direction would have minimal effect on our conclusions.

Intensive follow-up for the MSH study ended in spring 2005 when funding for the clinical follow-up component of the study came to an end. The request for data from the OCR was made in August of 2006, at which time the quality of the registry was only assured until the end of 2004. Therefore, there was a slight discrepancy in the end of follow-up; however, this did not appear to affect the results. Our analyses used the decisions made by the MSH study medical oncologist from all collected data as the reference standard. It is possible that a small percentage of diagnoses were misclassified by the MSH study medical oncologist, potentially decreasing the agreement between the data sources. However, the high kappa statistics reflect good agreement in the absence of a gold standard.

We used kappa statistics to evaluate the agreement between two sources of categorical cause-of-death data as there was no clear-cut gold standard (e.g. the OCR found cancer deaths missed by the MSH study). We were thus able to report classification accuracy, with sensitivity and specificity, as well as reliability, with kappa. These provide complementary pieces of information and strengthen the conclusions made about the utility of the OCR.

In conclusion, the results of our study show that there is strong agreement between the cause-of-death data collected from a longitudinal cohort study of breast cancer patients using a medical oncologist’s interpretation based on rigorous prospective data collection and the passive data collection system of the OCR. This information is important to the conclusions drawn from studies conducted using registry data, as it may strengthen their validity. It may also encourage researchers to use cancer registry data when study-specific cancer follow-up data are incomplete, absent or of poor quality. Also, our results suggest that researchers may want to routinely employ registry data to verify follow-up information in ongoing studies.


 * International Statistical Classification of Diseases and Related Health Problems, 9th Revision.

 † International Statistical Classification of Diseases and Related Health Problems, 10th Revision.


  1. Marrett LD, Clarke EA, Hatcher J, Weir HK. Epidemiologic research using the Ontario Cancer Registry. Can J Public Health. 1986;77 Suppl 1:79-85.
  2. Hilsenbeck SG. Quality control practices in centralized tumor registries in North America. J Clin Epidemiol. 1990;43(11):1201-12.
  3. Robles SC, Marrett LD, Clarke EA, Risch HA. An application of capture-recapture methods to the estimation of completeness of cancer registration. J Clin Epidemiol. 1988;41(5):495-501.
  4. McLaughlin JR, Kreiger N, Marrett LD, Holowaty EJ. Cancer incidence registration and trends in Ontario. Eur J Cancer. 1991;27(11):1520-4.
  5. Holowaty E. Summarization of information from multiple data sources. In: Black R, Simonato L, Storm H, editors. Automated data collection in cancer registries. IARC Technical Report 32. Lyon: IARC, 1998.
  6. Hall S, Schulze K, Groome P, Mackillop W, Holowaty E. Using cancer registry data for survival studies: the example of the Ontario Cancer Registry. J Clin Epidemiol. 2006;59(1):67-76.
  7. Andrulis IL, Bull SB, Blackstein ME, Sutherland D, Mak C, Sidlofsky S, Pritzker KP, Hartwick RW, Hanna W, Lickley L, Wilkinson R, Qizilbash A, Ambus U, Lipa M, Weizel H, Katz A, Baida M, Mariz S, Stoik G, Dacamara P, Strongitharm D, Geddie W, McCready D. neu/erbB-2 amplification identifies a poor-prognosis group of women with node-negative breast cancer. Toronto Breast Cancer Study Group. J Clin Oncol. 1998;16(4):1340-9.
  8. Schouten LJ, Jager JJ, van den Brandt PA. Quality of cancer registry data: a comparison of data provided by clinicians with those of registration personnel. Br J Cancer. 1993;68(5):974-7.
  9. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-74.