Reliability of self-reports:

Data from the Canadian Multi-Centre Osteoporosis Study (CaMos)

Vol. 25 No. 2, 2004

Victoria Nadalin, Kris Bentvelsen and Nancy Kreiger

Abstract

Reliable questions enhance study design. We assessed the reliability of questions that gather demographic, sun exposure, reproductive history, and physical activity information. Subjects were participants in the Canadian Multicentre Osteoporosis Study (CaMos), a cohort study of Canadian adults recruited January 1996 to September 1997 in nine cities, stratified by sex, age, and location. Following personal interviews, 367 subjects were re-administered part of the questionnaire by telephone. Reliability was assessed using kappa and intra-class correlation. Reliability was excellent for employment status, reproductive history, weight and height (0.82 to 0.97), not differing greatly when stratified by age group or sex. Physical activity and sun exposure were reported with fair to good reliability (0.44 to 0.58), except for moderate activity (kappa=0.30, 95% confidence interval 0.23, 0.37). Stratification by body mass index did not show significant differences. Many items can be reported reliably, especially those of height, weight, employment status and reproductive history, and, to a lesser extent, physical activity and sun exposure. Similar questions might be used reliably in future studies.

Key words: data collection; questionnaire design; reliability

Introduction

Since epidemiologic reports often rely on data collected through interviewer-administered questionnaires, it is important to determine the reliability of information obtained in this manner. The use of questions that have been proven reliable can add to the integrity of a study's results, and reliability assessments can aid in the design of subsequent questionnaires. We assessed the reliability of a number of demographic, reproductive, physical activity and sun exposure questions, items commonly explored in etiologic studies.

Developed to inform disease prevention strategies, the Canadian Multicentre Osteoporosis Study (CaMos) is a prospective cohort study that collects information on the skeletal health and risk factor exposures of a random sample of Canadian adults.1 CaMos data were obtained through the use of questionnaires, spinal x-rays and bone density scans. While similar questions have been assessed in the literature for test-retest reliability, the intent to use these data in an applied, prevention-oriented manner, along with differences in study design, study populations, and the wording of questions, necessitates an examination of the reliability of the CaMos questionnaire. The purpose of this analysis was to assess the test-retest reliability of a portion of the CaMos questionnaire using a combination of administration modes.

Materials and methods

Test subjects (recruited January 1996 to September 1997) were those individuals who participated in the personal interview portion of CaMos. Subjects were over 24 years of age, randomly chosen within households, and selected through telephone lists. Households were first contacted via introductory letter, followed by a telephone call in which a personal interview was arranged. Questionnaires collected demographic, medical history, reproductive event, and lifestyle (diet, physical activity and tobacco exposure) information.

Three to five months following original (test) interviews, subjects from three study centres (Hamilton, Toronto, Québec City) were administered the retest questionnaire over the telephone by the interviewer who conducted the test interview. Retest questionnaires included questions on height, weight, physical activity, sun exposure, and reproductive history.

Subjects were recruited until there was a minimum of 35 subjects in each stratum (as defined by study centre, age and sex). Up to six attempts were made to contact each respondent. Those who refused or could not be contacted were declared non-responders and recruitment continued with the next subject on the list of test respondents. Ethics approval for the study was obtained from the Research Ethics Board of each study centre.

TABLE 1
Description of sample, respondents vs. non-respondents

Variable              Respondents (n)   Non-respondents (n)
Sex
  Male                150               27
  Female              217               28
Age group (years)
  45-64               184               31
  65-80               183               24
Location of centre
  Hamilton            148               23
  City of Québec      114               -
  Toronto             105               32
Total subjects        367 (87%)         55 (13%)

Data were analyzed using SAS software.2 To quantify the agreement between test and retest responses, statistics appropriate to the level of measurement were calculated: kappa and percent agreement for categorical variables, and intra-class correlation for continuous variables. To explore differences between risk groups, reliability statistics were estimated within strata: age group (45-64, 65-80), sex, city, province, smoking status, family history of osteoporosis, and body mass index (<25, 25-29, 30+, for selected variables).
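
The two categorical statistics named above can be illustrated with a minimal sketch. It assumes the unweighted form of Cohen's kappa, which the article does not specify, and it uses Python rather than the SAS procedures the authors ran. The function names and the ten paired yes/no answers are hypothetical and are not CaMos data.

```python
from collections import Counter


def percent_agreement(test, retest):
    """Share of subjects who gave the same answer on both occasions."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty, equal-length response lists")
    matches = sum(a == b for a, b in zip(test, retest))
    return matches / len(test)


def cohen_kappa(test, retest):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(test)
    p_o = percent_agreement(test, retest)
    test_counts = Counter(test)
    retest_counts = Counter(retest)
    # Chance agreement: product of the two marginal proportions per category.
    p_e = sum(test_counts[c] * retest_counts[c] for c in test_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical yes/no answers from ten subjects (illustration only).
test_answers = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
retest_answers = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]

print(f"percent agreement: {percent_agreement(test_answers, retest_answers):.2f}")
print(f"kappa: {cohen_kappa(test_answers, retest_answers):.2f}")
```

In this made-up example eight of ten answers match (80% agreement), while half of the answers would be expected to match by chance, so kappa is 0.60.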

The agreement implied by the kappa statistic was quantified as follows: values below 0.40 represent poor agreement, those between 0.40 and 0.75 represent fair to good agreement, and those greater than 0.75 represent excellent agreement beyond chance.3
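
As a small illustration, the cut-offs above can be written as a lookup. The function name is hypothetical, and treating a value of exactly 0.75 as "fair to good" is an assumption based on the wording "between 0.40 and 0.75".

```python
def describe_kappa(kappa):
    """Map a kappa value to the agreement bands used in this article."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:  # assumption: the upper bound is inclusive
        return "fair to good"
    return "excellent"


# Kappa values reported in the article's tables.
for label, value in [("moderate activity", 0.30),
                     ("10-lb weight loss", 0.52),
                     ("removal of uterus", 0.94)]:
    print(f"{label}: K={value:.2f} -> {describe_kappa(value)}")
```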

Results

Of the 422 respondents who were contacted, 367 (87%) completed the retest questionnaire. Table 1 shows the characteristics of respondents and non-respondents. Fifty-nine percent of respondents were female, and most respondents (69%) resided in Ontario. All City of Québec test participants completed the retest questionnaire. Kappa (K), percent agreement, and intra-class correlation (ICC) values for the level of agreement between test and retest are shown in Tables 2-4 (stratum-specific data not shown). Table 2 displays the questions asked. Due to space limitations, actual questions are not shown in Tables 3 and 4; they are, however, available from the authors.
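
The percentages in this paragraph follow directly from the counts in Table 1. The short check below is only an arithmetic sketch; the variable names are hypothetical, and Hamilton and Toronto are taken as the two Ontario centres.

```python
# Counts transcribed from Table 1.
respondents_by_sex = {"male": 150, "female": 217}
non_respondents_by_sex = {"male": 27, "female": 28}
respondents_by_centre = {"Hamilton": 148, "City of Quebec": 114, "Toronto": 105}

completed = sum(respondents_by_sex.values())
contacted = completed + sum(non_respondents_by_sex.values())


def as_percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


response_rate = as_percent(completed, contacted)
female_share = as_percent(respondents_by_sex["female"], completed)
ontario = respondents_by_centre["Hamilton"] + respondents_by_centre["Toronto"]
ontario_share = as_percent(ontario, completed)

print(f"contacted={contacted}, completed={completed}, response rate={response_rate}%")
print(f"female={female_share}%, Ontario={ontario_share}%")
```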

Employment status, height and greatest adult weight were reported with excellent consistency in the total sample with reliability values ranging from 0.82 to 0.97. Excellent results for these variables were seen across strata defined by age, sex, and BMI, with the exception of those 65 years of age and older, who reported employment status with fair consistency (K=0.46, 95% confidence interval [CI] 0.33, 0.59).
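
Height and weight are continuous, so their agreement is summarized with an intra-class correlation rather than kappa. The article does not say which ICC form was used; the sketch below assumes the one-way random-effects version for two measurements per subject. The function name and the height values are hypothetical.

```python
def icc_oneway(test, retest):
    """One-way random-effects ICC for two measurements per subject."""
    n = len(test)
    if n != len(retest) or n < 2:
        raise ValueError("need at least two subjects, each measured twice")
    k = 2  # measurements per subject: test and retest
    subject_means = [(a + b) / k for a, b in zip(test, retest)]
    grand_mean = sum(subject_means) / n
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(test, retest, subject_means)) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (msb - msw) / denominator


# Hypothetical self-reported heights in cm (illustration only, not CaMos data).
test_heights = [160, 172, 181, 168, 175]
retest_heights = [161, 172, 180, 167, 175]
print(f"ICC: {icc_oneway(test_heights, retest_heights):.2f}")
```

The ICC approaches 1 when differences between the two reports are small relative to the differences between subjects.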

TABLE 2
Reliability of general information

General information                                   Effective sample size  Statistic  Value (95% CI)     Percent agreement
What is your current employment status?               365                    Kappa      0.82 (0.77, 0.87)  86.6%
  (7 categories)
What was your greatest adult height?                  349                    ICC        0.97 (0.96, 0.97)  -
What was your greatest adult weight?                  349                    ICC        0.95 (0.94, 0.96)  -
Have you ever lost more than 10 lbs (other than       365                    Kappa      0.52 (0.44, 0.61)  76.2%
  after childbirth, one year post-partum)? (yes/no)

Kappa: kappa statistic. ICC: intraclass correlation coefficient.

Sun exposure and physical activity reporting were fair to good across strata, with two exceptions: hours per week of moderate activity in the last year showed poor agreement (K=0.30, 95% CI 0.23, 0.37) for the total sample and across all strata, and sun exposure at 50 years showed poor agreement in females (K=0.39, 95% CI 0.25, 0.53). Agreement on ever having experienced a 10-pound weight loss was fair to good (K=0.52, 95% CI 0.44, 0.61).

Reproductive history information was reported with excellent consistency; K and ICC values fell between 0.91 and 0.97, findings that held true within age strata. Body mass index comparisons showed little change across three categories. As expected, all percent agreement values demonstrate higher agreement than do the results of kappa, as kappa is a statistic that corrects for chance agreement. Other analyses, stratified by location, province, family history of osteoporosis, and smoking status were performed, but there were too few respondents in each cell to derive stable estimates.
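
The relationship between percent agreement and kappa noted above can be made explicit. Assuming the unweighted form of Cohen's kappa (the article does not state which variant was computed), with p_o the observed proportion of agreement and p_e the proportion of agreement expected by chance:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o - \kappa = \frac{p_e\,(1 - p_o)}{1 - p_e} \;\ge\; 0
\quad \text{for } 0 \le p_e < 1 .
```

Under that assumption the observed proportion of agreement can never fall below kappa, and the two coincide only when agreement is perfect or chance agreement is zero.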

Discussion

The objective of this analysis was to assess the test-retest reliability of the survey questions in CaMos that measure sun exposure, physical activity, and reproductive history. Results demonstrate the excellent reliability of height, weight and reproductive history reporting and the generally fair to good reproducibility of physical activity and sun exposure information. Results were stable across strata; reliability estimates remained similar when stratified by age group, sex and body mass index.

TABLE 3
Reliability of physical activity and sun exposure variables

Physical activity and sun exposure                             Effective sample size  Kappa (95% CI)     Percent agreement
Description of activities at work (physical activity level)    361                    0.58 (0.50, 0.65)  71.5%
  (4 categories)
Number of hours/week spent on strenuous sports in last year    365                    0.57 (0.47, 0.68)  83.8%
  (6 categories)
Number of hours/week spent on moderate activity in last year   365                    0.30 (0.23, 0.37)  31.5%
  (8 categories)
Frequency of exposure to direct sun in last 12 months          365                    0.56 (0.49, 0.64)  74.3%
  (4 categories)
Frequency of exposure to direct sun at age 50                  234                    0.44 (0.35, 0.54)  62.4%
  (4 categories)
Frequency of exposure to direct sun at age 30                  365                    0.49 (0.42, 0.55)  59.5%
  (4 categories)
Frequency of exposure to sun during childhood                  363                    0.53 (0.47, 0.59)  54.3%
  (4 categories)

Actual questions are available from the authors. All reliability statistics in this table are kappa values.

The level of agreement for sun exposure variables was generally fair to good in our data. Rosso et al.4 found higher levels of consistency in sun exposure reporting, with ICC values ranging from 0.68 for outdoor work to 0.79 for leisure time outside. English et al.5 reported excellent agreement when subjects were questioned about the amount of time they spent outdoors (ICC=0.77, 95% CI 0.71, 0.83).

In our data, female reproductive history variables were reported with consistency. Similar results have been found in recent studies. Lin et al.6 and Kelly et al.7 also found that number of pregnancies resulting in live births and age of menarche were reported with excellent consistency, and Bosetti et al.8 found similar values for the same variables, as well as for age at first birth.

TABLE 4
Reliability of female reproductive history variables

Reproductive variables                           Effective sample size  Statistic  Value (95% CI)     Percent agreement
Removal of uterus (yes/no)                       216                    Kappa      0.94 (0.89, 0.99)  97.2%
Removal of ovaries (3 categories)                213                    Kappa      0.91 (0.84, 0.97)  96.2%
Number of pregnancies resulting in live births   182                    ICC        0.96 (0.94, 0.97)  -
Age at first birth                               178                    ICC        0.97 (0.96, 0.98)  -
Breast feeding of children (yes/no)              178                    Kappa      0.92 (0.86, 0.98)  96.1%
Age of first period                              213                    ICC        0.95 (0.93, 0.96)  -

Actual questions are available from the authors. Kappa: kappa statistic. ICC: intraclass correlation coefficient.

In our analysis, height and weight were reported with consistency, an outcome that is supported by the literature. Kelly et al.7 reported high levels of reliability for current height (ICC=0.90) and weight (ICC=0.87) among subjects re-interviewed less than one year after the test interview. Cumming and Klineberg9 analyzed the responses of an elderly population (median age = 80 years) re-interviewed one to three months after test interviews; agreement was high for both weight (0.97) and height (0.95).

Our subjects reported with fair to good consistency on activity level at work and weekly time spent on strenuous sports in the last year, while hours per week spent on moderate activity was reported with poor consistency. It is possible that strenuous sports are played outdoors more often, and that time outdoors is more salient to respondents than is physical activity, which might explain the different kappa values for strenuous sports and moderate activity. Cumming and Klineberg9 found that at age 50, leisure (physical) activity level (K=0.61) and work activity (K=0.68) were reported with fair to good reliability in their sample of the elderly. Similarly, Batty,10 in a study of male factory workers (72% re-interviewed in 23 months or less, the remainder re-interviewed at >23 months), found that the reliability of physical activity reporting was not high; fair to good results were obtained for overall leisure activity (K=0.69) and overall work activity (K=0.49). Although physical activity is likely to vary by season, this questionnaire did not collect historical activity information, and the majority of retest interviews (n=224) were completed in the winter months; as such, meaningful seasonal comparisons are not possible.

Results reported here may have been affected by the interval between test and retest, reporting errors, and differences in data collection techniques between test and retest. Although the test-retest time interval was short (three to five months), changes may have occurred that would alter responses. It is difficult to assess whether the low reliability of variables that asked about physical activity over the last year reflects a change in activity level (possibly related to seasonal differences) rather than the unreliability of the questions asked.11,12 To minimize this source of error, Batty10 asked respondents if their physical activity level changed between test and retest (separated by a period of 4-6 weeks). Kappa values were higher when those with changed activity levels were excluded. The reliability of work activity values over the last year increased from 0.49 to 0.54 when those who reported a changed activity level were excluded. Our physical activity results may have shown greater reliability had those who changed their activity level been excluded.

The results of this study may be affected by the data collection technique; interview data can be affected by interviewer-respondent dynamics, and it is also possible that the changed method between test (in-person) and retest (telephone) interviews may affect the results. Studies comparing the information obtained from different data collection strategies, however, have generally found little difference between telephone and in-person interviews,13,14 and the high level of reliability reported for reproductive history indicates the acceptability of the data collection technique.

Results of this study demonstrate that height, weight, employment status and reproductive history questions (as asked in CaMos) can be answered reliably, as, to a lesser extent, can those which relate to physical activity and sun exposure. Using comparable questions, studies of similar populations may expect reliable data. Such questions can be used as a means of identifying and targeting high-risk individuals for prevention programs, and may also be used in a test-retest manner to assess the impact of public health interventions. It is important to note, however, that each study should assess the reliability of the questions upon which its conclusions rely.

References

  1. Kreiger N, Tenenhouse A, Joseph L, Mackenzie T, Poliquin S, Brown JP, Prior JC, Rittmaster RS. The Canadian Multicentre Osteoporosis Study (CaMos): Background, rationale, methods. Can J Aging 1999;18:376-87.
  2. SAS Institute Inc. SAS/Stat (Version 8.2). Cary NC, 2001, SAS Institute, Inc.
  3. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
  4. Rosso S, Minarro R, Schraub S, Tumino R, Franceschi S, Zanetti R. Reproducibility of skin characteristic measurements and reported sun exposure history. Int J Epidemiol 2002;31:439-46.
  5. English DR, Armstrong BK, Kricker A. Reproducibility of reported measurements of sun exposure in a case-control study. Cancer Epidemiol Biomarkers Prev 1998;10:857-63.
  6. Lin SS, Glaser SL, Stewart SL. Reliability of self-reported reproductive factors and childhood social class indicators in a case-control study in women. Ann Epidemiol 2002;12:242-7.
  7. Kelly JP, Rosenberg L, Kaufman DW, Shapiro S. Reliability of personal interview data in a hospital-based case-control study. Am J Epidemiol 1990;131:79-90.
  8. Bosetti C, Tavani A, Negri E, Trichopoulos D, La Vecchia C. Reliability of data on medical conditions, menstrual and reproductive history provided by hospital controls. J Clin Epidemiol 2001;54:902-6.
  9. Cumming RG, Klineberg RJ. A study of the reproducibility of long-term recall in the elderly. Epidemiology 1994;5:116-9.
  10. Batty D. Reliability of a physical activity questionnaire in middle-aged men. Public Health 2000;114:474-6.
  11. Washburn RA, Montoye HJ. The assessment of physical activity by questionnaire. Am J Epidemiol 1986;123:563-76.
  12. Kelsey JL, Thompson WD, Evans AS. Methods in Observational Epidemiology. New York: Oxford University Press; 1986.
  13. Aneshensel CS, Frerichs RR, Clark VA, Yokopenic PA. Telephone versus in-person surveys of community health status. Am J Public Health 1982;72:1017-21.
  14. Siemiatycki J. A comparison of mail, telephone, and home interview strategies for household health surveys. Am J Public Health 1979;69:238-45.

Author References

Victoria Nadalin, Division of Preventive Oncology, Cancer Care Ontario, Toronto, Ontario, Canada

Kris Bentvelsen, Amgen Canada Inc., Mississauga, Ontario, Canada

Nancy Kreiger, Division of Preventive Oncology, Cancer Care Ontario, Toronto, Ontario and Departments of Nutritional Sciences and Public Health Sciences, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada

Correspondence: Victoria Nadalin, Division of Preventive Oncology, Cancer Care Ontario, 620 University Avenue, Toronto, Ontario M5G 2L7 Canada; Fax: (416) 971-7554; E-mail: victoria.nadalin@cancercare.on.ca
