Evaluation of the Language Instruction for Newcomers to Canada (LINC) Program

Appendix B: Other methodological considerations

Teacher surveys and class information forms, along with the learner surveys, instruction sheets, and a return postage-paid envelope, were couriered to 56 of the 70 classes selected at random. The other 14 instructors were involved in the case studies; these teachers were emailed an electronic version of the questionnaires for return before the case study site visit, since the information on the forms was needed to prepare for the visit.

Choosing the case study classes and learners

Multistage random sampling was used to select the classes and learners for the evaluation. Under this procedure, classes are chosen at random, and learners within the selected classes are then chosen at random. The principles of multistage sampling are straightforward, but avoiding inadvertent biases can complicate its execution. It is not as simple as first selecting classes and then selecting learners from within them: each individual across all the classes should have an equal probability of selection, so the selection of the LINC classes had to take into account the number of learners within them.

Thus the first step was to obtain a list of all LINC students in Canada. This was available for Ontario only (through HART). For the six other provinces involved in LINC, SPOs (numbering about 40) were asked for a list of their LINC classes and the number of learners in each. HART and survey data showed there were 27,470 LINC learners in Canada during the last week in April (this excludes only a few SPOs outside of Ontario that did not return an SPO survey)Footnote 61. An SPSS data set was created with all 27,470 students. The file was listed by class (which weights each class by its number of learners). It was randomly shuffled to guard against any inadvertent ordering effects. The total of 27,470 was divided by the number of classes to be chosen (70) to give a sampling interval of 392. A random number between 1 and 392 was chosen to determine the first selection; it was 173, so the class that learner 173 belongs to was chosen. Adding 392 gave 565, and that learner's class was selected, and so on until 70 classes had been selected. The final stage was to select 10 surveys at random from each class.
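To make the systematic selection step concrete, the sketch below walks through the same procedure in Python. It is a minimal illustration under stated assumptions, not the evaluation's actual SPSS routine: the learner list is assumed to be available as (learner_id, class_id) pairs, the names are hypothetical, and because the report does not say how a class drawn twice was handled, skipping duplicates here is an assumption.

```python
import random

# Minimal sketch of the systematic, size-weighted class selection described
# above. `learners` is assumed to be a list of (learner_id, class_id) pairs
# covering all 27,470 LINC learners; the names are illustrative only.

def select_classes(learners, n_classes=70, seed=None):
    rng = random.Random(seed)

    # Shuffle the learner-level list to guard against ordering effects.
    shuffled = list(learners)
    rng.shuffle(shuffled)

    # Sampling interval: total learners divided by the number of classes to
    # be chosen (27,470 / 70, roughly 392, in the evaluation).
    interval = len(shuffled) // n_classes

    # Random start between 1 and the interval (173 in the evaluation).
    start = rng.randint(1, interval)

    # Step through the list in jumps of one interval; the class of each
    # selected learner is chosen, so a class's chance of selection is
    # proportional to the number of learners it contains.
    selected = []
    position = start - 1  # convert the 1-based start to a 0-based index
    while len(selected) < n_classes and position < len(shuffled):
        _, class_id = shuffled[position]
        if class_id not in selected:  # skipping a repeat draw is an assumption
            selected.append(class_id)
        position += interval
    return selected
```

Because every learner is equally likely to fall on a selected position, each learner has an equal probability of having his or her class drawn, which is the property the paragraph above describes.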

Choice of dependent variables

An important methodological consideration was the choice of dependent variables for the analyses. The evaluation used difference scores (also called change or gain scores) computed by subtracting the pretest from the posttest CLBA score for each skill area. Some researchers (e.g., Cronbach & Furby, 1970; Werts & Linn, 1970)Footnote 62 have warned against the use of difference scores because they often have little variability and they frequently correlate with the initial level of the characteristic measured: in short, they tend to be unreliable. Others (e.g., Rogosa & Willett, 1983; Allison, 1990; Williams & Zimmerman, 1996; Collins, 1996)Footnote 63, however, disagree, asserting that difference scores provide unique information on individual change and thus should not be dismissed. The emerging consensus is that for experimental designs, ANCOVA is preferable to difference scores because it is more powerful, but for quasi-experimental designs difference scores are the better alternativeFootnote 64.
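In concrete terms, the dependent variable is simply the posttest CLBA score minus the pretest CLBA score, computed separately for each of the four skills. A minimal sketch of that computation, assuming the scores sit in a pandas data frame with hypothetical column names such as listening_pre and listening_post:

```python
import pandas as pd

# Difference (gain) score per skill: posttest CLBA minus pretest CLBA.
# The column names are assumptions for illustration, not the evaluation's
# actual data file layout.

SKILLS = ["listening", "speaking", "reading", "writing"]

def add_gain_scores(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for skill in SKILLS:
        out[f"{skill}_gain"] = out[f"{skill}_post"] - out[f"{skill}_pre"]
    return out
```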

When data meet certain conditions, the use of difference scores is not problematic. Analysis of change scores can be a reasonable alternative when there is a high correlation between baseline and follow-up measurementsFootnote 65. For this evaluation these correlations are: Listening = 0.80; Speaking = 0.83; Reading = 0.76; Writing = 0.76.
Another condition is a strong intervention: "In order for gain scores to be reliable it is necessary for the intervention between the two testing occasions to be relatively potent and for the instrumentation to be specially designed to be sensitive enough to detect changes attributable to the intervention."Footnote 66 There is a strong argument that this is the case for LINC.

Finally, a very substantial decrease in post-test variance would be a warning sign that difference scores should not be used. As the table below shows, this is not the case, so it is reasonable to use difference scores for this evaluation.

Variance of CLBA scores

            Listening   Speaking   Reading   Writing
Original      3.329       3.410     4.579     3.235
Current       3.717       3.724     4.182     2.726
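The two conditions discussed above, a high correlation between the pretest and posttest scores and no substantial drop in posttest variance, can both be checked directly from the same data. A sketch of those checks, under the same hypothetical column layout as the gain-score example:

```python
import pandas as pd

SKILLS = ["listening", "speaking", "reading", "writing"]

def difference_score_checks(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for skill in SKILLS:
        pre, post = df[f"{skill}_pre"], df[f"{skill}_post"]
        rows.append({
            "skill": skill,
            # Condition 1: high correlation between baseline and follow-up.
            "pre_post_correlation": pre.corr(post),
            # Condition 2: posttest variance has not dropped substantially
            # relative to pretest variance.
            "pretest_variance": pre.var(),
            "posttest_variance": post.var(),
            "variance_ratio": post.var() / pre.var(),
        })
    return pd.DataFrame(rows)
```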

Surveying former LINC learners

One drawback of the multistage option is that it excluded former LINC learners, thus limiting the information available on the longer-term impacts of LINC. CIC expressed interest in hearing from former LINC clients to learn why they discontinued LINC, what they thought of the LINC class, and what they are doing now. This presented a significant challenge because former students are very hard to track down: newcomers move a lot. In fact, the evaluators believed the challenge was difficult enough that it was not part of the original plans for the evaluation. CIC, however, wanted an attempt made.

The first problem was identifying a sampling frame. The iCAMS database would have been ideal, but CIC determined that confidentiality issues made this an unlikely option. The only alternative was the HART database in Ontario. Although this is an excellent database, it covers only Ontario LINC clients; thus the first shortcoming of the sample is that it represents Ontario only. The second problem was reaching the sample with the survey. Confidentiality is, of course, also a concern with HART data. The Centre for Education and Training was willing to release email addresses (with CIC's approval) but not client names and addresses, so an email survey was the only possible mode for conducting the survey. Unfortunately, only a small percentage of newcomers have email addresses. It was also anticipated that many newcomers would not understand the survey; since this was not part of the proposed evaluation, there were no funds available for translation into several languages. Other problems included the natural tendency of many people to distrust unsolicited emails with attachments and delete them, and the fact that the survey was fielded in summertime. Despite two follow-ups, all these problems yielded a response rate of only 17%, which leaves room for non-response bias. A separate report on the survey results was submitted to CIC.
