ARCHIVED – Enhanced Language Training Initiative: Formative Evaluation

Appendix B: Analysis of Database Issues

A. Analysis of Database Gaps

Project level

The evaluation team determined that there are a total of 253 projects in the database provided. Because no overall listing of projects is available, the team cannot determine the extent to which this database is complete or representative of the projects funded under the ELT program.

Individual level

The evaluation team determined that there are a total of 2,552 individual records in the database, representing 2,488 unique clients. These 2,488 clients are associated with 47 projects. The team undertook a series of calculations to estimate the potential gaps in the database at the individual level (i.e., missing individual records), based on the following assumptions:

  • Three provinces (BC, MB, and SK) had not agreed to provide individual-level data to the database. As a result, the database is not expected to include client data from these three provinces.
  • Client records are not expected to be available from development projects.
  • Because projects range in size from a few thousand dollars to over one million dollars, an “average” number of participants per project would not be meaningful.
  • The predicted number of clients is a relatively complete variable for projects that are not development-only. Among projects that report both predicted and enrolled clients, enrolment appears to reach approximately 75% of the predicted number.

Given these assumptions, we have calculated the expected number of participants in the database using the following steps:

  • Exclude all projects that are associated with BC, MB, and SK
  • Exclude all projects that are development only projects
  • Sum the total number of participants expected from the remaining projects
  • Discount this total by 25% (reflecting the approximately 75% of predicted clients who enrol)
  • The result is the expected number of clients.

These steps yield the following results:

  • Of the 253 projects, 154 remain after excluding BC, MB, and SK, with 7,958 predicted participants
  • 114 projects remain after excluding development-only projects, with 7,339 predicted participants
  • 7,339 participants predicted in total (note: this is likely an underestimate given some projects are missing data for this variable)
  • 7,339 discounted by 25% gives 5,504
  • 5,504 clients would be expected; the database includes 2,552 records, or 46% of the expected individual-level records.
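
The arithmetic behind the last two steps can be reproduced in a few lines. The following is a minimal sketch that uses only the figures reported above; the variable names are illustrative and are not fields of the ELT database.

```python
# Figures taken from the steps above; variable names are illustrative only.
predicted_participants = 7339   # sum over the 114 remaining projects
enrolment_rate = 0.75           # approximate share of predicted clients who enrol
records_in_database = 2552      # individual records actually present

# Discounting by 25% is the same as multiplying by the 75% enrolment rate.
expected_clients = round(predicted_participants * enrolment_rate)
coverage = records_in_database / expected_clients

print(f"Expected clients: {expected_clients:,}")
print(f"Coverage of expected records: {coverage:.0%}")
```

Run as written, this prints 5,504 expected clients and a coverage of 46%, matching the figures above.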

B. Analysis of Data Quality Issues

This section presents a summary of data quality for the individual-level and project-level data, by key variable.

1. Participant Profile, by Key Variable

Area | Variable | % Complete | Quality Considerations
Age | Age | 81% (2059/2552)
  • 3 cases are “0” years old
  • 1 case is “331” years old
  • 6 cases are greater than 65 years old (66-74)
Age | Date of Birth | 81% (2059/2552)
  • 1 case is “2006”
  • 6 cases are prior to 1940 (1933-1939)
Country of Origin | Country of Origin | 97% (2463/2552)
  • multiple spellings; no standardized categories or names
Previous Occupation | NOC BeforeELT | 82% (2078/2552)
  • 1 case is “-61”
  • 41 cases coded as “N/A”
Immigration Status | Immigration Category | 92% (2344/2552)
Lang. Skills Listening (pre) | Listening_before | 88% (2248/2552)
Lang. Skills Speaking (pre) | Speaking_before | 88% (2235/2552)
Lang. Skills Reading (pre) | Reading_before | 88% (2250/2552)
Lang. Skills Writing (pre) | Writing_before | 88% (2249/2552)
Full time program | Full time program | 94% (2401/2552)
  • it appears that this is a program variable that is reported on within the individual client questionnaire
Part time program | Part time program | 92% (2357/2552)
  • it appears that this is a program variable that is reported on within the individual client questionnaire
Language Training Component | Language training | 94% (2406/2552)
  • it appears that this is a program variable that is reported on within the individual client questionnaire
Internship Component | Internship | 87% (2211/2552)
  • it appears that this is a program variable that is reported on within the individual client questionnaire
Mentoring Component | Mentoring | 83% (2104/2552)
  • it appears that this is a program variable that is reported on within the individual client questionnaire
Networking Component | Networking | 85% (2178/2552)
  • it appears that this is a program variable that is reported on within the individual client questionnaire
Volunteering Component | Volunteering | 80% (2049/2552)
  • it appears that this is a program variable that is reported on within the individual client questionnaire
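
The completeness figures and the age anomalies listed in the table above could be reproduced with checks along the following lines. This is a hedged sketch only: the list-of-values layout, the function names, and the plausible age range of 18 to 65 are assumptions made for illustration, not the structure or validation rules of the actual ELT database.

```python
def completeness(values):
    """Return (non-missing count, total, share complete) for one variable."""
    total = len(values)
    present = sum(1 for v in values if v is not None)
    return present, total, (present / total if total else 0.0)


def flag_ages(ages, min_age=18, max_age=65):
    """Return non-missing ages outside [min_age, max_age].

    The bounds are assumed for illustration, not taken from the report.
    """
    return [a for a in ages if a is not None and not (min_age <= a <= max_age)]


# Tiny made-up sample, not real client data.
sample_ages = [34, None, 0, 45, 331, 70, 29, None, 52, 41]

present, total, share = completeness(sample_ages)
print(f"Age: {share:.0%} ({present}/{total})")
print("Ages to review:", flag_ages(sample_ages))
```

On the made-up sample, the first line reports 80% (8/10) and the second lists 0, 331, and 70 for review, mirroring the kinds of cases noted for the Age variable.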

2. Participant Outcomes, by Key Variable

Area | Variable | % Complete | Quality Considerations

SKILL OUTCOMES
Lang. Skills Listening (post) | Listening_exit | 64% (1625/2552)
  • Approximately one-third of individual records missing data – should be used with caution
Lang. Skills Speaking (post) | Speaking_exit | 64% (1622/2552)
  • Approximately one-third of individual records missing data – should be used with caution
Lang. Skills Reading (post) | Reading_exit | 64% (1634/2552)
  • Approximately one-third of individual records missing data – should be used with caution
Lang. Skills Writing (post) | Writing_exit | 64% (1631/2552)
  • Approximately one-third of individual records missing data – should be used with caution

OUTCOMES IMMEDIATELY AFTER ELT
Employment Status immediately following ELT | Reported Outcomes Immediately following ELT | 38% (975/2552)
  • Approximately 60% of individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Employment Status at Follow-up | Reported Outcomes According to Followup | 39% (987/2552)
  • Approximately 60% of individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Employed commensurate immediately after ELT | Employed commensurate immediately after ELT | 23% (592/2552)
  • Approximately three-quarters of individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Employed not commensurate immediately after ELT | Employed not commensurate immediately after ELT | 23% (590/2552)
  • Approximately three-quarters of individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Unemployed immediately after ELT | Unemployed immediately after ELT | 25% (627/2552)
  • Approximately three-quarters of individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used

OUTCOMES AT FOLLOW-UP
Enrolled in further education immediately after ELT | Enrolled in further education immediately after ELT | 11% (282/2552)
  • Approximately nine out of ten individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Employed commensurate at follow-up | Employed commensurate at follow-up | 15% (381/2552)
  • Approximately eight out of ten individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Employed not commensurate at follow-up | Employed not commensurate at follow-up | 15% (382/2552)
  • Approximately eight out of ten individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Unemployed at follow-up | Unemployed at follow-up | 17% (425/2552)
  • Approximately eight out of ten individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
Enrolled in further education at follow-up | Enrolled in further education at follow-up | 7% (192/2552)
  • Approximately nine out of ten individual records are missing data – likely significant biases in existing data due to quantity of missing data – should not be used
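
The usage guidance in the table above follows from the share of records missing for each variable. A rough sketch of that reasoning is shown below. The cut-offs (25% and 50% missing) and the function name are hypothetical: they were chosen only because they reproduce the labels in this table, and the report does not state the thresholds the evaluation team applied.

```python
def usability(present, total, caution_at=0.25, exclude_at=0.50):
    """Classify a variable by its share of missing records.

    The cut-offs are assumptions for illustration, not the report's rules.
    """
    missing = 1 - present / total
    if missing >= exclude_at:
        return "should not be used"
    if missing >= caution_at:
        return "use with caution"
    return "usable"


# Non-missing counts copied from the table above (out of 2,552 records).
variables = {
    "Listening_exit": 1625,
    "Reported Outcomes Immediately following ELT": 975,
    "Enrolled in further education at follow-up": 192,
}
for name, present in variables.items():
    print(f"{name}: {usability(present, 2552)}")
```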

3. Project Level Data, by Key Variable

Area | Variable | % Complete | Quality Considerations
Fiscal Year | Fiscal Year | 100% (253/253)
  • All cases are within the 4 year time frame
Service Provider Organization | Service Provider Organization | 100% (253/253)
  • There are 165 SPOs listed, but 25 of these are duplicates because spelling and entry are not standardized for these text fields.
Project Title | Project Title | 100% (253/253)
  • Some dubious titles such as “ELT” or “ELT project”
Type of project such as development, delivery or both | Project Type | 100% (253/253)
  • Complete and standardized spelling/label employed
Region | Region | 100% (253/253)
  • There are some issues with overlapping categories with this variable (e.g., “British Columbia” and “British Columbia & Yukon”).
  • This variable does not correspond with the “province” variable for some cases
City | City | 97% (245/253)
  • There are no standardized spellings or consensus on format. In some cases, multiple cities are entered in the field as one category. In other cases, the city is a broader area (e.g., Vancouver Island)
Province | Province | 100% (253/253)
  • This variable does not correspond with the “region” variable for some cases
NOC | NOC codes | 43% (108/253)
  • Over one half of projects do not have associated NOC codes. Data should only be used with extreme caution.
  • Challenging to derive from occupation field given wide open text capture and no standardization of field.
Project $ amount | Value of Project | 79% (200/253)
  • 18 cases have a value of $0
  • One in 5 cases do not have any associated $ (blank)
  • Total value of projects equals $24.1 M. This should be verified against actual funds allocated through CAs via the financial system.
Projected number of participants | Participants Projected | 63% (116/183; excludes development projects)
  • 6 delivery projects have projected participants?
  • Wide range of participants from 3 to 5,000 – indicates that there may be some issues with definition of “project”
  • Data should only be used with extreme caution.
Participants enrolled | Participants Enrolled | 31% (57/183; excludes development projects)
  • Only one in three projects reporting participants enrolled. Data should likely not be used.
Participants completed | Participants Completed | 22% (40/183; excludes development projects)
  • Only one in five projects reporting participants completed. Data should likely not be used.
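
Several of the text-field problems noted above (service provider organizations, cities) stem from unstandardized spelling. One possible way to surface likely duplicates is to compare names after a simple normalization, as sketched below. The normalization rules, function names, and sample organization names are all hypothetical and chosen for illustration; this is not the method the evaluation team used to identify the 25 duplicates.

```python
import re
from collections import defaultdict


def normalize(name):
    """Lower-case, keep only ASCII letters, digits and spaces, collapse spaces."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def group_duplicates(names):
    """Group raw names whose normalized forms match; keep groups of two or more."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical organization names, not taken from the database.
sample = [
    "Acme Settlement Services",
    "ACME Settlement  Services.",
    "Northside Learning & Training",
    "Northside Learning and Training",
    "Harbour College",
]
for group in group_duplicates(sample):
    print(group)
```

Because this simple rule discards accented characters, names containing them would need a gentler normalization. Matches of this kind are only candidates and would still need manual review before records are merged.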
