ARCHIVED – Health Status and Social Capital of Recent Immigrants in Canada: Evidence from the Longitudinal Survey of Immigrants to Canada

Econometric models

As mentioned above, the LSIC data is longitudinal, consisting of a large cross-section of micro-units (thousands of individuals) observed over three time periods. To model the probability that immigrants report being healthy while accounting for individual heterogeneity, we apply panel data models in our regression analysis, controlling for the individual stock of social capital and other socio-demographic variables. The fundamental advantage of a panel data model is that it allows differences in behaviour across individuals to be modelled. Panel data modelling techniques focus on heterogeneity across units rather than on time-series autocorrelation.

The basic framework for the binary panel data models is a single equation model:

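$$
y^{*}_{it} = x'_{it}\beta + z'_{i}\gamma + v_{i} + \varepsilon_{it}, \qquad i = 1, \dots, n;\; t = 1, \dots, T_{i}
$$

with coefficient vectors β and γ, and with the observed binary health indicator given by

$$
y_{it} =
\begin{cases}
1 & \text{if } y^{*}_{it} > 0 \\
0 & \text{otherwise}
\end{cases}
$$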

Here, y*_it is an unobserved latent variable underlying an immigrant's likelihood of reporting being healthy. X is a collection of k time-varying independent variables denoted by the vector x' = (x_1, x_2, …, x_k). Z is a collection of m time-invariant independent variables denoted by the vector z' = (z_1, z_2, …, z_m). Both X and Z are observable. The regressors also include a set of dummy variables for each wave of the panel in order to capture time effects. ε_it is an error term with mean zero that follows a standard logistic distribution with variance π²/3. Subscript i indexes cross-section units and t indexes time periods (T = 3). The unobserved individual effect v_i captures the heterogeneity across individuals that determines the probability of good health; it comprises individual-specific factors that are unobservable, such as individual differences in personality or ability, group- or family-specific characteristics, and health behaviours. It is assumed that v_i and ε_it are uncorrelated with each other.
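As a short worked step, the logistic error term implies a logit form for the probability of reporting good health, conditional on the individual effect. The derivation below is a sketch under two assumptions that are not spelled out in the text: a linear index with coefficient vectors β and γ, and the usual observation rule for latent-variable models, under which the observed indicator y_it equals 1 when y*_it > 0. It uses the symmetry of the logistic distribution, with Λ denoting the standard logistic distribution function.

$$
\Pr(y_{it} = 1 \mid x_{it}, z_{i}, v_{i})
= \Pr\bigl(\varepsilon_{it} > -(x'_{it}\beta + z'_{i}\gamma + v_{i})\bigr)
= \Lambda(x'_{it}\beta + z'_{i}\gamma + v_{i})
= \frac{\exp(x'_{it}\beta + z'_{i}\gamma + v_{i})}{1 + \exp(x'_{it}\beta + z'_{i}\gamma + v_{i})}
$$

Because v_i appears inside Λ, the coefficients in this expression describe effects for a given individual, that is, holding v_i fixed.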

For the estimation of a panel data model, the critical issue is whether the individual effects v_i are correlated with the observed regressors X and Z (Greene 2002; Jones 2007). As an alternative to the random-effects model, the generalized estimating equations (GEE) approach proposed by Liang and Zeger (1986) and Zeger, Liang, and Albert (1988) can be used to estimate population-average effects. The GEE model is an extension of the generalized linear model (GLM) approach to longitudinal data analysis using quasi-likelihood estimation. The GEE model has consistent and asymptotically normal solutions, even when the correlation structure is mis-specified, because it does not require the unobserved individual effects to be independent of the explanatory variables (Hu et al. 1998). The GEE approach relaxes the strict independence assumption of random-effects estimation and takes the dependence among units into consideration. Furthermore, time-invariant variables such as immigration category, ethnic group, and region of origin can be included in the regression as part of Z, which is impossible in the fixed-effects model. The GEE model is appropriate when inferences about the population average are the focus. In this paper, the average difference between groups with varied stocks of social capital is of most importance, not the difference for any one immigrant. Thus we present our results from the GEE model framework. [ Note 8 ]
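The distinction between a subject-specific effect (holding v_i fixed) and a population-average effect (averaging over individuals) can be illustrated numerically. The sketch below is not the authors' estimation procedure and uses no LSIC data. It assumes, purely for illustration, a single binary regressor, a normally distributed individual effect, and hypothetical values for beta0, beta1, and sigma. It averages the logit probability over the distribution of v_i on a grid and compares the resulting population-average log-odds difference with the subject-specific coefficient.

```python
import math


def logistic(u):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-u))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_grid=2001, width=8.0):
    """Average logistic(eta + v) over v ~ N(0, sigma^2).

    Uses a normalised weighted sum on an evenly spaced grid covering
    +/- width standard deviations. The normal distribution for v is an
    illustrative assumption, not something stated in the source text.
    """
    if sigma == 0.0:
        return logistic(eta)
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for j in range(n_grid):
        v = lo + j * step
        w = math.exp(-0.5 * (v / sigma) ** 2)  # unnormalised normal density
        total += w * logistic(eta + v)
        weight_sum += w
    return total / weight_sum


# Hypothetical parameter values, chosen only for illustration.
beta0 = -0.5   # intercept of the latent index
beta1 = 1.0    # subject-specific effect of a binary regressor
sigma = 2.0    # standard deviation of the individual effect v_i

# Population-average probabilities for the two groups (regressor = 0 or 1).
p0 = marginal_prob(beta0, sigma)
p1 = marginal_prob(beta0 + beta1, sigma)

# Population-average effect on the log-odds scale.
pa_effect = logit(p1) - logit(p0)

print(f"subject-specific log-odds effect:   {beta1:.3f}")
print(f"population-average log-odds effect: {pa_effect:.3f}")
```

Under these assumptions the population-average effect is smaller in magnitude than the subject-specific coefficient, and the two coincide when sigma is zero. This is why the choice between the two frameworks depends on whether the comparison of interest is between groups or within an individual.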


8 In this paper, we only present the results from the GEE models, while the results from random effects and fixed effects models are available upon request.
