Biological test method for measuring terrestrial plants exposed to contaminants in soil: appendix I
Appendix I - Instruction on the Derivation of ICps Using Linear and Nonlinear Regression Analyses
I.1 Introduction
This appendix provides instruction for the use of linear and nonlinear regression analyses to derive, based on the concentration-response relationships for quantitative endpoint data (in this instance, the mean length and dry mass of seedling shoots and roots), the most appropriate ICps. It represents an adaptation and modification of the approach described by Stephenson et al. (2000b). Instructions herein are provided using Version 11.0 of SYSTATFootnote75; however, any suitable software may be used. The regression techniques described in this appendix are most appropriately applied to continuous data from tests designed with ten or more concentrations or treatment levels (including the negative control treatment). The test design for measuring the effects of prolonged exposure on various plant species is summarized in Table I.1.
An overview of the general process used to select the most appropriate regression model for each data set under consideration is presented in Figure 3 within the main text (see Section 4.8.3.1).
The reader is encouraged to refer to the appropriate sections within this biological test method document, as well as the sections on regression analyses within the “Guidance Document on Statistical Methods for Environmental Toxicity Tests” (EC, 2004a) before data analyses. Environment Canada (2004a) also contains several additional references for the statistical analysis of quantitative test data using linear and nonlinear regression procedures. Some of the related guidance from these documents has been provided in this appendix, where appropriate.
Parameter | Description |
---|---|
Test type | whole soil toxicity test; no renewal (static test) |
Test duration | 14 days for barley, cucumber, durum wheat, lettuce, radish, red clover, and tomato; 21 days for alfalfa, blue grama grass, carrot, northern wheatgrass, and red fescue |
Test species | for monocotyledons: choose barley (Hordeum vulgare), blue grama grass (Bouteloua gracilis), durum wheat (Triticum durum), northern wheatgrass (Elymus lanceolatus; formerly named Agropyron dasystachyum), or red fescue (Festuca rubra); for dicotyledons: choose alfalfa (Medicago sativa), carrot (Daucus carota), cucumber (Cucumis sativus), lettuce (Lactuca sativa), radish (Raphanus sativus), red clover (Trifolium pratense), or tomato (Lycopersicon esculentum) |
Number of replicates | ≥ 4 replicates/treatment if equal replicate test design; if unequal replicates among test treatments, use:
|
Number of treatments | negative control soil and ≥ 9 test concentrations as a minimum; however, ≥ 11 concentrations plus a negative control are strongly recommended |
Statistical endpoints | Quantal:
|
Statistical endpoints | Quantitative:
|
I.2 Linear and Nonlinear Regression Analyses
I.2.1 Creating Data Tables
Note: The statistical analysis must encompass the transformation of the concentrations logarithmically (e.g., log10 or loge). If the concentrations fall below one (1) (e.g., 0.25), then the data can be transformed by transforming the units (e.g., from mg/kg to µg/g) with a multiplication factor (e.g., 1000); the modified data are then transformed logarithmically. The transformation can be done either in the original electronic spreadsheet, or when the original data are transferred to the SYSTAT data file.
- Open the appropriate file containing the data set in an electronic spreadsheet.
- Open the SYSTAT program. In the main screen, go to File, New, and then Data. This will open up an empty data table. Insert the variable names into the column heading by double-clicking on a variable name, which opens the ‘Variable Properties’ window. Insert an appropriate name for the variable of interest within the ‘Variable name’ box, and select the variable type; additional comments can be inserted within the ‘Comments:’ box. For example, the following variable names might be used:
conc = concentration or treatment level
logconc = log10 transformation of concentration or treatment level
rep = replicate within a treatment level
mnlengths = mean length of shoots
mnlengthr = mean length of roots
drywts = dry weight of shoots
drywtr = dry weight of roots - The data can now be transferred. To transfer the data, copy and paste each column from the electronic spreadsheet containing the concentrations, the replicates, and associated mean values, to the SYSTAT data table.
- Save the data by going to File, then Save As; a ‘Save As’ window will appear. Use appropriate coding to save the data file. Select Save when the file name has been entered.
- Record the file name of the SYSTAT data file in the electronic spreadsheet containing the original data.
- If the data (i.e., the test concentrations) require transformation, the data can be transformed by selecting Data, Transform, and then Let.... Once in the Let...function, select the column heading containing the appropriate header for the transformed data (e.g., logconc), and then select Variable within the ‘Add to’ box to insert the variable into the ‘Variable:’ box. Select the appropriate transformation (e.g., L10 for log10 transformation or LOG for the natural logarithm) in the ‘Functions:’ box (the ‘Function Type:’ box should be Mathematical), and then select Add to insert the function into the ‘Expression:’ box. Select the column heading containing the original untransformed data (i.e., ‘conc’ for concentration or treatment level), followed by Expression within the ‘Add to’ box to insert the variable into the ‘Expression:’ box. If a multiplication factor is required to adjust the concentration before the log-transformation, this step can be completed within the ‘Expression:’ box (e.g., L10[conc*1000]). Select OK when all desired transformations have been completed. The transformed data will appear in the appropriate column. Save the data (i.e., select File, followed by Save).
Note: The log10 of the negative control treatment cannot be determined (i.e., the log10 of zero is undefined); therefore, assign the negative control treatment level a very small number (e.g., 0.001) known or assumed to be a no-effect level, to include this treatment in the analysis and differentiate it from the other transformed treatment levels. - From the data table, calculate and record the mean of the negative controls for the variable under study; each measurement endpoint is statistically analyzed independently. The mean value of these control data will be required when estimating the model parameters. In addition, determine the maximum value within the data set for that particular variable and round up to the nearest whole number. This number is used as the maximum value of the y-axis (i.e., ‘ymax’) when creating a graph of the regressed data.
I.2.2 Creating a Scatter Plot or Line Graph
The scatter plots and line graphs provide an indication of the shape of the concentration-response curve for the data set. The shape of the concentration-response curve can then be compared to each model (Figure I.1) so that the appropriate model(s) likely to best suit the data is (are) selected. Each of the selected models should be used to analyze the data. Subsequently, each model is reviewed, and the model that demonstrates the best fit is selected.
Exponential Model
IC50: mnlengths = a*exp(log((a-a*0.5-b*0.5)/a)*(logconc/x))+b
IC25: mnlengths = a*exp(log((a-a*0.25-b*0.75)/a)*(logconc/x))+b
Where:
a = the y-intercept (the control response)
x = ICp for the data set
logconc = the logarithmic value of the exposure concentration
b = a scale parameter (estimated between 1 and 4)

Gompertz Model
IC50: mnlengths = g*exp((log(0.5))*(logconc/x)^b)
IC25: mnlengths = g*exp((log(0.75))*(logconc/x)^b)
Where:
g = the y-intercept (the control response)
x = ICp for the data set
logconc = the logarithmic value of the exposure concentration
b = a scale parameter (estimated between 1 and 4)

Hormesis Model
IC50: mnlengthr = (t*(1+h*logconc))/(1+((0.5+h*logconc)/0.5)*(logconc/x)^b)
IC25: mnlengthr = (t*(1+h*logconc))/(1+((0.25+h*logconc)/0.75)*(logconc/x)^b)
Where:
t = the y-intercept (the control response)
h = the hormetic effect (estimated between 0.1 and 1)
x = ICp for the data set
logconc = the logarithmic value of the exposure concentration
b = a scale parameter (estimated between 1 and 4)

Linear Model
IC50: drywtr = ((-b*0.5)/x)*logconc+b
IC25: drywtr = ((-b*0.25)/x)*logconc+b
Where:
b = the y-intercept (the control response)
x = ICp for the data set
logconc = the logarithmic value of the exposure concentration

Logistic Model
IC50: drywts = t/(1+(logconc/x)^b)
IC25: drywts = t/(1+(0.25/0.75)*(logconc/x)^b)
Where:
t = the y-intercept (the control response)
x = ICp for the data set
logconc = the logarithmic value of the exposure concentration
b = a scale parameter (estimated between 1 and 4)

Figure I.1 SYSTAT Version 11.0 Equations for Linear and Nonlinear Regression Models and Example Graphs of the Observed Trends for Each Model
“mnlengths” and “mnlengthr” refer to the mean length of shoo ts or roots, respectively, and “drywts” and “d rywtr” refer to the individual mean shoot or root dry weight, respectively
- Select Graph, Summary Charts, and then Line.... Select the independent variable (e.g., logconc), followed by Add to insert the variable into the ‘X-variable(s):’ box. Select the dependent variable under examination, followed by Add to insert the variable into the ‘Y-variable(s):’ box. Select OK. A graph will be displayed within the ‘Output Pane’ of the main SYSTAT screen containing the mean values for every treatment level; to view a larger version of the graph, simply select the ‘Graph Editor’ tab located below the central window. A scatter plot of the data can also be viewed by selecting Graph, Plots, and then Scatterplot... and following the same instructions for inserting the x- and y-variables. The graphs will provide an indication as to the general concentration-response trend allowing the selection of the potential model(s) of best fit to be chosen, in addition to an estimation of the ICp of interest.
Note: The main SYSTAT screen is divided into three parts. The left-hand side of the screen (i.e., ‘Output Organizer’ tab) provides a list of all of the functions completed (e.g., graphs) - each function can be viewed by simply selecting the desired icon. The right-hand side of the screen forms the central window in which the general output of all of the functions completed (e.g., regression, graphs) can be viewed. The tabs below this central window allow the user to toggle between the data file (i.e., ‘Data Editor’), individual graphs (i.e., ‘Graph Editor’) and the output (i.e., ‘Output Pane’). The various graphs produced can be viewed individually within the ‘Graph Editor’ tab by selecting the graph of interest within the left-hand side of the screen (i.e., ‘Output Organizer’ tab). The bottom portion of the screen displays the command codes used to derive the desired functions (e.g., regression and graphing codes). The ‘Log’ tab within this command screen displays a history of all of the functions that have been completed. - Visually estimate and record an estimate of the ICp of interest (e.g., IC25) for the data set. For example, for an IC25, divide the average of the controls by four, and find this value on the y-axis. Estimate a horizontal line from the y-axis until the line intercepts the data points. At this intersection point, extend a vertical line down towards the x-axis and record this concentration value as an estimate of the IC25.
- Using the scatter plots or line graphs, select the potential model(s) that will best describe the concentration- response trend (refer to Figure I.1 for an example of each model).
I.2.3 Estimating the Model Parameters
- Select File, Open, and then Command.
- Open the file containing the command codes for the particular model chosen from Section I.2.2 (i.e., select the appropriate file, followed by Open):
nonline.syc = exponential model
nonling.syc = gompertz model
nonlinh.syc = logistic with hormesis model
linear.syc = linear model
nonlinl.syc = logistic model
The file will provide the command codes for the selected model within the appropriate tab of the command editor box at the bottom of the main screen. All of the command codes for deriving IC50s and IC25s are provided in Table I.2; however, the equations can be formatted to derive any ICp. For example, the command codes for the logistic model to derive an IC25 would be:
nonlin
print = long
model drywts = t/(1+(0.25/0.75)*(logconc/x)^b)
save resid1/ resid
estimate/ start = 85, 0.6, 2 iter = 200
use resid1
pplot residual
plot residual*logconc
plot residual*estimate - Type in the header of the column in the data table containing the variable of interest to be analyzed within the line entitled ‘model y=’ (where ‘y’ is the dependent variable, e.g., drywts).
- The 4th line of the text should read ‘save resida/ resid’, where ‘a’ indicates a number to which the residual file is assigned. Substitute this same number into the 6th line (i.e., ‘use resida’) so that the same file is used to generate a normal probability plot and graphs of the residuals. The command lines that follow provide instruction for the generation of a probability plot (i.e., ‘pplot residual’), the generation of a graph of residuals against the concentration or treatment level (i.e., ‘plot residual*logconc’), and a graph of the residuals against the predicted and fitted values (i.e., ‘plot residual*estimate’). These graphs are used to aid in the assessment of the assumptions of normality (e.g., probability plot) and homogeneity of the residuals (e.g., graphs of the residuals) when evaluating for the model of best fit (Section I.2.4).
- Substitute the mean of the controls and the estimated ICp (e.g., IC25) within the fifth line entitled ‘estimate/start=’ (refer to Table I.2 for details on the substitution for each model). These values were initially derived from examination of the scatter plot or line graph. The model, once it converges, will provide a set of parameters from which the ICp, and its 95% confidence limits, are reported (i.e., parameter ‘x’). It is essential that accurate estimates for each parameter be provided before running the model, or the iterative procedure used to derive the reported parameters might not converge. The scale parameter (Table I.2) is typically estimated to range from one to four. The number of iterations can be changed, but for this example, has been set to 200 (i.e., ‘iter = 200'). Typically, 200 iterations are sufficient for a model to converge; if more iterations are required, it is likely that the most appropriate model is not being applied.
- Select File, and then Submit Window to run the commands; alternatively, right-click the mouse and select Submit Window. This will generate a printout of the iterations, the estimated parameters, and a list of the actual data points with the corresponding predicted values and residuals. A preliminary graph of the estimated regression line will also be presented; this preliminary graph should be deleted. The graph can be deleted by selecting the graph in the left-hand window within the main screen. A normal probability plot and graphs of the residuals will also be presented.
Model | Command Codes | |
---|---|---|
Exponential | nonlin print = long model mnlengths = a*exp(log((a-a*0.25-b*0.75)/a)*(logconc/x))+b save resid1/ resid estimate/ start = 25a, 1b, 0.3citer = 200 use resid1 pplot residual plot residual*logconc plot residual*estimate |
where: aRepresents the estimate of the y-intercept (i.e., ‘a’) (the control response) bRepresents the scale parameter (i.e., ‘b’) (estimated between 1 and 4) cRepresents the estimate of the ICp for the data set (i.e., ‘x’) |
Gompertz | nonlin print = long model mnlengths = g*exp((log(0.75))*(logconc/x)^b) save resid2/ resid estimate/ start = 16a, 0.8b, 1citer = 200 use resid2 pplot residual plot residual*logconc plot residual*estimate |
where: aRepresents the estimate of the y- intercept (i.e., ‘g’) (the control response) bRepresents the estimate of the ICp for the data set (i.e., ‘x’) cRepresents the scale parameter (i.e., ‘b’) (estimated between 1 and 4) |
Hormesis | nonlin print = long model mnlengthr = (t*(1+h*logconc))/(1+((0.25+h*logconc )/ 0.75)*(logconc/x)^b) save resid3/ resid estimate/start = 48a, 0.1b, 0.7c, 1d iter = 200 use resid3 pplot residual plot residual*logconc plot residual*estimate |
where: aRepresents the estimate of the y- intercept (i.e., ‘t’) (the control response) bRepresents the hormetic effect (i.e., ‘h’) (estimated between 0.1 and 1) cRepresents the estimate of the ICp for pplot residual the data set (i.e., ‘x’) dRepresents the scale parameter (i.e., ‘b’) (estimated between 1 and 4) |
Linear | nonlin print = long model drywtr = ((-b*0.25)/x)*logconc+b save resid4/ resid estimate/start = 5a, 0.7b iter = 200 use resid4 pplot residual plot residual*logconc plot residual*estimate |
where: aRepresents the estimate of the y- intercept (i.e., ‘b’) (the control response) bRepresents the estimate of the ICp for the data set (i.e., ‘x’) |
Logistic | nonlin print = long model drywts = t/(1+(0.25/0.75)*(logconc/x)^b) save resid5/resid estimate/start = 85a, 0.6b, 2citer = 200 use resid5 pplot residual plot residual*logconc plot residual*estimate |
where: aRepresents the estimate of the y- intercept (i.e., ‘t’) (the control response) bRepresents the estimate of the ICp for the data set (i.e., ‘x’) cRepresents the scale parameter (i.e., ‘b’) (estimated between 1 and 4) |
Note: “mnlengths” and “mnlengthr” refer to mean length of shoots or roots, respectively; “drywts” and “drywtr” refer to mean individual shoot or root dry weight, respectively; “pplot” refers the creation of a probability plot based on the residuals derived from the regression model under study.
I.2.4 Examining the Residuals and Test Assumptions
An examination of the residuals for each model tested helps to determine whether assumptions of normality and homoscedasticity have been met. If any of the assumptions cannot be met, regardless of the model examined, a statistician should be consulted for further guidance on assessing additional models or the data should be re- analyzed using the less desirable linear interpolation method of analysis (using ICPIN; see Section 4.8.3.2).
I.2.4.1 Assumptions of normality.
Normality should be assessed using Shapiro-Wilk’s test as described in EC (2004a); Section I.2.4.3 provides instructions for conducting this test. The normal probability plot, displayed in the ‘Output Pane’, can also be used to evaluate whether the assumption of normality is met. The residuals should form a fairly straight line diagonally across the graph; the presence of a curved line represents deviation from normality. The normal probability plot should not, however, be used as a stand-alone test for normality, since the detection of a ‘normal’ (e.g., straight) or ‘non-normal’ (e.g., curved) line depends on the subjective assessment of the user. If the data are not normally distributed, then the user should try another model, or should consult a statistician for further guidance or the data should be analyzed using the less desirable linear interpolation method of analysis.
I.2.4.2 Homogeneity of residuals.
Homoscedasticity (or homogeneity) of the residuals should be assessed using Levene’s test as described in EC (2004a) (Section I.2.4.3 provides instructions for conducting this test), and by examining the graphs of residuals against the actual and predicted (estimated) values. Homogeneity of the residuals is described by an equal distribution of the variance of the residuals across the independent variable (i.e., concentration or treatment level) (Figure I.2A). Levene’s test, if significant, will indicate that the data are not homogeneous. If the data (as indicated by Levene’s test) are heteroscedastic (i.e., not homogeneous), then the graphs of the residuals should be examined. If there is a significant change in the variance and the graphs of the residuals produce a distinct fan or ‘V’ pattern (refer to Figure I.2B for a plot of the ‘residual*estmate’; a corresponding ‘V’ pattern in the opposite direction also occurs in the plot of the ‘residual*logconc’), then the data analysis should be repeated using weighted regression. Alternatively, a divergent pattern suggestive of a systematic lack of fit (Figure I.2C) will indicate that an inappropriate or incorrect model was selected.
I.2.4.3 Assessing assumptions of normality and homogeneity of residuals.
SYSTAT Version 11.0 can perform both Shapiro-Wilk’s and Levene’s tests to assess the assumptions of normality and homogeneity of residuals. Levene’s test can only be performed by conducting an analysis of variance (ANOVA) on the absolute values of the residuals derived in Section I.2.3.
- Select File, Open, and then Data to open the data file containing the residuals created in Section I.2.3 (e.g., resid1.syd).
- Insert a new variable name into an empty column by double-clicking on the variable name, which opens the ‘Variable Properties’ window. In this window, insert an appropriate name for the transformed residuals (e.g., absresiduals) into the ‘Variable name:’ box. Transform the residuals by selecting Data, Transform, and then Let.... Once in the Let... function, select the column heading containing the appropriate header for the transformed data (e.g., absresiduals), and then select Variablewithin the ‘Add to’ box to insert the variable into the ‘Variable:’ box. Select the appropriate transformation (e.g., ABS for the transformation of data into its absolute form) in the ‘Functions:’ box (the ‘Function Type:’ box should be Mathematical), and then select Add to insert the function into the ‘Expression:’ box. Select the column heading containing the original untransformed data (i.e., residuals), followed by Expression within the ‘Add to’ box to insert the variable into the ‘Expression:’ box. Select OK; the transformed data will appear in the appropriate column. Save the data.
- To perform Shapiro-Wilk’s test, select Analysis, Descriptive Statistics, and then Basic Statistics.... A‘Column Statistics’ window will appear. Select the residuals from the ‘Available variable(s):’ box, followed by Add to insert this variable into the ‘Selected variable(s):’ box. Within the ‘Options’ box, select the Shapiro-Wilk normality test, followed by OK. A small table will appear within the SYSTAT Output Organizer window, where the Shapiro-Wilk critical value (i.e., ‘SW Statistic’) and probability value (i.e., SW P-Value’) will be displayed. A probability value greater than the usual criterion of p > 0.05 indicates that the data are normally distributed.
- To perform Levene’s test, select Analysis, Analysis of Variance (ANOVA), and then Estimate Model..., an ‘Analysis of Variance: Estimate Model’ window will appear.
- Select the variable within which the data are to be grouped (e.g., logconc), and place this variable into the ‘Factor(s):’ box by selecting Add.
- Select the transformed residuals (i.e., absresiduals), followed by Add, to insert the variable into the ‘Dependent(s):’ box. Select OK. A graph of the data and a printout of the output will appear within the ‘Output Pane’ tab. A probability value greater than the usual criterion of p > 0.05 indicates that the data are homogeneous.
Figure I.2 Graph of the Residuals Against the Predicted (Estimated) Values (i.e., ‘residuals*estimate’) Indicating Homoscedasticity (A), and Two Types of Heteroscedasticity; One Demonstrating a Fan or ‘V’ Shape (B) Requiring Further Examination Using Weighted Regression, and a Second Demonstrating a Systematic Lack of Fit (C) as a Result of the Selection of an Incorrect Model
(A)

(B)

(C)

I.2.5 Weighting the Data
If the residuals are heteroscedastic, as indicated by Levene’s test, and there is a significant change in variance across treatment levels (i.e., the presence of a distinct fan or ‘V’ shape; refer to Figure I.2B), the data should be re-analyzed using weighted regression. Weighted regression involves using the inverse of the variance of observations within each concentration or treatment level as the weights. When performing the weighted regression, the standard error for the ICp (presented in SYSTAT as the asymptotic standard error (‘A.S.E.’; refer to Figure I.3) is compared to that derived from the unweighted regression. If there is a difference of greater than 10% between the two standard errors, then the weighted regression is selected as the regression of best choice. However, if there is a significant change in variance across all treatment levels, and there is less than a 10% difference in the standard error between the weighted and unweighted regressions**, then the user should consult a statistician for further guidance and the application of additional models, or the data could be re-analyzed using the less desirable linear interpolation method of analysis. The comparison between weighted and unweighted regression is completed for each of the selected models while proceeding through the process of final model selection (i.e., model and regression of best choice). Alternatively, if Levene’s test demonstrates that the data are not homogeneous, and the graphs of the residuals demonstrate a non-divergent pattern (e.g., Figure I.2C), an inappropriate or incorrect model might have been selected. The user is then advised to consult a statistician for further guidance on the use and application of alternate models.
** The value of 10% is only a “rule-of-thumb” based up on experience. Objective tests for the improvement due to weighting are available, but beyond the scope of this document. Weighting should be used only when necessary, since the procedure can introduce additional complications to the modeling procedure. A statistician should be consulted when weighting is necessary, but the parameter estimates are nonsensical.
- Select File, Open, and then Data. Select the file containing the data set to be weighted. Insert the two new variable names into the column heading by double-clicking on a variable name, which opens the ‘Variable Properties’ window. In this window, insert an appropriate name for the variable of interest, select the variable type, and specify comments if desired. The two new column headings should indicate the variance of a particular variable (e.g., vardrywts), and the inverse of the variance for that variable (e.g., varinvsdrywts). Save the data file by selecting File, and then Save.
- Select Data, followed by By Groups.... Select the independent variable (i.e., logconc), followed by Add, to insert this variable into the ‘Selected variable(s):’ box; this will enable the determination of the variance of the variable of interest by concentration or treatment level (i.e., “group”). Select OK.
- Select Analysis, Descriptive Statistics, and then Basic Statistics.... Select the variable of interest to be weighted (e.g., drywts), followed by Add to insert this variable into the ‘Selected variable(s):’ box. Select Variancewithin the ‘Options’ box, followed by OK. This function will display the variance for the variable of interest, grouped by concentration or treatment level within the ‘Output Pane’ tab of the main screen.
- Select Data, By Groups..., and then click on the box beside Turn off, and select OK so that any analysis that follow will not be analyzed according to each individual concentration or treatment level; the analysis should consider the entire data set as a whole.
- Return to the data file by selecting the ‘Data Editor’ tab within the main screen. Transfer the variances for each concentration or treatment level to the corresponding concentration within the variance column (e.g., vardrywts). Note that the variance is the same among replicates within a treatment.
- Select Data, Transform, and then Let..., and select the column heading containing the inverse of the variance (e.g., varinvsdrywts) for the variable of interest, followed by Variablewithin the ‘Add to’ box to insert the variable into the ‘Variable:’ box. Select the ‘Expression:’ box and type in ‘1/’, and then select the column heading containing the variances (e.g., vardrywts) of the variable of interest for each replicate and concentration, followed by Expression within the ‘Add to’ box to insert the variable into the ‘Expression:’ box. Select OK. The inverse of the variance for each replicate and concentration will be displayed in the appropriate column. Save the data by selecting File, and then Save.
- Select File, Open, and then Command; open the file containing the command codes for estimating the equation parameters (e.g., Section I.2.3, step 2) for the same model selected for the unweightedanalysis.
- Insert an additional row after the third line by typing ‘weight=varinvsy’, where ‘y’ is the dependent variable to be weighted (e.g., weight=varinvsdrywts), as per the shaded area below:
nonlin
print= long
model drywts = t/(1+(0.25/0.75)*(logconc/x)^b)
weight=varinvsdrywts
save resid2/ resid
estimate/ start = 85, 0.6, 2 iter=200
use resid2
pp lot residual
plot residual*logconc
plot residual*estimate - Assign a new number for the residuals within the line entitled ‘save resida’ (where ‘a’ represents the assigned number).
- Substitute the mean of the controls and the estimated ICp within the line entitled ‘estimate/ start...’ (refer to Table I.2 for details on the substitution for each model). These estimates will be the same as those used for the unweighted analysis.
- Select File, and then Submit Window to run the commands. This will generate output of the iterations, the estimated parameters, and a list of the data points with the corresponding predicted data points and residuals within the ‘Output Pane’ tab of the main screen. A preliminary graph of the estimated regression line will also be presented; this should be deleted. A normal probability plot and graphs of the residuals will also be presented.
- Proceed with the analysis as described in Section I.2.4 to ensure that all model assumptions have been met.
- Compare the weighted regression analysis with the unweighted regression analysis. Select the weighted regression if weighting reduced the standard error for the ICp by 10%, relative to the unweighted regression analysis.
Figure I.3 Example of the Initial Output Derived using the Logistic Model in SYSTAT Version 11.0.The initial output provides the residual mean square error used to select the model of best choice, as well as the ICps, the standard error for the estimate, and the upper and lower 95% confidence limits. The number of cases displayed has been shortened for the purpose of this diagram; however, the output within SYSTAT displays all cases including the actual variable measurement and the corresponding predicted estimate and residual.

I.2.6 The Presence of Outlier(s) and Unusual Observations
Outliers are indicative of a measurement that does not seem to fit the other values derived from the test. Outliers and unusual observations can be identified by examining the fit of the concentration-response curve relative to all data points, and by examining the graphs of the residuals. If an outlier has been observed, the test records (e.g., hand-recorded and electronic data sheets and experimental conditions) should be scrutinized for human error. If the outlier is a data point that has been obtained through a transcription error that cannot be corrected, or through a faulty procedure, then the data point should be removed from the analysis. If an outlier has been identified, the analysis should be completed with and without the presence of the outlier. The decision on whether or not to remove the outlier should also take into consideration natural biological variation, and biological reasons that might have caused the apparent anomaly. Regardless of whether or not the outlier is removed, a description of the data, outliers, analyses with and without the outlier, and interpretive conclusions, must accompany the final analysis. If it appears as if there is more than one outlier present, the selected model should be re-assessed for appropriateness and alternative models considered. Additional guidance on the presence of outliers and unusual observations is provided in EC (2004a) and should be consulted for further details.
The Analysis of Variance (ANOVA) function within SYSTAT can be performed to determine whether or not the data contain outliers. However, ANOVA assumes that the residuals are normally distributed, and therefore, assumptions of normality must be met before to using the ANOVA to detect outliers. The presence of outliers can also be determined from the graphs of residuals.
- Perform an Analysis of Variance (ANOVA) as described in Section I.4 of this appendix, to determine whether any outliers exist. Any outlier(s) will be identified as a case number that corresponds with the row number in the SYSTAT data file. The program uses the studentized residuals as an indication of outliers; values >3 indicate the possibility of an outlier. This should be confirmed with the graphs of the residuals.
- If a decision is made to remove the outlier(s), delete the value from the original data table (file), and re-save the file under a new name (i.e., select File, and then Save As...). For example, the new file name might contain the letter ‘o’ (for outlier(s) removed) at the end of the file’s original name.
- Repeat the regression analysis with the outlier(s) removed, using the same model and estimated parameters that were used before the outlier(s) were removed. Alternatively, additional models may be used for analysis if the alternative model results in a better fit and smaller residual mean square error. If the removal of the outlier(s) does not result in a significant change to both the residual mean square error and the ICp (including its corresponding confidence intervals), then the individual performing the analysis must make a subjective decision (i.e., professional judgement) as to whether or not to include the outlier(s). Justification for the removal or inclusion of the outlier(s) must be recorded along with the final analysis.
I.2.7 Selection of the Most Appropriate Model
Once all of the contending models have been fit, each one should be assessed for normality, homogeneity of the residuals, and the residual mean square error. The model which meets all of the assumptions and has the smallest residual mean square error (refer to Figure I.3) should be selected as the most appropriate model. However, in the case where more than one model has the same residual mean square error, and all other factors are equivalent, the simplest model should be selected as the model of best choice. If a weighted regression was performed, the weighted and unweighted analyses should be compared and the weighted analysis selected if weighting reduced the standard error for the ICp by more than 10%. The residual mean square error is presented in the ‘Output Pane’ tab just following the iterations, and preceding the parameter estimates. However, if none of the models adequately fit the data, then the user is advised to consult a statistician for the application of additional models, or the data should be re-analyzed using the less desirable linear interpolation method of analysis (see Section 4.8.3.2).
Note: Since the concentration or treatment levels were logarithms in the calculations, the ICps and their confidence limits should be transformed to arithmetic values for the purpose of reporting them.
I.2.8 Creating the Concentration-Response Curve
Once an appropriate model has been selected, the concentration-response curve for that particular model must be generated.
- Within the command editor window at the bottom of the screen, copy the model equation (i.e., the equation after the ‘=’ sign, third line of the command codes depicted in Table I.2) from the command codes used to derive the estimates for the selected model; the equation should consist of the original alphabetic characters (e.g., t, b, h, etc.). The equation can be copied by highlighting the equation and selecting Edit, followed by Copy (or right-clicking the mouse and selecting Copy).
- Select File, Open, and then Command and open an existing graph command file (i.e., any file with ‘*.cmd’) similar to the following example (or, if and as necessary, create a new one), using the logistic model. The first plot (i.e., ‘plot’) is a scatter plot of the dependent variable against the log concentration series. The second plot (i.e., ‘fplot’) is the regression equation, which is superimposed upon the scatter plot.
Graph
begin
plot drywts*logconc/ title = ’Dry Mass of Barley Shoots’, xlab = ’Log(mg boric acid/kg soil d.wt)’,
ylab = ’Mass (mg)’,
xmax = 2, xmin = 0, ymax = 90, ymin = 0
fplot y = 80.741/(1+(0.25/0.75)*(logconc/0.611)^2.533); xmin = 0,
xmax = 2, xlab = ’‘ ymin = 0, ylab = ’‘, ymax = 90
end - Paste the previously copied equation in place of the pre-existing equation (as seen in the shaded area above) by highlighting the previous equation, and then selecting Edit, followed by Paste (or right-clicking the mouse and selecting Paste). Replace all of the alphabetical characters (e.g., t, b, h, x, a, etc.), together with the respective estimates, provided in the ‘Output Pane’ tab generated by the application of the selected model.
- Type in the correct information within the line entitled ‘plot y*logconc...’, where ‘y’ is the dependent variable under study (e.g., drywts). Adjust the ‘xmax’ (i.e., the maximum log-concentration used) and ‘ymax’ (refer to Section I.2.1, Step 7) numerical values accordingly. Ensure that all ‘xlab’ and ‘ylab’ (i.e., axis labels) entries are correct, if not, then adjust accordingly. Ensure that all quotation marks and commas are placed within the command program as depicted in the previous example; SYSTAT is case- and space- insensitive.
Note:
‘title’ refers to the title of the graph
‘xlab’ refers to the x-axis label
‘xmin’ refers to the minimum value requested for the x-axis
‘xmax’ refers to the maximum value requested for the x-axis
‘ylab’ refers the y-axis label
‘ymax’ refers to the maximum value requested for the y-axis
‘ymin’ refers to the minimum value requested for the y-axis
The ‘xmin’, ‘xmax’, ‘ymin’, and ‘ymax’ must be the same for both plots to superimpose the regression line accurately on the scatter plot of the data. An example of the final regression graph is provided in Figure I.1 for each of the five proposed models. - Select File, then Save As to save the graph command codes in an appropriate working folder using the same coding used to generate the data file, with indication as to which model the regression corresponds to. Select Save to save the file.
- Select File, then Submit Window to process the command codes. A graph of the regression, using the model estimate parameters for the selected model, will appear.
I.3 Determining Additional ICps
In some cases, it might be desirable to estimate another value for ‘p’ (besides or instead of an IC25). The models proposed by Stephenson et al. (2000b) enable the selection and determination of any ICp. The following section provides guidance on determining an IC20, however, the models can be changed to suit any ‘p’ value.
- Select File, Open, and then Command and open the file corresponding to the command codes used to generate the estimate parameters (refer to Table I.2 for the command codes for each model). Change the model equation such that it will calculate the desired ICp (e.g., IC20) by modifying the fractions used in each model. For example, to calculate an IC20 using the logistic model, the equation would change from ‘t/(1+(0.25/0.75)*(logconc/x)^b)’ (for calculating an IC25) to ‘t/(1(0.20/0.80)*(logconc/x)^b)’ (for calculating an IC20).
- Once the equation has been adjusted for the ICp of interest, follow each step outlined in Section I.2.3 of this appendix. However, substitute the estimated ICp (e.g., IC20) within the fifth line entitled ‘estimate/ start=’ (refer to Figure I.1 for details on the substitution for each model). These values were initially derived from an examination of the scatter plot or line graph. The model, once it converges, will provide a set of parameters from which the ICp, and its corresponding 95% confidence limits, are reported (i.e., parameter ‘x’).
- Proceed with the analysis as described in Sections I.2.4 to I.2.8 herein.
I.4 Analysis of Variance (ANOVA)
- Select File, Open, and then Data to open the data file containing all of the observations for the data set under examination.
- Select Analysis, Analysis of Variance (ANOVA), and then Estimate Model....
- Select the variable within which the data are to be grouped (e.g., logconc), and place this variable into the ‘Factor(s):’ box by selecting Add.
- Select the variable of interest (e.g.,drywts), followed by Add, to insert the variable into the ‘Dependent(s):’box.
- Select the box beside ‘Save’ (bottom left-hand corner of the ‘Analysis of Variance: Estimate Model’ window) and scroll down the accompanying selections to choose Residuals/Data. Type in an appropriate file name within the adjacent empty box to save the residuals (e.g., anova1). Select OK. A graph of the data and the generate output will appear within the ‘Output Pane’ tab. At this point, any outlier(s), based on the studentized residuals, will also be identified (refer to Section I.2.6 of this appendix for guidance on assessing outlier(s)).
- Assess the assumptions of normality and homogeneity of the residuals as per Section I.2.4 using the data file that was created to save the Residuals/Data prior to conducting the ANOVA (i.e., anova1). After assessing normality and homogeneity of the residuals using Shapiro-Wilk’s and Levene’s tests, respectively, the following coding may be used to examine the graphs of the residuals:
graph
use anova1
plot residual*logconc
plot residual*estimate