Stepwise Procedures In Discriminant Analysis
INTERPRETATION OF LINEAR DISCRIMINANT FUNCTION
A close correspondence exists between interpreting a discriminant function and determining the contribution of each variable. The signs of the discriminant function coefficients are taken into account when interpreting a discriminant function. For example, Y1 = 5x1 - 3x2 + 2x3 has a different meaning from Y2 = 5x1 + 3x2 + 2x3, because Y1 depends on the difference between x1 and x2, while Y2 is related to the sum of x1 and x2.
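A minimal Python sketch of the sign argument evaluates both example functions when x1 and x2 each rise by one unit. The input values are arbitrary and chosen only for illustration.

```python
def y1(x1, x2, x3):
    # Y1 = 5x1 - 3x2 + 2x3: x1 and x2 enter with opposite signs (a contrast)
    return 5 * x1 - 3 * x2 + 2 * x3

def y2(x1, x2, x3):
    # Y2 = 5x1 + 3x2 + 2x3: x1 and x2 enter with the same sign (a weighted sum)
    return 5 * x1 + 3 * x2 + 2 * x3

base = (1.0, 1.0, 1.0)
both_up = (2.0, 2.0, 1.0)  # x1 and x2 both increase by one unit

print(y1(*both_up) - y1(*base))  # 2.0: the two increases partly cancel
print(y2(*both_up) - y2(*base))  # 8.0: the two increases add up
```

In Y1 a joint rise in x1 and x2 moves the score by only 5 - 3 = 2, whereas in Y2 it moves the score by 5 + 3 = 8.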
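In ascertaining the contribution of each variable, the signs are ignored and the coefficients are ranked by absolute value. So that variables measured on different scales can be compared, the coefficients should first be standardized (for example, multiplied by the pooled within-group standard deviations).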
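The ranking step can be sketched as follows. The raw coefficients are those of Y1 above. The pooled within-group standard deviations are hypothetical, and standardizing by them is the common convention assumed here. The sketch shows how ranking standardized coefficients can differ from naively ranking the raw values.

```python
# Raw coefficients of Y1 = 5x1 - 3x2 + 2x3
raw = {"x1": 5.0, "x2": -3.0, "x3": 2.0}
# Hypothetical pooled within-group standard deviations of each variable
pooled_sd = {"x1": 0.5, "x2": 2.0, "x3": 1.0}

# Standardized coefficient = raw coefficient * pooled within-group SD
standardized = {v: raw[v] * pooled_sd[v] for v in raw}

# Rank by absolute value, ignoring the sign
by_raw = sorted(raw, key=lambda v: abs(raw[v]), reverse=True)
by_std = sorted(standardized, key=lambda v: abs(standardized[v]), reverse=True)

print(by_raw)  # ['x1', 'x2', 'x3']
print(by_std)  # ['x2', 'x1', 'x3']
```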
LIMITATIONS OF DISCRIMINANT FUNCTIONS
The coefficients of a discriminant function may change if variables are added to or deleted from the analysis.
Coefficients of the discriminant functions may not be stable from sample to sample unless the sample size is large relative to the number of variables.
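The first limitation can be illustrated numerically with Fisher's two-group coefficients, w = S_w^-1 (m1 - m2), where S_w is the pooled within-group covariance matrix and m1 - m2 is the difference in group means. Both the covariance matrix and the mean difference below are hypothetical. Because x1 and x2 are correlated at 0.8, deleting x2 nearly doubles the coefficient of x1.

```python
import numpy as np

# Hypothetical pooled within-group covariance matrix of x1 and x2
S_w = np.array([[1.0, 0.8],
                [0.8, 1.0]])
mean_diff = np.array([1.0, 1.0])  # mean of group 1 minus mean of group 2

# Fisher's two-group discriminant coefficients: w = S_w^{-1} (m1 - m2)
w_full = np.linalg.solve(S_w, mean_diff)

# The same function after deleting x2: only the x1 block remains
w_reduced = np.linalg.solve(S_w[:1, :1], mean_diff[:1])

print(np.round(w_full, 3))     # [0.556 0.556]
print(np.round(w_reduced, 3))  # [1.]
```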
LIMITATIONS OF STEPWISE METHODS OF DISCRIMINANT ANALYSIS
There are three basic problems associated with the use of stepwise discriminant analysis.
(1). Incorrect Degrees of Freedom
Most computer packages that employ stepwise methods of discriminant analysis, such as SPSS and MINITAB, use incorrect degrees of freedom when calculating the statistical tests for the discriminant function.
Because of these incorrect degrees of freedom, the tests of statistical significance are systematically biased toward spuriously high statistical significance. Researchers and students should therefore be cautious about interpreting the significance results commonly reported by these packages.
In statistical analysis, degrees of freedom reflect the number of unique pieces of information available in a given research situation. Degrees of freedom constrain the number of enquiries we may direct at our data; they are the currency we spend in analysis (Thompson, 1988). In computerized stepwise procedures, the pre-set degrees of freedom are one for each variable selected into the analysis.
Degrees of freedom are like coins that we spend to explore our data, and every predictor variable examined should be charged one degree of freedom. If the original predictor set contained four variables, the correct “charge” is four, even when the stepwise procedure retains only some of them.
The correct number of degrees of freedom is therefore the total number of variables in the original predictor set, not the number of variables the procedure happened to select.
(2). Sampling Error
In a stepwise method of discriminant analysis, variables are entered one at a time, each within the context of the variables already added. The variable that explains the most variance is chosen first. The next variable chosen is the one that explains the most additional (unique) variance, that is, variance that does not overlap with that of the variables already entered.
When two variables, say x1 and x2, have similar explanatory power, the amounts of variance they account for in a given sample may differ only slightly, and sampling error alone can determine which of them appears better.
Stepwise methods of discriminant analysis are therefore prone to sampling error because of the way variables are chosen (forward selection): each choice can turn on such small differences.
As a result, worthy variables may be excluded from the analysis altogether and wrongly assumed to have no explanatory or predictive potential.
It is also possible that a variable passed over at the first step is more practical or economical to measure, or that its effect in the population is longer lasting.
Small differences in the explanatory ability of different variables on different functions are likely to be due to sampling error, and such differences are unlikely to replicate in new samples. Replicability of results is important in research (Thompson, 1989).
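The effect of sampling error on the first entry step can be sketched with a small simulation. In this sketch, two independent predictors separate two groups almost equally well in the population; the true shifts, group size and number of replications are hypothetical. In each simulated sample the variable with the larger univariate Mahalanobis D² is entered first. For two groups this is the same choice as the variable with the largest univariate F.

```python
import numpy as np

rng = np.random.default_rng(0)
true_shift = np.array([0.50, 0.48])  # hypothetical: x1 is only marginally better than x2
n_per_group = 30
first_counts = {"x1": 0, "x2": 0}

for _ in range(1000):
    g1 = rng.normal(0.0, 1.0, size=(n_per_group, 2))
    g2 = rng.normal(true_shift, 1.0, size=(n_per_group, 2))
    diff = g2.mean(axis=0) - g1.mean(axis=0)
    # Equal group sizes, so the pooled variance is the average of the two variances
    pooled_var = (g1.var(axis=0, ddof=1) + g2.var(axis=0, ddof=1)) / 2
    d2 = diff ** 2 / pooled_var  # univariate Mahalanobis D^2 for each variable
    first_counts["x1" if d2[0] >= d2[1] else "x2"] += 1

print(first_counts)
```

The counts come out close to an even split, so which variable is "best" at step one is largely decided by the particular sample drawn. A choice decided this way would not be expected to replicate.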
(3). Failure to Select the Best Subset of Variables
Variable selection and variable ordering, which are the main reasons for using stepwise discriminant analysis, may not yield accurate results, because these methods of variable selection capitalize on small amounts of sampling error.
The importance of variable selection becomes clearer when the original variable set needs to be reduced for a particular purpose.
Stepwise methods of discriminant analysis do not specify or identify the best predictor set of a given size for the sample data being analyzed.
In fact, the true best set of variables may produce considerably higher effect sizes, and they may even include none of the variables selected by the stepwise algorithm.
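The example below is a constructed case of this failure. It uses the two-group Mahalanobis D² as the selection criterion; for two groups, ranking subsets by D² gives the same order as ranking them by Wilks' lambda. The pooled covariance matrix and mean differences are hypothetical. x2 and x3 are correlated at 0.9, and x3 alone does not separate the groups at all, yet the pair (x2, x3) separates them far better than the pair that forward selection picks. The helpers `forward_select` and `best_subset` are illustrative names, not a package API.

```python
from itertools import combinations
import numpy as np

names = ["x1", "x2", "x3"]
# Hypothetical pooled within-group covariance matrix and group mean differences
S_w = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.9],
                [0.0, 0.9, 1.0]])
mean_diff = np.array([1.0, 0.8, 0.0])

def d_squared(subset):
    """Mahalanobis D^2 between the two groups using only the variables in subset."""
    idx = list(subset)
    d = mean_diff[idx]
    return float(d @ np.linalg.solve(S_w[np.ix_(idx, idx)], d))

def forward_select(k):
    """Greedy forward selection: add the variable that most increases D^2."""
    chosen = []
    while len(chosen) < k:
        remaining = [v for v in range(len(names)) if v not in chosen]
        chosen.append(max(remaining, key=lambda v: d_squared(chosen + [v])))
    return chosen

def best_subset(k):
    """Exhaustive search over all subsets of size k."""
    return max(combinations(range(len(names)), k), key=d_squared)

stepwise, exhaustive = forward_select(2), best_subset(2)
print([names[i] for i in stepwise], round(d_squared(stepwise), 3))      # ['x1', 'x2'] 1.64
print([names[i] for i in exhaustive], round(d_squared(exhaustive), 3))  # ['x2', 'x3'] 3.368
```

Forward selection enters x1 first because it is the best single variable. That first choice locks x1 into every later subset, even though the best pair does not contain it.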
WAYS OF DEALING WITH THE PROBLEMS INHERENT IN STEPWISE ANALYSIS
The problems associated with the use of stepwise methods of discriminant analysis listed above can be managed in the following ways:
Incorrect degrees of freedom can be corrected by hand: replace the degrees of freedom reported by the computer package with the correct values and then recalculate the F-statistics. Individual researchers should make this correction before interpreting their results.
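The hand correction can be sketched for the two-group case, where Wilks' lambda converts exactly to F = ((1 - Λ)/Λ)·(N - p - 1)/p with p and N - p - 1 degrees of freedom. In this sketch a hypothetical package reports Λ = 0.70 for N = 60 cases after entering 2 variables out of 8 candidates. The only change is charging all 8 candidates instead of the 2 selected, which is a rough illustration of the correction described above rather than a full adjusted test. `wilks_to_f` is a hypothetical helper name.

```python
from scipy.stats import f

def wilks_to_f(wilks_lambda, n_total, p):
    """Exact F transformation of Wilks' lambda for TWO groups: F(p, N - p - 1)."""
    df1, df2 = p, n_total - p - 1
    F = (1 - wilks_lambda) / wilks_lambda * df2 / df1
    return F, df1, df2, f.sf(F, df1, df2)

lam, N = 0.70, 60                   # hypothetical stepwise output
reported = wilks_to_f(lam, N, p=2)   # charges only the 2 selected variables
corrected = wilks_to_f(lam, N, p=8)  # charges all 8 candidate variables

for label, (F, df1, df2, p_value) in (("reported", reported), ("corrected", corrected)):
    print(f"{label}: F({df1}, {df2}) = {F:.2f}, p = {p_value:.4g}")
# prints F(2, 57) = 12.21 and F(8, 51) = 2.73, each followed by its p-value
```

The same Λ gives a much smaller F and a larger p-value once every candidate variable is charged, which is the direction of bias the text warns about.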
The problem of capitalizing on sampling error can be minimized by reducing the number of variables to a manageable size. Huberty (1989) suggested discarding variables that lack predictive validity (variables that contributed little to prediction in past studies), variables judged not relevant to the present study, and variables highly correlated with other variables. Cross-validation, in which results are checked on cases not used to derive them, can reduce the problem of sampling error to the barest minimum.
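Cross-validation can be sketched with leave-one-out classification, in which each case is classified by a two-group discriminant function fitted without that case. The helper names (`fit_lda`, `accuracy`, `loo_accuracy`) and the simulated data are hypothetical, and equal priors with a midpoint cutoff are assumed. Only the first of 12 simulated predictors separates the groups, so the resubstitution accuracy (classifying the same cases used to fit the function) will typically look better than the leave-one-out estimate. An honest check of a stepwise solution would also need to repeat the variable selection inside each fold; that step is omitted here for brevity.

```python
import numpy as np

def fit_lda(X, y):
    """Two-group Fisher LDA: returns weights w and the midpoint cutoff."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = np.atleast_2d(np.cov(X0, rowvar=False))
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S_w = ((len(X0) - 1) * S0 + (len(X1) - 1) * S1) / (len(X) - 2)
    w = np.linalg.solve(S_w, m1 - m0)
    return w, w @ (m0 + m1) / 2  # equal priors: cut halfway between projected means

def accuracy(w, cutoff, X, y):
    """Proportion of cases assigned to their own group."""
    return float(np.mean((X @ w > cutoff).astype(int) == y))

def loo_accuracy(X, y):
    """Leave-one-out: each case is classified by a function fitted without it."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, cutoff = fit_lda(X[keep], y[keep])
        hits += int((X[i] @ w > cutoff) == y[i])
    return hits / len(y)

rng = np.random.default_rng(1)
n, p = 20, 12                 # hypothetical: 20 cases per group, 12 predictors
y = np.repeat([0, 1], n)
X = rng.normal(size=(2 * n, p))
X[y == 1, 0] += 1.0           # only the first predictor truly separates the groups

w, cutoff = fit_lda(X, y)
print("resubstitution accuracy:", accuracy(w, cutoff, X, y))
print("leave-one-out accuracy: ", loo_accuracy(X, y))
```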
The inability of stepwise methods of discriminant analysis to select the best subset of variables can be overcome by conducting an all-possible-subsets analysis, which evaluates every subset of each size so as to determine the best subset of any given size. Computer packages such as SPSS can run all-possible-subsets analyses easily.
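The practicality of an all-possible-subsets analysis depends on how many candidate subsets must be fitted. The short sketch below counts them: all 2^p - 1 non-empty subsets, or C(p, k) subsets when only the best subset of one size k is wanted. The values of p are arbitrary, and `subsets_to_evaluate` is a hypothetical helper name.

```python
from math import comb

def subsets_to_evaluate(p, k=None):
    """Number of candidate subsets an all-possible-subsets analysis must fit."""
    if k is None:
        return 2 ** p - 1  # every non-empty subset of p variables
    return comb(p, k)      # only the subsets of one given size k

for p in (5, 10, 20):
    print(p, subsets_to_evaluate(p), subsets_to_evaluate(p, 3))
# 5 31 10
# 10 1023 120
# 20 1048575 1140
```

For a modest number of variables the full search is cheap. The total grows exponentially with p, however, which is why reducing the variable set first (as suggested above) also helps here.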
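TEST OF THE EQUALITY OF THE TWO MEAN VECTORS
The Mahalanobis D² test is used to test the hypothesis that the difference between the two population mean vectors is zero, that is, $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$.
Mahalanobis D² is defined by:
$$D^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^\top \mathbf{S}_{pl}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),$$
where $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ are the sample mean vectors of the two groups and $\mathbf{S}_{pl}$ is the pooled within-group covariance matrix. The hypothesis is tested with
$$F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p} \cdot \frac{n_1 n_2}{n_1 + n_2}\, D^2,$$
which, under $H_0$, follows an F distribution with $p$ and $n_1 + n_2 - p - 1$ degrees of freedom, where $p$ = number of variables and $n_1$, $n_2$ are the group sample sizes. Discriminant analysis can then be carried out to identify the variables that contributed to the differences in group means.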
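A sketch of the test in Python, assuming the standard two-sample form in which n1·n2/(n1 + n2)·D² is Hotelling's T² and is converted to F with p and n1 + n2 - p - 1 degrees of freedom. The function name and the small data set in the usage lines are hypothetical.

```python
import numpy as np
from scipy.stats import f

def mahalanobis_two_sample_test(X1, X2):
    """Mahalanobis D^2 and the associated F test of H0: mu1 = mu2 for two groups."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    # Pooled within-group covariance matrix S_pl (atleast_2d also handles p = 1)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S_pl = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    d2 = float(diff @ np.linalg.solve(S_pl, diff))
    t2 = n1 * n2 / (n1 + n2) * d2  # Hotelling's T^2
    df1, df2 = p, n1 + n2 - p - 1
    F = df2 / ((n1 + n2 - 2) * p) * t2
    return d2, F, (df1, df2), f.sf(F, df1, df2)

# Hypothetical measurements on two variables for two groups of five cases
group1 = [[2.0, 3.1], [2.5, 3.4], [3.1, 3.0], [2.8, 3.9], [2.2, 3.5]]
group2 = [[3.6, 2.9], [4.0, 3.3], [3.5, 2.6], [4.2, 3.1], [3.9, 2.8]]
d2, F, dfs, p_value = mahalanobis_two_sample_test(group1, group2)
print(f"D^2 = {d2:.3f}, F{dfs} = {F:.3f}, p = {p_value:.4g}")
```

With a single variable the procedure reduces to the pooled two-sample t-test, with F equal to t².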
Abstract
Several multivariate measurements require variable selection and ordering. Stepwise procedures provide a step-by-step method through which these variables are selected and ordered, usually for discrimination and classification purposes. In stepwise discriminant analysis, only important variables are selected, while redundant variables (variables that contribute less in the presence of other variables) are discarded. The use of stepwise procedures ...