• Stepwise Procedures In Discriminant Analysis

  • CHAPTER THREE -- [Total Page(s) 5]

    • INTERPRETATION   OF LINEAR DISCRIMINANT FUNCTION

A close correspondence exists between interpreting a discriminant function and determining the contribution of each variable. The signs of the discriminant function coefficients are taken into account when interpreting a discriminant function. For example, Y1 = 5x1 – 3x2 + 2x3 has a different meaning from Y2 = 5x1 + 3x2 + 2x3, because Y1 depends on the difference between x1 and x2, while Y2 depends on their sum.

In ascertaining the contribution of each variable, the signs are ignored and the coefficients are ranked by absolute value.
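The two-step reading described above can be sketched in a few lines; the coefficient values are the illustrative ones from the example functions:

```python
# Hypothetical coefficients of the two example functions.
y1 = {"x1": 5, "x2": -3, "x3": 2}
y2 = {"x1": 5, "x2": 3, "x3": 2}

# Interpretation keeps the signs: Y1 contrasts x1 against x2,
# while Y2 combines them.
# Contribution ranking ignores the signs and sorts by absolute value:
ranking = sorted(y1, key=lambda v: abs(y1[v]), reverse=True)
print(ranking)  # ['x1', 'x2', 'x3']
```

Note that Y1 and Y2 produce the same contribution ranking even though their interpretations differ, which is exactly why the two questions are treated separately.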

 LIMITATIONS OF DISCRIMINANT FUNCTIONS

 Coefficients of a discriminant function may change when variables are added or deleted.

      Coefficients of the discriminant functions may not be stable from sample to sample unless the sample size is large relative to the number of variables.

LIMITATIONS OF STEPWISE METHODS OF DISCRIMINANT ANALYSIS

      There are three basic problems associated with the use of stepwise discriminant analysis.

      (1).    Incorrect Degrees of Freedom

Most computer packages that implement stepwise discriminant analysis, such as SPSS and MINITAB, use incorrect degrees of freedom when calculating the statistical tests for the discriminant functions.

The use of incorrect degrees of freedom makes the results of the tests of significance systematically biased in favour of spuriously high statistical significance. Researchers and students should therefore be cautious about interpreting such results taken directly from computer packages.

In statistical analysis, degrees of freedom reflect the number of unique pieces of information available for a given research situation. Degrees of freedom constrain the number of enquiries we may direct at our data; they are the currency we spend in analysis (Thompson, 1988). The degrees of freedom pre-set in any computerized stepwise procedure are one for each variable selected or included in the analysis.

Pre-set degrees of freedom are like coins that we spend to explore our data. Every predictor variable examined is charged one degree of freedom. If the original number of predictor variables was four, then the correct “charge” is four, even if the stepwise run retained fewer.

The correct number of degrees of freedom is therefore the total number of variables in the predictor set, not the number the stepwise procedure happened to enter.

(2).    Sampling Error

In a stepwise method of discriminant analysis, variables are entered one at a time, each in the context of the previously entered variables. The variable that explains the most variance is chosen first, followed by the variable that explains the most remaining (unique) variance not overlapping with the first entered variable.
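The entry process just described can be sketched with a toy forward-selection loop. Everything here is invented for illustration: three simulated predictors (two of them nearly redundant), a group indicator, and a separation score based on the R² of the group indicator regressed on the chosen columns (for two groups this is equivalent to 1 minus Wilks' lambda):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two groups of 50, three predictors; x1 nearly duplicates x0.
n = 50
g = np.repeat([0, 1], n)
x0 = g + rng.normal(0, 1.0, 2 * n)
x1 = x0 + rng.normal(0, 0.1, 2 * n)       # overlaps heavily with x0
x2 = 0.5 * g + rng.normal(0, 1.0, 2 * n)  # weaker, but unique
X = np.column_stack([x0, x1, x2])

def score(cols):
    """Separation of the groups on the chosen columns: R^2 of the
    group indicator regressed on them (= 1 - Wilks' lambda here)."""
    A = np.column_stack([np.ones(2 * n), X[:, cols]])
    fit = A @ np.linalg.lstsq(A, g, rcond=None)[0]
    return np.corrcoef(fit, g)[0, 1] ** 2

# Forward selection: at each step, enter the variable that adds the
# most explained variance in the context of those already entered.
selected, remaining = [], [0, 1, 2]
while remaining:
    best = max(remaining, key=lambda j: score(selected + [j]))
    selected.append(best)
    remaining.remove(best)
print(selected)  # entry order; once x0 or x1 is in, the other adds little
```

Which of the two near-duplicate predictors enters first depends only on the noise in this particular sample, which previews the sampling-error problem discussed next.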

Sampling error may cause two variables, say n1 and n2, with similar explanatory power to differ only slightly in the variance they account for, so the order in which they are entered is essentially arbitrary.

Sampling errors enter stepwise methods of discriminant analysis through the way variables are chosen (forward selection).

It is possible that worthy variables are excluded from the analysis altogether and wrongly assumed to have no explanatory or predictive potential.

It is also possible that a variable passed over at the first step might be more practical or economical to measure, or might have a more durable effect in the population.

Differences in the explanatory ability of different variables on different functions are often due to sampling error, and small differences of this kind are unlikely to replicate in new samples. Replicability of results is important in any research endeavour (Thompson, 1989).

(3).    Failure to Select the Best Subset of Variables

Variable selection and variable ordering, the main purposes of stepwise discriminant analysis, may not yield accurate results because these selection methods capitalize on small amounts of sampling error.

The importance of variable selection becomes clearer when the original variable set needs to be reduced for a particular purpose.

      Stepwise methods of discriminant analysis do not specify or identify the best predictor set of a given size for the sample data being analyzed.

In fact, the true best set of variables may produce considerably higher effect sizes, and it may even include none of the variables selected by the stepwise algorithm.

      WAYS OF DEALING WITH THE PROBLEMS INHERENT WITH STEPWISE ANALYSIS

The problems associated with the use of stepwise methods of discriminant analysis listed above can be managed in the following ways:

Incorrect degrees of freedom can be corrected by hand: replace the values reported by the computer package with the correct ones and recalculate the F-statistics. Researchers should make this correction before interpreting their results.
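A minimal sketch of the hand correction, assuming scipy is available. All numbers are invented for illustration: a printout is imagined in which a stepwise run retained 2 of 4 examined predictors and charged only 2 numerator degrees of freedom. Strictly, the F statistic itself should also be recomputed with the corrected charge; the sketch only shows re-evaluating the tail probability at the corrected degrees of freedom:

```python
from scipy.stats import f

F_reported = 4.8     # hypothetical F statistic taken from a printout
n1, n2 = 30, 30      # illustrative group sizes
p_examined = 4       # correct "charge": every predictor examined, not just
                     # the 2 the stepwise procedure happened to retain

df1 = p_examined                      # corrected numerator df
df2 = n1 + n2 - p_examined - 1        # corrected denominator df
p_value = f.sf(F_reported, df1, df2)  # right-tail probability under H0
print(df1, df2, round(p_value, 4))
```

Charging all four examined predictors gives a larger numerator df and a smaller denominator df than the package's charge of two, so the corrected p-value is the more conservative one.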

The problem of capitalization on sampling error can be minimized by reducing the number of variables to a manageable size. Huberty (1989) suggested discarding variables that have contributed little to predictive validity in past studies, variables judged not relevant to the present study, and variables highly correlated with other variables. The use of cross-validation can reduce the problem of sampling error to the barest minimum.
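Cross-validation needs no specialized package. Below is a minimal leave-one-out sketch in which each case is classified by a two-group Fisher discriminant rule built without it, so the hit rate is not inflated by sampling error; the simulated data, group sizes, and seed are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy two-group data: 25 cases per group, 3 predictors, means 1 unit apart.
X1 = rng.normal(0.0, 1.0, (25, 3))
X2 = rng.normal(1.0, 1.0, (25, 3))
X = np.vstack([X1, X2])
y = np.repeat([0, 1], 25)

def fisher_classify(Xtr, ytr, x):
    """Classify x with Fisher's linear discriminant fit on training data."""
    m0, m1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    S = (np.cov(Xtr[ytr == 0], rowvar=False) +
         np.cov(Xtr[ytr == 1], rowvar=False))
    a = np.linalg.solve(S, m1 - m0)   # discriminant coefficients
    cut = a @ (m0 + m1) / 2           # midpoint cut-off on the function
    return int(a @ x > cut)

# Leave-one-out: delete case i, refit the rule, classify case i.
hits = sum(
    fisher_classify(np.delete(X, i, 0), np.delete(y, i), X[i]) == y[i]
    for i in range(len(y))
)
print(hits / len(y))  # cross-validated hit rate
```

The cross-validated hit rate is an honest estimate of how the rule would classify new cases, unlike the resubstitution hit rate computed on the same cases used to fit the rule.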

The inability of stepwise methods of discriminant analysis to select the best subset of variables can be addressed by conducting an all-possible-subsets analysis, so as to determine the best subset of each given size. There are computer packages, such as SPSS, that can run all-possible-subsets analyses.
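The all-possible-subsets idea can be sketched directly with itertools. The four simulated predictors and the scoring choice here are invented for illustration; the score is the R² of the group indicator regressed on the chosen predictors, which for two groups equals 1 minus Wilks' lambda:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 40
g = np.repeat([0.0, 1.0], n)
# Four hypothetical predictors with differing (and overlapping) validity.
X = np.column_stack(
    [g + rng.normal(0, s, 2 * n) for s in (1.0, 1.1, 1.2, 3.0)]
)

def separation(cols):
    """Effect size of the subset: R^2 of the group indicator on the
    chosen predictors (a stand-in for 1 - Wilks' lambda)."""
    A = np.column_stack([np.ones(2 * n)] + [X[:, j] for j in cols])
    fit = A @ np.linalg.lstsq(A, g, rcond=None)[0]
    return np.corrcoef(fit, g)[0, 1] ** 2

# Score every subset of size 2: unlike stepwise entry, no pair can be
# missed, so the best subset of this size is found by construction.
best = max(itertools.combinations(range(4), 2), key=separation)
print(best, round(separation(best), 3))
```

With only four predictors there are six pairs to score; the approach is exhaustive, so its cost grows quickly with the number of predictors, which is the practical price paid for guaranteeing the best subset of each size.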

TEST OF THE EQUALITY OF THE TWO MEAN VECTORS

Mahalanobis D2 test is used to test the hypothesis that the difference between the two population mean vectors is zero. That is, Ho : µ1 = µ2

Mahalanobis D2 is defined by

D2 = (X̄1 – X̄2)' S-1 (X̄1 – X̄2),

where X̄1 and X̄2 are the two sample mean vectors and S is the pooled sample covariance matrix. Under Ho, the statistic

F = [(n1 + n2 – p – 1) / (p(n1 + n2 – 2))] [n1n2 / (n1 + n2)] D2

has an F distribution with p and n1 + n2 – p – 1 degrees of freedom, where p = number of variables and n1, n2 are the group sizes. If Ho is rejected, discriminant analysis can then be carried out in order to identify the variables that contributed to the differences in group means.
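As a sketch of the test, the following computes D2 and its F transformation on simulated data; the group sizes, mean separation, and seed are invented for illustration, and scipy is assumed available for the F tail probability:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
p = 3                                # number of variables
X1 = rng.normal(0.0, 1.0, (30, p))   # group 1 sample
X2 = rng.normal(0.8, 1.0, (30, p))   # group 2 sample, means shifted by 0.8
n1, n2 = len(X1), len(X2)

d = X1.mean(0) - X2.mean(0)
# Pooled sample covariance matrix S.
S = ((n1 - 1) * np.cov(X1, rowvar=False) +
     (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
D2 = d @ np.linalg.solve(S, d)       # Mahalanobis squared distance

# F transformation of D2 for Ho: mu1 = mu2.
F_stat = ((n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
          * (n1 * n2 / (n1 + n2)) * D2)
p_value = f.sf(F_stat, p, n1 + n2 - p - 1)
print(round(D2, 3), round(p_value, 4))
```

Because the simulated groups genuinely differ in all three variables, the test should reject Ho here, after which a discriminant analysis would be the natural next step to identify which variables drive the difference.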

