2.4.5.4 Regression Analysis
This is a statistical tool that uses the relationship between two or more quantitative variables so that one variable (the dependent variable) can be predicted from the other(s) (the independent variables). However strong the statistical relationship between the variables, a regression model does not by itself imply a cause-and-effect relationship. Regression analysis takes many forms, including simple linear, multiple linear, curvilinear, and multiple curvilinear regression models (Jackson, 2002).
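As an illustration, the following is a minimal sketch of the simplest case, simple linear regression (one independent variable) fitted by ordinary least squares, using only the Python standard library. The function names and the sample data are hypothetical and are not taken from Jackson (2002). The slope is the ratio of the sum of cross-products of deviations, S_xy, to the sum of squared deviations, S_xx. The intercept is chosen so that the fitted line passes through the point of means.

```python
from typing import Sequence, Tuple


def fit_simple_linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # S_xy and S_xx: sums of cross-products and squares of deviations from the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("the independent variable has zero variance")
    slope = s_xy / s_xx
    # The fitted line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x_new: float) -> float:
    """Predict the dependent variable for a new value of the independent variable."""
    return intercept + slope * x_new


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 1 + 2x.
    xs = [0, 1, 2, 3]
    ys = [1, 3, 5, 7]
    b0, b1 = fit_simple_linear_regression(xs, ys)
    print(b0, b1)               # 1.0 2.0
    print(predict(b0, b1, 10))  # 21.0
```

A large fitted slope only quantifies the association between the two variables. In line with the caveat above, it does not show that x causes y. The multiple and curvilinear variants replace the single predictor with several predictors or with transformed terms (for example x squared). Those variants are usually solved with a matrix least-squares routine rather than this closed form.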
2.4.6 Data Mining Analysis and Techniques
Several data mining problem types or analysis tasks are typically encountered during a data mining project. Depending on the desired outcome, several data analysis techniques with different goals may be applied in succession to achieve the desired result. For example, to determine which customers are likely to buy a new product, a business analyst may first need to use cluster analysis to segment the customer database, then apply regression analysis to predict the behaviour of each cluster. Data mining analysis tasks typically fall into the general categories listed below (Jackson, 2002).
Data Summarization: This gives the user an overview of the structure of the data and is generally carried out in the early stages of a project. This type of initial exploratory data analysis can help to understand the nature of the data and to find potential hypotheses for hidden information. Simple descriptive statistical and visualization techniques generally apply (Jackson, 2002).
Segmentation: This separates the data into interesting and meaningful subgroups or classes. In this case, the analyst can hypothesize certain subgroups as relevant to the business question, based either on prior knowledge or on the outcome of data description and summarization. Automatic clustering techniques can detect previously unsuspected, hidden structures in the data that allow segmentation. Clustering techniques, visualization, and neural nets generally apply (Jackson, 2002).
Classification: This assumes that a set of objects, characterized by some attributes or features, belong to different classes. The class label is a discrete qualitative identifier, for example large, medium, or small. The objective is to build classification models that assign the correct class to previously unseen and unlabelled objects. Classification models are mostly used for predictive modelling. Discriminant analysis, decision trees, rule induction methods, and genetic algorithms generally apply (Jackson, 2002).
Prediction: This is very similar to classification. The difference is that in prediction, the target is not a discrete qualitative attribute but a continuous one. The goal of prediction is to find the numerical value of the target attribute for unseen objects. This problem type is also known as regression, and when the prediction deals with time series data it is often called forecasting. Regression analysis, decision trees, and neural nets generally apply (Jackson, 2002).
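The customer example at the start of Section 2.4.6 chains two of these tasks: segmentation, followed by prediction within each segment. It can be sketched in a few lines of standard-library Python under several assumptions that the source does not specify. The clustering technique is taken to be k-means (Lloyd's algorithm) on a single segmentation feature. The prediction step is a simple least-squares line fitted separately inside each cluster. All names and the sample records are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Each hypothetical record is (segmentation_feature, predictor, target).
Record = Tuple[float, float, float]


def kmeans_1d(values: Sequence[float], k: int, iterations: int = 100) -> List[int]:
    """Lloyd's k-means on one numeric feature; returns a cluster label per value."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (intercept, slope); a flat line at the mean if xs has no spread."""
    mx, my = mean(xs), mean(ys)
    s_xx = sum((x - mx) ** 2 for x in xs)
    if s_xx == 0:
        return float(my), 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / s_xx
    return my - slope * mx, slope


def segment_then_predict(
    records: Sequence[Record], k: int
) -> Tuple[List[int], Dict[int, Tuple[float, float]]]:
    """Stage 1: cluster on the segmentation feature. Stage 2: one regression per cluster."""
    labels = kmeans_1d([r[0] for r in records], k)
    models: Dict[int, Tuple[float, float]] = {}
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(records, labels) if lab == c]
        models[c] = fit_line([r[1] for r in members], [r[2] for r in members])
    return labels, models


if __name__ == "__main__":
    # Hypothetical customers: two groups that differ on the segmentation feature
    # and show opposite predictor/target trends.
    data = [(20, 0, 1), (21, 1, 3), (22, 2, 5), (60, 0, 10), (61, 1, 9), (62, 2, 8)]
    labels, models = segment_then_predict(data, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(models)  # {0: (1.0, 2.0), 1: (10.0, -1.0)}
```

A single line fitted to all six records would blur the rising trend in one group and the falling trend in the other. Avoiding that blur is the motivation for segmenting before predicting. In practice, the number of clusters k and the clustering features would themselves need to be chosen and validated.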
2.5 Data Mining in Human Resource Applications
Knowledge Discovery in Databases (KDD), or data mining (DM), is receiving considerable attention and is being recognized as a newly emerging analysis tool [Tso and Yao 2008].
Data mining has recently attracted a great deal of attention in the information industry and in society as a whole. This is due to the wide availability of enormous amounts of data and the pressing need to turn such data into useful information and knowledge [Han and Kamber 2006].
Computer applications such as decision support systems (DSS) that interface with DM tools can help executives make more informed and objective decisions. They can also help managers retrieve, summarize, and analyse decision-related data. Data mining has been applied in many fields, such as finance, marketing, manufacturing, health care, and customer relationship management. Nevertheless, its application in HRM has been far less extensive [Chien and Chen 2008].
Prediction applications in HRM are infrequent, although some examples exist. These include predicting the length of service, sales premiums, and persistence indices of insurance agents, and analysing the misoperation behaviours of operators [Chien and Chen 2008]. For this reason, this study attempts to use data mining techniques to forecast potential employees as part of the talent management task. Table 2.5.1 lists some HR applications of data mining. It shows that few studies have addressed performance prediction using DM techniques in the human resource domain.

