• Development Of An Information Retrieval System Using Tree-structured Clustering
    [FRSC Benue State]

  • CHAPTER TWO -- [Total Page(s) 4]

    • 2.6    Mixed Clustering
      Most clustering algorithms are designed for either categorical data or numeric data. In practice, however, the majority of datasets are described by a combination of continuous and categorical features. A general method for handling such data is to transform one data type into the other. In most cases, nominal attributes are encoded by simple matching or binary mapping, and clustering is then performed on the newly computed numeric proximities. Binary encoding transforms each categorical attribute into a set of binary attributes and then represents each categorical value as a combination of these binary values.
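      As a minimal sketch of binary encoding, the function below gives each distinct value of one categorical attribute its own 0/1 position. The function name `binary_encode` and the colour values are hypothetical and chosen only for illustration; other binary coding schemes are possible.

```python
def binary_encode(values):
    """Encode one categorical attribute as 0/1 vectors, one position per distinct value."""
    categories = sorted(set(values))  # fixed, reproducible column order
    position = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[position[v]] = 1  # switch on the column that belongs to this value
        encoded.append(row)
    return categories, encoded


# Hypothetical attribute values, for illustration only.
colours = ["red", "blue", "green", "blue"]
categories, encoded = binary_encode(colours)
print(categories)  # ['blue', 'green', 'red']
for value, row in zip(colours, encoded):
    print(value, row)
```

      Each record now carries as many binary columns as the attribute has distinct values, so the encoded data grows with the size of the attribute's domain.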
      Simple matching defines a distance that is zero when two categorical values are identical and one when they are distinct. However, these coding methods have several disadvantages: they lose the information derivable from the ordering of values; they lose the structure of categorical values that have different levels of similarity; they require more space and time when the domain of the categorical attribute is large; they ignore the context of a pair of values, e.g., their co-occurrence with other attributes; and they give different weights to attributes according to the number of distinct values each may take. Moreover, if quantitative and binary attributes are included in the same index, these procedures will generally give the binary attributes excessive weight.
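      A short sketch of simple matching, assuming records are tuples of categorical values (the function name and the size labels are hypothetical), shows how the ordering of values is lost:

```python
def simple_matching(a, b):
    """Count the attributes on which two categorical records differ (0 per match, 1 per mismatch)."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical ordered categories: small < medium < large.
d_small_medium = simple_matching(("small",), ("medium",))
d_small_large = simple_matching(("small",), ("large",))

print(d_small_medium, d_small_large)  # 1 1
```

      Although "large" is intuitively further from "small" than "medium" is, both pairs receive the same distance of one.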
      An alternative approach is to discretize numeric values and then apply symbolic clustering algorithms. However, discretization often loses important information, especially the relative difference between two values of a numeric feature. It also causes a boundary problem: two close values that lie on either side of a discretization boundary are assigned to different ranges. A further difficulty is estimating the optimal intervals for discretization.
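      The boundary problem can be illustrated with a small sketch that assumes equal-width intervals, which is only one of several possible discretization schemes. The function name, the range 0 to 90, and the sample values are hypothetical.

```python
def equal_width_bin(x, low, high, n_bins):
    """Return the index of the equal-width interval of [low, high] that contains x."""
    width = (high - low) / n_bins
    index = int((x - low) // width)
    return min(max(index, 0), n_bins - 1)  # clamp values that fall on or outside the range


# Three intervals over [0, 90]: [0, 30), [30, 60), [60, 90].
for value in (29.9, 30.1, 59.0):
    print(value, "->", equal_width_bin(value, 0, 90, 3))
```

      Here 29.9 and 30.1 differ by only 0.2 but land in different intervals, while 30.1 and 59.0 differ by 28.9 and share one. After discretization a symbolic algorithm sees the first pair as different and the second pair as identical.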
      Huang (1998) extended k-modes to mixed datasets and developed the k-prototype algorithm. The distances for the two types of features are calculated separately: numerical distances are measured by Euclidean distance, while categorical distances are measured by simple matching. The center of each categorical attribute is defined as its mode in the cluster. Ahmad and Dey (2007) proposed a fuzzy prototype k-means algorithm. As in k-prototype, the cost function is made up of two components. The difference is that the categorical distances are measured by the co-occurrence of two attributes, and the categorical cluster centers are lists of the values of every attribute together with their frequencies in the cluster.
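      The two-component distance and the prototype update of a k-prototype style algorithm might be sketched as follows. This is an illustrative sketch, not the implementation used in this work: it assumes squared Euclidean distance for the numeric part, and the weight `gamma` that balances the two parts is an assumption not described in the text above. The function names and the sample records are hypothetical.

```python
from collections import Counter


def mixed_distance(record, prototype, numeric_idx, categorical_idx, gamma=1.0):
    """Numeric part (squared Euclidean) plus gamma times the simple-matching mismatches."""
    numeric_part = sum((record[i] - prototype[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(1 for i in categorical_idx if record[i] != prototype[i])
    return numeric_part + gamma * categorical_part


def update_prototype(cluster, numeric_idx, categorical_idx):
    """Cluster center: mean of each numeric attribute, mode of each categorical attribute."""
    prototype = [None] * len(cluster[0])
    for i in numeric_idx:
        prototype[i] = sum(r[i] for r in cluster) / len(cluster)
    for i in categorical_idx:
        prototype[i] = Counter(r[i] for r in cluster).most_common(1)[0][0]
    return prototype


# Hypothetical records: (numeric attribute, categorical attribute).
cluster = [(2.0, "car"), (4.0, "car"), (6.0, "bus")]
prototype = update_prototype(cluster, numeric_idx=[0], categorical_idx=[1])
print(prototype)  # [4.0, 'car']
print(mixed_distance((5.0, "bus"), prototype, [0], [1], gamma=0.5))  # 1.5
```

      A larger `gamma` makes categorical mismatches dominate the distance, while a smaller one favors the numeric attributes, so its value would need to be tuned to the dataset.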
  • REFERENCES
    1. Frakes, W. B. and Baeza-Yates, R. (1992). Information Retrieval: Data Structures & Algorithms. Prentice-Hall, Inc. ISBN 0-13-463837-9.
    2. Ahmad, A. and Dey, L. (2007). A method to compute distance between two categorical values of some attributes in unsupervised learning for categorical data set.
    3. Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press, New York.
    4. Chandola, V., Boriah, S. and Kumar, V. (2007). Simil ...