Feature selection for software effort estimation with localized neighborhood mutual information
Feature selection is usually applied before case based reasoning (CBR) is used for software effort estimation (SEE). Unfortunately, most feature selection methods treat CBR as a black box, so there is no guarantee that CBR is appropriate for the selected feature subset. The key to solving this problem is to measure how well the CBR assumption holds for a given feature set. In this paper, a measure called localized neighborhood mutual information (LNI) is proposed for this purpose, and a greedy method called LNI based feature selection (LFS) is designed to select features with it. Experiments with leave-one-out cross validation (LOOCV) on 6 benchmark datasets demonstrate that: (1) CBR makes effective estimates with the subset selected by LFS, compared with a randomized baseline method; (2) compared with three representative feature selection methods, LFS achieves the best MAR value on 3 out of 6 datasets, with a 14% average improvement; and (3) against the same three methods, LFS achieves the best MMRE on 5 out of 6 datasets, with a 24% average improvement.
Keywords: Feature selection; Case based reasoning; Neighborhood mutual information; Software effort estimation
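The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It assumes that CBR is approximated by a k-nearest-neighbor estimator with Euclidean distance, that MAR is the mean absolute residual, and that MMRE is the mean magnitude of relative error (the usual definitions in the SEE literature). The function names and the toy data are hypothetical, and LNI itself is not implemented here because the abstract does not define it.

```python
import math


def knn_estimate(train_X, train_y, query, k=1):
    """Estimate effort as the mean effort of the k nearest cases.

    Assumption: Euclidean distance stands in for the CBR similarity measure.
    """
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loocv_errors(X, y, features, k=1):
    """Return (MAR, MMRE) of the k-NN estimator under leave-one-out CV.

    `features` is the list of column indices kept by a feature selector.
    """
    # Project every project (row) onto the selected feature subset.
    proj = [[row[j] for j in features] for row in X]
    abs_res, rel_err = [], []
    for i in range(len(proj)):
        # Hold out case i and estimate it from all remaining cases.
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = knn_estimate(train_X, train_y, proj[i], k)
        abs_res.append(abs(y[i] - pred))
        rel_err.append(abs(y[i] - pred) / y[i])
    n = len(y)
    return sum(abs_res) / n, sum(rel_err) / n


# Hypothetical toy data: two features per project, effort as the target.
X = [[1.0, 10.0], [2.0, 50.0], [3.0, 20.0], [4.0, 40.0]]
y = [10.0, 20.0, 30.0, 40.0]

mar, mmre = loocv_errors(X, y, features=[0], k=1)
print(f"MAR={mar:.3f} MMRE={mmre:.3f}")
```

Calling `loocv_errors` with different `features` lists gives a simple way to compare candidate subsets on the two error measures the abstract reports, with lower values being better for both.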