The Imbalanced Problem in Morphological Galaxy Classification
In this paper we present an experimental study of the performance of six machine learning algorithms applied to morphological galaxy classification. We also address learning from imbalanced data sets, a problem inherent to many real-world applications such as astronomical data analysis. We use two over-sampling techniques, SMOTE and Resampling, and vary the number of generated instances for classification. Our experimental results show that Random Forest with Resampling obtains the best results for three, five and seven galaxy types, with an F-measure of about 0.99 in all cases.
Keywords: machine learning, imbalanced data sets, galaxies
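The two over-sampling ideas named in the abstract, and the F-measure used to report results, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. It assumes that "Resampling" means drawing minority-class instances at random with replacement, and that the amount of generated instances is given as a percentage of the minority class size. The function names, the toy feature vectors, and the parameters `amount` and `k` are hypothetical and chosen only for illustration.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Resampling (assumed meaning): copy n_new minority rows drawn with replacement."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k, rng):
    """SMOTE: create points on the segment between a minority row and a near neighbour."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority instances")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(row, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical minority-class feature vectors (not real galaxy data)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

amount = 200  # assumed: percentage of new instances relative to the minority size
n_new = len(minority) * amount // 100

duplicates = random_oversample(minority, n_new, rng)
synthetic = smote(minority, n_new, k=2, rng=rng)
```

Random duplication only repeats existing instances, whereas SMOTE produces new points between neighbouring minority instances. Either set would be appended to the training data before fitting a classifier such as Random Forest, and the F-measure would then be computed per galaxy type on held-out data.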