Abstract
In this paper we present an experimental study of the performance of six machine learning algorithms applied to morphological galaxy classification. We also address the problem of learning from imbalanced data sets, which is inherent to many real-world applications such as astronomical data analysis. We used two over-sampling techniques, SMOTE and Resampling, and varied the number of generated instances for classification. Our experimental results show that Random Forest with Resampling obtains the best results for three, five and seven galaxy types, with an F-measure of about 0.99 in all cases.
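To make the two central ideas of the abstract concrete, the following is a minimal sketch of over-sampling by resampling and of the F-measure. It is not the authors' implementation: it assumes that "Resampling" means drawing minority-class instances with replacement, it balances every class to the size of the largest one (the paper instead varies the number of generated instances), and the feature values, class labels, and function names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class instances (sampling with replacement)
    until every class is as large as the largest class.
    Assumption: this is one plausible reading of "Resampling"."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # All original instances belonging to this class.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def f_measure(true_labels, predicted, positive):
    """F-measure for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one feature per galaxy, three imbalanced classes.
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y = ["spiral", "spiral", "spiral", "spiral", "elliptical", "irregular"]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

After over-sampling, each class in the toy data has as many instances as the original majority class. A classifier such as Random Forest would then be trained on `X_bal` and `y_bal`, and `f_measure` would be evaluated per class on held-out data that has not been over-sampled.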
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de la Calleja, J., Huerta, G., Fuentes, O., Benitez, A., Domínguez, E.L., Medina, M.A. (2010). The Imbalanced Problem in Morphological Galaxy Classification. In: Bloch, I., Cesar, R.M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2010. Lecture Notes in Computer Science, vol 6419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16687-7_70
DOI: https://doi.org/10.1007/978-3-642-16687-7_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16686-0
Online ISBN: 978-3-642-16687-7
eBook Packages: Computer Science (R0)