Abstract
Incremental learning has been used extensively for data stream classification. Most attention on the data stream classification paid on non-evolutionary methods. In this paper, we introduce new incremental learning algorithms based on harmony search. We first propose a new classification algorithm for the classification of batch data called harmony-based classifier and then give its incremental version for classification of data streams called incremental harmony-based classifier. Finally, we improve it to reduce its computational overhead in absence of drifts and increase its robustness in presence of noise. This improved version is called improved incremental harmony-based classifier. The proposed methods are evaluated on some real world and synthetic data sets. Experimental results show that the proposed batch classifier outperforms some batch classifiers and also the proposed incremental methods can effectively address the issues usually encountered in the data stream environments. Improved incremental harmony-based classifier has significantly better speed and accuracy on capturing concept drifts than the non-incremental harmony based method and its accuracy is comparable to non-evolutionary algorithms. The experimental results also show the robustness of improved incremental harmony-based classifier.
Similar content being viewed by others
References
Beyer, H., & Schwefel, H. (2002). Evolution strategies: A comprehensive introduction. Natural Computing, 1, 3–52.
Bifet, A., & Gavaldà, R. (2009a). Adaptive parameter-free learning from evolving data streams. In IDA.
Bifet A., & Gavaldà, R. (2009b). Adaptive XML tree classification on evolving data streams. In Proc. of European conference on machine learning and knowledge discovery in databases, ECML/PKDD.
Cunningham, P., Nowlan, N., Delany, S. J., & Haahr, M. (2003). A case-based approach to spam filtering that can track concept drift. Technical Report TCD-CS-2003-16, Ireland, Trinity College Dublin.
EGEE: Enabling Grids for E-science in Europe. http://www.euegee.org. Accessed October 2011.
Fan, W. (2004a). StreamMiner: A classifier ensemble-based engine to mine concept-drifting data streams. In Proc. of 2004 international conference on Very Large Data Bases (VLDB’2004) (Vol. 30, pp. 1257–1260). Toronto, Canada.
Fan, W. (2004b). Systematic data selection to mine concept-drifting data stream. In Proc. of ACM SIGKDD (pp. 128–137). Seattle, Washington USA.
Fogel, L. (1994). Evolutionary programming in perspective: The top-down view. In: J. M. Zurada, R. Marks II, C. Robinson (Eds.), Computational intelligence: Imitating life (pp. 135–146). Piscataway: IEEE Press.
Gama, J., Medas, P., & Rocha, R. (2004). Forest trees for on-line data. In Proc. ACM symp. applied computing (SAC’04) (pp. 632–636).
Geem, Z. W., Kim, J. H., & Loganathan, G. V. (2002). A new heuristic optimization algorithm: Harmony search. Simulation, 76(2), 60–68.
Geem, Z. W., Tseng, C., & Park, Y. (2005). Harmony search for generalized orienteering problem: Best touring in China, Springer. Lecture Notes in Computer Science, 3412, 741–750.
Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Addison-Wesley.
Guan, S. U., & Zhucollard, F. (2005). An incremental approach to genetic-algorithm-based classification. IEEE Transactions on Systems, Man and Cybernetics, Part B–Cybernetics, 35(2), 227–239.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd Edn.). Morgan Kaufmann Publisher.
Hashemi, S., Yang, Y., Mirzamomen, Z., & Kangavari, M. (2009). Adapted one-versus-all decision trees for data stream classification. IEEE Transactions on Knowledge and Data Engineering, 21(5), 624–637.
Hettich, S., & Bay, S. D. (2010). The UCI KDD Archive. University of California, Department of Information and Computer Science, Irvine, CA. http://www.kdd.ics.uci.edu.
Holland, J. H. (1986). Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. In Machine learning: An artificial intelligence approach (Vol. II, pp. 593–623). Morgan Kaufmann.
Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. In F. Provost (Ed.), Knowledge discovery and data mining (pp. 97–106). AAAI Press.
Klinkenberg, R. (2004). Learning drifting concepts: Example selection vs. example weighting. Intelligent Data Analysis, 8(3), 281–300.
Kolter, J. Z., & Maloof, M. A. (2003). Dynamic weighted majority: A new ensemble method for tracking concept drift. In Proc. Of the 3rd IEEE int. conf. on data mining ICDM-2003 (pp. 123–130). IEEE CS Press: Los Alamitos, CA.
Koza, J. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge: MIT Press.
Koza, J., & Poli, R. (2005). Genetic programming. In E. Burke & G. Kendall (Eds.), Introductory tutorials in optimization, decision support and search methodology (Chapter 5, pp. 127–164). Kluwer Press.
Lee, K. S., & Geem, Z. W (2004). A new meta-heuristic algorithm for continues engineering optimization: Harmony search theory and practice. Computer Methods in Applied Mechanics and Engineering, 194, 3902–3933.
Liu, J., Li, X., & Zhong, W. (2009). Ambiguous decision trees for mining concept-drifting data streams. Pattern Recognition Letters, 30, 1347–1355.
Mahdavi, M., Fesanghary M., & Damangir, E. (2007). An improved harmony search algorithm for solving optimization problems. Applied Mathematics and Computation, 188, 1567–1579.
Mukhopadhyay, A., Roy, A., Das, S., & Abraham, A. (2008). Population-variance and explorative power of harmony search: an analysis. In Proceedings of 3rd IEEE international conference on digital information management (ICDIM 2008) (pp. 13–16). London, United Kingdom.
Omran, M. G. H., & Mahdavi, M. (2008). Global-best harmony search. Applied Mathematics and Computation, 198, 643–656.
Polikar, R., Udpa, L., Udpa, S., & Honavar, V. (2001). Learn+ +: An incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man and Cybernetics; Part C–Cybernetics, 31, 497–508.
Storn R., & Price, K. (1997). Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11, 341–359.
Street, W., & Kim, Y. (2001). A streaming ensemble algorithm for large scale classification. In Proceeding of the seventh international conference on knowledge discovery and data mining (pp. 377–382). NY.
Wang, H., Fan, W., Yu, P., & Han, J. (2003). Mining concept-drifting data streams using ensemble classifiers. In Proceedings of ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD2003) (pp. 226–235). Washington, D.C.
Widmer G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1), 69–101.
Witten, I. H., & Frank, E. (1999). Data mining: Practical machine learning tools with Java implementations. San Francisco: Morgan Kaufmann.
Zhang, Y., & Bhattacharyya, S. (2004). Genetic programming in classifying large-scale data: An ensemble method. Information Sciences, 163, 85–101.
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions which improved the paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Karimi, Z., Abolhassani, H. & Beigy, H. A new method of mining data streams using harmony search. J Intell Inf Syst 39, 491–511 (2012). https://doi.org/10.1007/s10844-012-0199-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10844-012-0199-2