
A new method of mining data streams using harmony search

Journal of Intelligent Information Systems

Abstract

Incremental learning has been used extensively for data stream classification, but most work on data stream classification has focused on non-evolutionary methods. In this paper, we introduce new incremental learning algorithms based on harmony search. We first propose a new algorithm for classifying batch data, called the harmony-based classifier, and then give its incremental version for classifying data streams, called the incremental harmony-based classifier. Finally, we improve the incremental version to reduce its computational overhead in the absence of drifts and to increase its robustness in the presence of noise; this version is called the improved incremental harmony-based classifier. The proposed methods are evaluated on real-world and synthetic data sets. Experimental results show that the proposed batch classifier outperforms some batch classifiers and that the proposed incremental methods can effectively address the issues usually encountered in data stream environments. The improved incremental harmony-based classifier captures concept drifts significantly faster and more accurately than the non-incremental harmony-based method, and its accuracy is comparable to that of non-evolutionary algorithms. The experimental results also show the robustness of the improved incremental harmony-based classifier.
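For readers unfamiliar with harmony search, the following is a minimal sketch of the generic metaheuristic on which the proposed classifiers build. It is not the harmony-based classifier described in this paper: the objective (a simple sphere function), the function name `harmony_search`, and all parameter values are assumptions chosen for illustration. The parameter names `hms` (harmony memory size), `hmcr` (harmony memory considering rate), `par` (pitch adjusting rate) and `bw` (bandwidth) follow common usage in the harmony search literature.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iterations=2000, seed=0):
    """Minimize `objective` over the box `bounds` with basic harmony search.

    Illustrative sketch only; parameter defaults are assumptions.
    """
    rng = random.Random(seed)
    # Harmony memory: hms random candidate solutions and their costs.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    costs = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for this dimension.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: perturb the value within the bandwidth.
                    value += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
            else:
                # Random selection: draw a fresh value from the allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))  # keep the value inside bounds

        # Replace the worst stored harmony if the new one is better.
        new_cost = objective(new)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost

    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, cost = harmony_search(sphere, [(-5.0, 5.0)] * 3)
```

In a classification setting, each harmony would presumably encode a candidate classifier and the objective would measure its error on the current data; how the paper does this is not specified in the abstract, so that mapping is left out of the sketch.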



Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which improved the paper.

Author information

Corresponding author

Correspondence to Hamid Beigy.


About this article

Cite this article

Karimi, Z., Abolhassani, H. & Beigy, H. A new method of mining data streams using harmony search. J Intell Inf Syst 39, 491–511 (2012). https://doi.org/10.1007/s10844-012-0199-2


  • DOI: https://doi.org/10.1007/s10844-012-0199-2
