Data Mining and Knowledge Discovery, Volume 15, Issue 1, pp 87–97

Future trends in data mining

  • Hans-Peter Kriegel
  • Karsten M. Borgwardt
  • Peer Kröger
  • Alexey Pryakhin
  • Matthias Schubert
  • Arthur Zimek
Open Access


In recent years, data mining has established itself as one of the major disciplines in computer science, with growing industrial impact. Research in data mining will undoubtedly continue, and even intensify, over the coming decades. In this article, we sketch our vision of the future of data mining. Starting from the classic definition of “data mining”, we elaborate on topics that, in our opinion, will set trends in the field.


Keywords: Data mining, Knowledge discovery, Future trends



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Hans-Peter Kriegel (1)
  • Karsten M. Borgwardt (1)
  • Peer Kröger (1)
  • Alexey Pryakhin (1)
  • Matthias Schubert (1)
  • Arthur Zimek (1)

  1. Ludwig-Maximilians-Universität, Munich, Germany
