Abstract
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
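As a rough illustration of the levelwise search mentioned above, the following is a minimal sketch for one common instance of the technique: finding all frequent itemsets in a collection of transactions. The function name `levelwise_frequent_sets`, the `min_support` parameter, and the toy data are assumptions made for this example and are not taken from the paper. The sketch relies on the fact that every subset of a frequent set must itself be frequent, so candidates of size k+1 are built only from frequent sets of size k.

```python
from itertools import combinations


def levelwise_frequent_sets(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1 candidates: every single item that occurs in the data.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]

    frequent = {}
    size = 1
    while candidates:
        # One pass over the data: count the transactions containing each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Next level: sets one item larger, kept only if all of their
        # subsets of the current size were found frequent.
        size += 1
        unions = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [
            c for c in unions
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


if __name__ == "__main__":
    # Toy data, invented for this example.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(
        levelwise_frequent_sets(data, min_support=3).items(),
        key=lambda pair: (len(pair[0]), sorted(pair[0])),
    ):
        print(sorted(itemset), count)
```

Each level costs one pass over the data, and the candidate pruning step keeps the number of sets counted per pass small when few large sets are frequent. This sketch counts support naively and is not tuned for large databases.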
Part of this work was done while the author was visiting the Max Planck Institut für Informatik in Saarbrücken, Germany. Work supported by the Academy of Finland and by the Alexander von Humboldt Stiftung.
© 1996 Springer-Verlag Berlin Heidelberg
Mannila, H. (1996). Methods and problems in data mining. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62222-2
Online ISBN: 978-3-540-49682-3