Methods and problems in data mining

Mannila, Heikki

doi:10.1007/3-540-62222-5_35

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1186)

Included in the following conference series:

International Conference on Database Theory

Abstract

Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
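
The levelwise search mentioned in the abstract can be illustrated on the simplest pattern class, frequent itemsets. The following is a minimal sketch under stated assumptions, not the chapter's own algorithm or notation: the function name `levelwise_frequent_sets`, the toy transactions, and the absolute support threshold are all hypothetical choices made for illustration. It relies on the observation that every subset of a frequent set is itself frequent, so candidates of size k+1 need only be built from frequent sets of size k.

```python
from itertools import combinations


def levelwise_frequent_sets(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in
    at least `min_support` transactions (illustrative sketch only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1 candidates: every single item seen in the data.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]

    frequent = {}
    k = 1
    while candidates:
        # One pass over the data counts the support of all size-k candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # Keep only the candidates that meet the threshold.
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Build size-(k+1) candidates from pairs of frequent size-k sets,
        # pruning any candidate that has an infrequent size-k subset.
        next_candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in level for s in combinations(union, k)
            ):
                next_candidates.add(union)

        candidates = list(next_candidates)
        k += 1

    return frequent


# Hypothetical toy data: five transactions over the items a, b, c.
example = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = levelwise_frequent_sets(example, min_support=3)
```

The sketch scans the data once per level and stops as soon as a level yields no candidates. A practical implementation would replace the nested counting loop with a more efficient index structure, and a sampling-based variant would run the same search on a subset of the transactions.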

Part of this work was done while the author was visiting the Max Planck Institut für Informatik in Saarbrücken, Germany. Work supported by the Academy of Finland and by the Alexander von Humboldt Stiftung.

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, FIN-00014, Helsinki, Finland
Heikki Mannila

Editor information

Foto Afrati, Phokion Kolaitis

About this paper

Cite this paper

Mannila, H. (1996). Methods and problems in data mining. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_35

DOI: https://doi.org/10.1007/3-540-62222-5_35
Published: 03 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62222-2
Online ISBN: 978-3-540-49682-3
eBook Packages: Springer Book Archive
