Abstract
Existing research in machine learning and data mining has focused on finding rules or regularities among data cases. Recently, it was shown that associations that are missing from the data may also be interesting; these missing associations are holes, or empty regions. The existing algorithm for discovering holes has a number of shortcomings: it requires each hole to contain no data point at all, which is too restrictive for many real-life applications; it has very high complexity and produces a huge number of holes; and it works only in a continuous space, allowing no discrete/nominal attributes. These drawbacks limit its applications. In this paper, we propose a novel approach that overcomes these shortcomings. It transforms the hole-discovery problem into a supervised learning task and then uses decision tree induction to discover holes in the data.
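The abstract's core idea can be illustrated with a minimal sketch. The details below are an assumption about how the transformation might look, not the authors' exact procedure: real data points are labeled one class, synthetic points scattered uniformly over the space are labeled "empty", and a decision tree is trained on the combined set. Leaf regions the tree predicts as "empty" are candidate holes, and their axis-parallel bounds can be read off the split thresholds on the path from the root.

```python
# Hypothetical sketch of hole discovery via decision tree induction.
# Real points form two corner clusters, leaving a hole around the centre.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Real data: two dense clusters in opposite corners of the unit square.
data = np.vstack([
    rng.uniform(0.0, 0.3, size=(200, 2)),
    rng.uniform(0.7, 1.0, size=(200, 2)),
])

# Synthetic "empty-space" points spread uniformly over the whole space.
empty = rng.uniform(0.0, 1.0, size=(400, 2))

X = np.vstack([data, empty])
y = np.array([1] * len(data) + [0] * len(empty))  # 1 = data, 0 = empty

# A shallow tree keeps the discovered regions few and interpretable.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# A point in the middle of the hole should fall in an "empty" leaf,
# while a point inside a cluster should fall in a "data" leaf.
centre_label = tree.predict([[0.5, 0.5]])[0]
cluster_label = tree.predict([[0.15, 0.15]])[0]
```

Because decision trees handle nominal attributes and impose no density threshold of their own, this framing sidesteps the continuous-space restriction of the earlier algorithm; the synthetic-point density acts as a tunable notion of how "empty" a hole must be.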
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
Cite this paper
Liu, B., Wang, K., Mun, L.F., Qi, X.Z. (1998). Using decision tree induction for discovering holes in data. In: Lee, H.Y., Motoda, H. (eds) PRICAI'98: Topics in Artificial Intelligence. PRICAI 1998. Lecture Notes in Computer Science, vol 1531. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095268
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65271-7
Online ISBN: 978-3-540-49461-4