Discretization Numbers for Multiple-Instances Problem in Relational Database

Alfred, Rayner; Kazakov, Dimitar

doi:10.1007/978-3-540-75185-4_6

Rayner Alfred¹,² &
Dimitar Kazakov¹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4690)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems


Abstract

Handling numerical data stored in a relational database differs from handling numerical data stored in a single table, because an individual record may occur multiple times in a non-target table and the relations between tables may be non-determinate. Most traditional data mining methods deal only with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, in order to reduce the continuous domains to more manageable symbolic domains of low cardinality; the resulting loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered include algorithms that do not depend on the multi-relational structure of the data as well as algorithms that are sensitive to this structure. In our experiments, we study the effect of taking the one-to-many association into account when discretizing continuous numbers. We implement a new discretization method, called entropy-instance-based discretization, and evaluate it with respect to C4.5 on three varieties of a well-known multi-relational database (Mutagenesis), where numeric attributes play an important role. The empirical results demonstrate that entropy-based discretization can be improved by taking the multiple-instance problem into consideration.
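
The abstract does not spell out the entropy-instance-based procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It picks a single binary cut point by minimizing class entropy over the rows of a non-target table, and optionally reweights each row by the inverse of the number of rows belonging to the same individual, so that an individual with many associated records does not dominate the split. The function names (`entropy`, `best_cut`), the `(individual_id, value, class_label)` row layout, and the 1/n weighting scheme are all assumptions made for this example.

```python
import math
from collections import defaultdict


def entropy(class_weights):
    """Shannon entropy (in bits) of a weighted class distribution."""
    total = sum(class_weights.values())
    if total <= 0:
        return 0.0
    # Ignore (near-)zero weights, including tiny floating-point residue.
    return -sum((w / total) * math.log2(w / total)
                for w in class_weights.values() if w > 1e-12)


def best_cut(rows, per_individual=False):
    """Find one binary cut point that minimizes weighted class entropy.

    rows: list of (individual_id, value, class_label) tuples, one per
          record in the non-target table (hypothetical layout).
    per_individual: if True, each row is weighted by 1 / (number of rows
          of its individual); this weighting is an assumption of the
          sketch, not a scheme taken from the paper.
    Returns (cut_point, entropy_after_split); cut_point is None when no
    split improves on the unsplit data.
    """
    rows_per_individual = defaultdict(int)
    for individual, _, _ in rows:
        rows_per_individual[individual] += 1

    weighted = sorted(
        (value, label,
         1.0 / rows_per_individual[individual] if per_individual else 1.0)
        for individual, value, label in rows
    )

    total = sum(w for _, _, w in weighted)
    left = defaultdict(float)
    right = defaultdict(float)
    for _, label, w in weighted:
        right[label] += w

    best = (None, entropy(right))
    left_total = 0.0
    for i in range(len(weighted) - 1):
        value, label, w = weighted[i]
        # Move row i from the right partition to the left partition.
        left[label] += w
        right[label] -= w
        left_total += w
        next_value = weighted[i + 1][0]
        if next_value == value:
            continue  # a cut is only possible between distinct values
        score = ((left_total / total) * entropy(left)
                 + ((total - left_total) / total) * entropy(right))
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best
```

Calling `best_cut(rows)` treats every row equally, as a single-table method would, whereas `best_cut(rows, per_individual=True)` gives every individual in the target table the same total weight. Applying the function recursively to each resulting interval, with a stopping criterion, would yield a multi-interval discretization.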

Author information

Authors and Affiliations

University of York, Computer Science Department, Heslington, YO10 5DD York, United Kingdom
Rayner Alfred & Dimitar Kazakov
On Study Leave from Universiti Malaysia Sabah, School of Engineering and Information Technology, 88999, Kota Kinabalu, Sabah, Malaysia
Rayner Alfred

Editor information

Yannis Ioannidis, Boris Novikov, Boris Rachev

About this paper

Cite this paper

Alfred, R., Kazakov, D. (2007). Discretization Numbers for Multiple-Instances Problem in Relational Database. In: Ioannidis, Y., Novikov, B., Rachev, B. (eds) Advances in Databases and Information Systems. ADBIS 2007. Lecture Notes in Computer Science, vol 4690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75185-4_6

DOI: https://doi.org/10.1007/978-3-540-75185-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75184-7
Online ISBN: 978-3-540-75185-4
eBook Packages: Computer Science, Computer Science (R0)