Abstract
An important step in the knowledge discovery in databases (KDD) process is attribute selection, which aims to choose a subset of attributes that represents the important information in the data. Most existing attribute selection methods handle only simple attribute types, such as categorical and numerical attributes. In particular, they cannot be applied to multivalued attributes, which take multiple values simultaneously for the same instance in the dataset. This article proposes two relevance measures for multivalued attributes that quantify their importance for classification. The proposed measures adapt two widely used relevance measures for categorical attributes: information gain and gain ratio. To evaluate them, experiments were conducted in which multiclass datasets were submitted to multi-relational classifiers. The experiments show that the proposed measures are good indicators of the relevance of multivalued attributes for multiclass classification.
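For orientation, the two baseline measures named in the abstract can be sketched for an ordinary single-valued categorical attribute. This is a minimal illustration of the standard textbook definitions of information gain and gain ratio, not the multivalued adaptations proposed in the chapter, which the abstract does not spell out. The function names and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    """Reduction in class entropy after partitioning on a categorical attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the class labels inside each partition.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:
        # A constant attribute carries no split information; treat it as irrelevant.
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy data: one categorical attribute and a class label per instance.
colour = ["red", "red", "blue", "blue"]
label = ["pos", "pos", "neg", "neg"]
print(information_gain(colour, label), gain_ratio(colour, label))
```

Both functions assume exactly one attribute value per instance. A multivalued attribute breaks that assumption, because a single instance would fall into several partitions at once, which is the gap the proposed measures address.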
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tasca, M., Zadrozny, B., Plastino, A. (2014). Relevance Measures for Multivalued Attributes in Multiclass Datasets. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_28
DOI: https://doi.org/10.1007/978-3-319-12027-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)