Relevance Measures for Multivalued Attributes in Multiclass Datasets

  • Conference paper
Advances in Artificial Intelligence -- IBERAMIA 2014 (IBERAMIA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Abstract

An important step in the knowledge discovery in databases (KDD) process is the attribute selection procedure, which aims at choosing a subset of attributes that can represent the important information within the data. Most of the existing attribute selection methods can only handle simple attribute types, such as categorical and numerical. In particular, these methods cannot be applied to multivalued attributes, which are attributes that take multiple values simultaneously for the same instance in the dataset. This article proposes two relevance measures for multivalued attributes, which aim at measuring their importance for classification. The proposed measures are adaptations of two widely used relevance measures for categorical attributes: information gain and gain ratio. In order to evaluate the proposed measures, experiments were conducted with multiclass datasets submitted to multi-relational classifiers. The experiments show that the proposed measures are good indicators of the relevance of multivalued attributes for multiclass classification.
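
For reference, the two categorical measures that the abstract names as starting points, information gain and gain ratio, can be sketched in their standard textbook form as below. This is only an illustration for single-valued categorical attributes; it is not the multivalued adaptation proposed in the paper, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    """Reduction in class entropy after partitioning by a categorical attribute."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the class labels inside each partition.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:
        # An attribute with a single value cannot split the data.
        return 0.0
    return information_gain(values, labels) / split_info


# Toy data (hypothetical): one categorical attribute, three classes.
attribute = ["a", "a", "b", "b"]
classes = ["x", "x", "y", "z"]
print(information_gain(attribute, classes))
print(gain_ratio(attribute, classes))
```

Gain ratio divides by the attribute's own entropy to penalise attributes that split the data into many small partitions, which is the usual motivation for preferring it over raw information gain.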

Author information

Corresponding author

Correspondence to Mariana Tasca.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tasca, M., Zadrozny, B., Plastino, A. (2014). Relevance Measures for Multivalued Attributes in Multiclass Datasets. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_28

  • DOI: https://doi.org/10.1007/978-3-319-12027-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12026-3

  • Online ISBN: 978-3-319-12027-0

  • eBook Packages: Computer Science (R0)
