
Exploiting Label Dependency for Hierarchical Multi-label Classification

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7301)


Abstract

Hierarchical multi-label classification is a variant of traditional classification in which an instance can belong to several labels, which are in turn organized in a hierarchy. Existing hierarchical multi-label classification algorithms ignore possible correlations between the labels. Moreover, most current methods predict instance labels in a “flat” fashion, without exploiting the ontological structure among the classes. In this paper, we propose HiBLADE (Hierarchical multi-label Boosting with LAbel DEpendency), a novel algorithm that not only takes advantage of the pre-established hierarchical taxonomy of the classes but also exploits hidden correlations among the classes that are not reflected in the class hierarchy, thereby improving the quality of the predictions. Our approach works in two steps. First, the predefined hierarchical taxonomy of the labels is used to decide the training set for each classifier. Second, the dependencies among the children of each label in the hierarchy are captured and analyzed using Bayes' method and instance-based similarity. Our experimental results on several real-world biomolecular datasets show that the proposed method can improve the performance of hierarchical multi-label classification.
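The two steps described above can be illustrated with a small sketch. The abstract does not specify the training-set policy, the exact Bayesian estimate, or the similarity measure, so the following Python code assumes: a common "siblings" policy (positives carry the node's label; negatives carry the parent's label but not the node's), conditional probabilities P(c_j = 1 | c_i = 1) between sibling labels estimated from co-occurrence counts, and a k-nearest-neighbour label frequency under Euclidean distance as the instance-based evidence. The toy hierarchy, data, and function names (`training_set`, `child_conditionals`, `knn_label_rate`) are hypothetical. This is not the HiBLADE implementation and omits the boosting component entirely.

```python
import math
from itertools import permutations

# Hypothetical toy hierarchy (child -> parent); "root" is the top node.
PARENT = {"A": "root", "B": "root", "A1": "A", "A2": "A"}

# Hypothetical training labels: instance -> label set. The true path rule
# holds, so every instance labelled A1 or A2 is also labelled A.
LABELS = {
    "g1": {"A", "A1"},
    "g2": {"A", "A1", "A2"},
    "g3": {"A", "A2"},
    "g4": {"B"},
    "g5": {"A", "A1", "A2"},
    "g6": {"A", "A2"},
}

# Hypothetical 2-D feature vectors for the same instances.
FEATURES = {
    "g1": (0.0, 0.1), "g2": (0.2, 0.0), "g3": (1.0, 1.1),
    "g4": (5.0, 5.0), "g5": (0.1, 0.2), "g6": (1.1, 0.9),
}


def training_set(node, labels):
    """Assumed 'siblings' policy: positives carry `node`; negatives carry
    the parent of `node` but not `node`; all other instances are excluded."""
    parent = PARENT[node]
    pos, neg = [], []
    for inst, labs in labels.items():
        if node in labs:
            pos.append(inst)
        elif parent == "root" or parent in labs:
            neg.append(inst)
    return sorted(pos), sorted(neg)


def child_conditionals(parent, labels):
    """Estimate P(c_j = 1 | c_i = 1) for each ordered pair of children of
    `parent` from label co-occurrence counts in the training data."""
    children = sorted(c for c, p in PARENT.items() if p == parent)
    probs = {}
    for ci, cj in permutations(children, 2):
        with_ci = [labs for labs in labels.values() if ci in labs]
        both = sum(1 for labs in with_ci if cj in labs)
        probs[(ci, cj)] = both / len(with_ci) if with_ci else 0.0
    return probs


def knn_label_rate(query, label, k, features, labels):
    """Instance-based evidence: fraction of the k nearest training
    instances (Euclidean distance) that carry `label`."""
    nearest = sorted(features, key=lambda i: math.dist(query, features[i]))[:k]
    return sum(label in labels[i] for i in nearest) / k


if __name__ == "__main__":
    print(training_set("A1", LABELS))
    print(child_conditionals("A", LABELS))
    print(knn_label_rate((0.1, 0.1), "A2", 3, FEATURES, LABELS))
```

For example, `training_set("A1", LABELS)` returns `(['g1', 'g2', 'g5'], ['g3', 'g6'])`: g4 is excluded because it does not carry A1's parent label A. `child_conditionals("A", LABELS)` gives P(A2 | A1) = 2/3 but P(A1 | A2) = 0.5, an asymmetric dependency between sibling labels that independently trained per-label classifiers do not use.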




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alaydie, N., Reddy, C.K., Fotouhi, F. (2012). Exploiting Label Dependency for Hierarchical Multi-label Classification. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30217-6_25


  • DOI: https://doi.org/10.1007/978-3-642-30217-6_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30216-9

  • Online ISBN: 978-3-642-30217-6

  • eBook Packages: Computer Science, Computer Science (R0)
