Journal of Intelligent Information Systems, Volume 51, Issue 2, pp 341–365

Network representation with clustering tree features

  • Konstantinos Pliakos
  • Celine Vens


Representing and inferring interaction networks is a challenging and long-standing problem. Modern technological advances have led to a great increase in both the volume and the complexity of generated network data. The size of networks such as drug-protein interaction networks or gene regulatory networks is constantly growing, and multiple sources of information are exploited to extract features describing the nodes in such networks. Modern information systems therefore need methods that are able to mine these networks and exploit the available features. Here, a novel data mining framework for network representation and mining is proposed. It is based on decision tree learning and ensembles of trees. The proposed scheme introduces an efficient network data representation, capable of addressing different data types while also tackling data volume and complexity. The learning process follows the inductive setup and can be performed in either a supervised or an unsupervised manner. Experiments were conducted on six biomedical network datasets. The experimental evaluation demonstrates the merits of the proposed approach, confirming its efficiency.


Keywords: Tree-ensembles · Extremely randomized trees · Interaction data representation · Biomedical network mining · Graph embedding
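The core idea of clustering-tree features, as described in the abstract, can be illustrated with off-the-shelf tree ensembles: fit extremely randomized trees on node features, then represent each node by a one-hot encoding of the leaves it reaches across the forest. The sketch below is a minimal illustration using scikit-learn, not the authors' implementation; the synthetic data and all parameter choices are assumptions for demonstration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomTreesEmbedding
from sklearn.preprocessing import OneHotEncoder

# Toy node-feature matrix X and interaction labels y
# (stand-ins for real biomedical network data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Supervised variant: fit extremely randomized trees, then describe each
# sample by the leaf it lands in for every tree ("clustering tree" features).
forest = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)
leaves = forest.apply(X)  # leaf indices, shape (n_samples, n_trees)
embedding = OneHotEncoder().fit_transform(leaves)  # sparse leaf-membership features

# Unsupervised variant: totally random trees need no labels at all.
unsup = RandomTreesEmbedding(n_estimators=10, random_state=0).fit(X)
embedding_unsup = unsup.transform(X)

print(embedding.shape, embedding_unsup.shape)
```

Because the forest is fitted once and new nodes are simply passed down the trees, the representation is inductive: unseen nodes get embeddings without refitting, which matches the inductive setup mentioned in the abstract.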



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Public Health and Primary Care, KU Leuven, Campus KULAK, Kortrijk, Belgium
