
Analysis of Text-Enriched Heterogeneous Information Networks

  • Jan Kralj
  • Anita Valmarska
  • Miha Grčar
  • Marko Robnik-Šikonja
  • Nada Lavrač
Chapter
Part of the Studies in Big Data book series (SBD, volume 16)

Abstract

This chapter addresses the analysis of information networks, focusing on heterogeneous information networks, which contain more than one type of node and arc. After an overview of tasks and approaches to mining heterogeneous information networks, the presentation turns to text-enriched heterogeneous information networks, whose distinguishing property is that certain nodes are enriched with text information. A particular approach to mining text-enriched heterogeneous information networks is presented that combines text mining and network mining. The approach decomposes a heterogeneous network into separate homogeneous networks and then concatenates the structural context vectors calculated from these homogeneous networks with the bag-of-words vectors obtained from the textual information contained in certain network nodes. The approach is showcased on the analysis of two real-life text-enriched heterogeneous citation networks.
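The decomposition and concatenation steps described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the chapter: personalized PageRank (P-PR) is used as the structural context measure (suggested by the keyword list), text is represented by plain term-frequency vectors with a minimum term frequency cut-off, every target node carries text, and all concatenated parts are unit-normalized and weighted equally. The function names (`decompose`, `personalized_pagerank`, `bag_of_words`, `feature_vectors`) and the toy paper/author/venue network are hypothetical.

```python
import math
from collections import Counter, defaultdict


def decompose(edges, target_type):
    """Split a heterogeneous network into homogeneous networks over target_type nodes.

    Hypothetical helper. edges: (u, v) pairs whose endpoints are (node_type, name) tuples.
    Two target nodes are linked in the network for middle type T with weight equal
    to the number of T-nodes they share (e.g. paper-author-paper).
    Edges joining two target nodes directly are ignored in this sketch.
    """
    neighbours = defaultdict(set)  # middle node -> names of adjacent target nodes
    for u, v in edges:
        for x, y in ((u, v), (v, u)):
            if x[0] == target_type and y[0] != target_type:
                neighbours[y].add(x[1])
    nets = defaultdict(lambda: defaultdict(Counter))
    for (middle_type, _), targets in neighbours.items():
        for a in targets:
            for b in targets:
                if a != b:
                    nets[middle_type][a][b] += 1
    return {t: {a: dict(nb) for a, nb in g.items()} for t, g in nets.items()}


def personalized_pagerank(graph, nodes, source, damping=0.85, iters=50):
    """P-PR by power iteration: the random walker restarts at source with prob. 1 - damping."""
    rank = {n: 0.0 for n in nodes}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += 1.0 - damping
        for n in nodes:
            out = graph.get(n, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its mass back to the source.
                nxt[source] += damping * rank[n]
                continue
            for m, w in out.items():
                nxt[m] += damping * rank[n] * w / total
        rank = nxt
    return [rank[n] for n in nodes]


def bag_of_words(texts, min_tf=1):
    """Term-frequency vectors over terms occurring at least min_tf times in all texts."""
    counts = [Counter(t.lower().split()) for t in texts]
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = sorted(w for w, f in total.items() if f >= min_tf)
    return vocab, [[c[w] for w in vocab] for c in counts]


def unit(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def feature_vectors(edges, texts, target_type="paper", min_tf=1):
    """Concatenate each node's bag-of-words vector with one P-PR vector per homogeneous network."""
    nodes = sorted(texts)  # assumes every target node has text
    _, bows = bag_of_words([texts[n] for n in nodes], min_tf)
    nets = decompose(edges, target_type)
    features = {}
    for i, n in enumerate(nodes):
        parts = [unit(bows[i])]
        for middle_type in sorted(nets):
            parts.append(unit(personalized_pagerank(nets[middle_type], nodes, n)))
        features[n] = [x for part in parts for x in part]
    return features


# Toy network: papers linked to authors and venues (hypothetical data).
edges = [
    (("paper", "p1"), ("author", "a1")),
    (("paper", "p2"), ("author", "a1")),
    (("paper", "p2"), ("venue", "v1")),
    (("paper", "p3"), ("venue", "v1")),
]
texts = {"p1": "network mining", "p2": "text mining", "p3": "citation network"}
features = feature_vectors(edges, texts)
print(len(features["p1"]))  # 10: 4 terms + 3 (author network) + 3 (venue network)
```

Normalizing each part separately keeps a long text vector from swamping the structural parts. A fuller treatment could learn per-part weights instead of fixing them to be equal.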

Keywords

Heterogeneous Information Networks, Homogeneous Network, Minimum Term Frequency, Link Prediction, PageRank

Acknowledgments

The presented work was partially supported by the European Commission through the Human Brain Project (Grant number 604102) and by the Slovenian Research Agency project “Development and applications of new semantic data mining methods in life sciences” (Grant number J2-5478).

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Jan Kralj (1, 2), corresponding author
  • Anita Valmarska (1, 2)
  • Miha Grčar (3)
  • Marko Robnik-Šikonja (3)
  • Nada Lavrač (1, 2, 4)

  1. Jožef Stefan Institute, Ljubljana, Slovenia
  2. Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
  3. Faculty of Computer and Information Science, Ljubljana, Slovenia
  4. University of Nova Gorica, Nova Gorica, Slovenia