Analysis of Text-Enriched Heterogeneous Information Networks
Abstract
This chapter addresses the analysis of information networks, focusing on heterogeneous information networks, which contain more than one type of node and arc. After an overview of tasks and approaches to mining heterogeneous information networks, the presentation focuses on text-enriched heterogeneous information networks, whose distinguishing property is that certain nodes are enriched with text information. A particular approach to mining text-enriched heterogeneous information networks is presented that combines text mining and network mining. The approach first decomposes a heterogeneous network into separate homogeneous networks and then concatenates the structural context vectors calculated from these homogeneous networks with the bag-of-words vectors obtained from the text contained in certain network nodes. The approach is showcased on the analysis of two real-life text-enriched heterogeneous citation networks.
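The decompose-then-concatenate pipeline described in the abstract can be illustrated with a small, self-contained Python sketch. It is a minimal sketch under stated assumptions, not the chapter's implementation. The toy network (papers linked to authors) and all function names (`decompose`, `personalized_pagerank`, `bag_of_words`, `build_features`) are hypothetical. The paper-author-paper projection is only one possible decomposition. Structural context vectors are assumed to be personalized PageRank scores (suggested by the Page Rank keyword). Bag-of-words vectors are assumed to be TF-IDF weighted, with a corpus-wide minimum term frequency `min_tf`. The damping factor 0.85 and the equal weighting of the concatenated parts are illustrative choices.

```python
import math
import re
from collections import Counter, defaultdict


def decompose(paper_author_edges):
    """Project a bipartite paper-author network onto papers (assumed
    decomposition): two papers are linked when they share an author, and
    the arc weight counts the shared authors."""
    by_author = defaultdict(set)
    for paper, author in paper_author_edges:
        by_author[author].add(paper)
    weights = Counter()
    for papers in by_author.values():
        for a in papers:
            for b in papers:
                if a != b:
                    weights[(a, b)] += 1  # stored in both directions
    return weights


def personalized_pagerank(nodes, weights, source, damping=0.85, iters=100):
    """Structural context vector of `source`, assumed here to be its
    personalized PageRank (random walk with restart), by power iteration."""
    out_sum = Counter()
    for (a, _), w in weights.items():
        out_sum[a] += w
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        # Mass on nodes without out-arcs cannot move along arcs.
        dangling = sum(rank[n] for n in nodes if out_sum[n] == 0)
        for (a, b), w in weights.items():
            nxt[b] += damping * rank[a] * w / out_sum[a]
        # Teleport mass and dangling mass both restart at the source.
        nxt[source] += (1 - damping) + damping * dangling
        rank = nxt
    return [rank[n] for n in nodes]


def bag_of_words(texts, min_tf=2):
    """TF-IDF vectors over terms occurring at least `min_tf` times in the
    whole corpus (one reading of a minimum term frequency threshold)."""
    docs = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    total = Counter(w for d in docs for w in d)
    vocab = sorted(w for w, c in total.items() if c >= min_tf)
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vocab, vectors


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are kept)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def build_features(papers, texts, homogeneous_nets, min_tf=2):
    """Concatenate the normalized bag-of-words vector of each paper with its
    normalized structural context vector from every homogeneous network.
    All parts get equal weight here (a simplifying assumption)."""
    _, bow = bag_of_words([texts[p] for p in papers], min_tf)
    features = []
    for i, p in enumerate(papers):
        parts = [l2_normalize(bow[i])]
        for net in homogeneous_nets:
            parts.append(l2_normalize(personalized_pagerank(papers, net, p)))
        features.append([x for part in parts for x in part])
    return features


# Hypothetical toy data: three papers, two authors.
papers = ["p1", "p2", "p3"]
texts = {"p1": "network mining text", "p2": "network mining graph",
         "p3": "text classification"}
coauthor_net = decompose([("p1", "a1"), ("p2", "a1"), ("p3", "a2")])
features = build_features(papers, texts, [coauthor_net])
```

Each row of `features` could then be passed to any standard vector-based classifier. With several heterogeneous relations (for example, shared authors and shared venues), each projection would contribute its own structural context part. How the chapter weights the concatenated parts is not stated in this abstract.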
Keywords
Heterogeneous Information Networks, Homogeneous Network, Minimum Term Frequency, Link Prediction, PageRank
Acknowledgments
The presented work was partially supported by the European Commission through the Human Brain Project (Grant number 604102) and by the Slovenian Research Agency project “Development and applications of new semantic data mining methods in life sciences” (Grant number J2-5478).