Abstract
Measuring relatedness between objects (nodes) in a heterogeneous network is a challenging and interesting problem. Many approaches transform a heterogeneous network into a homogeneous network before applying a similarity measure. However, such a transformation results in information loss because path semantics are lost. In this paper, we study the problem of measuring relatedness between objects in a heterogeneous network using only link information, and we propose a novel meta-path based measure for relevance measurement in a general heterogeneous network with a specified network schema. The proposed measure is semi-metric and incorporates path semantics by following the specified meta-path. For relevance measurement, the given heterogeneous network is converted, using the specified meta-path, into a bipartite network consisting only of the source and target type objects between which relatedness is to be measured. To validate the effectiveness of the proposed measure, we compared its performance with existing relevance measures that are semi-metric and applicable to heterogeneous networks. Experiments were performed on the real-world bibliographic dataset DBLP. The results show that the proposed measure effectively measures the relatedness between objects in a heterogeneous network and that it outperforms earlier measures in clustering and query tasks.
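The conversion step described in the abstract can be illustrated with a small sketch. The abstract does not state how edges of the bipartite network are weighted, so the code below assumes the weight is the number of meta-path instances between a source object and a target object, obtained by multiplying the adjacency matrices along the meta-path (the commuting-matrix idea used by PathSim). The function names and the toy author-paper-venue data are hypothetical, and the proposed relevance measure itself is not reproduced here.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "dimension mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def metapath_bipartite(adjacencies):
    """Collapse a chain of adjacency matrices along a meta-path into one
    source-by-target matrix of path-instance counts (an assumed weighting)."""
    result = adjacencies[0]
    for adj in adjacencies[1:]:
        result = matmul(result, adj)
    return result


# Hypothetical DBLP-like toy data: 3 authors, 3 papers, 2 venues.
author_paper = [
    [1, 1, 0],  # author 0 wrote papers 0 and 1
    [0, 1, 0],  # author 1 wrote paper 1
    [0, 0, 1],  # author 2 wrote paper 2
]
paper_venue = [
    [1, 0],  # paper 0 appeared in venue 0
    [1, 0],  # paper 1 appeared in venue 0
    [0, 1],  # paper 2 appeared in venue 1
]

# Meta-path Author -> Paper -> Venue: rows are authors, columns are venues.
author_venue = metapath_bipartite([author_paper, paper_venue])
print(author_venue)  # [[2, 0], [1, 0], [0, 1]]
```

Each row of `author_venue` is the source-side view of the bipartite network: author 0 reaches venue 0 through two papers, so that edge carries weight 2. A relevance measure would then be computed on this matrix rather than on the full heterogeneous network.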
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gupta, M., Kumar, P., Bhasker, B. (2015). A New Relevance Measure for Heterogeneous Networks. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_13
DOI: https://doi.org/10.1007/978-3-319-22729-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer Science (R0)