Abstract
In the real world, it is inevitable that some people share a name, and this ambiguity of author names makes the retrieval of academic works difficult. Existing author name disambiguation methods generally rely on feature engineering or on the graph topology of academic networks (e.g., collaboration relationships). However, such features may be costly to obtain because of limited data availability or privacy constraints. Moreover, simple relational data cannot capture the rich semantics underlying heterogeneous academic graphs. In this paper, we therefore study author name disambiguation in the setting of heterogeneous information networks and propose a novel method based on network representation learning. First, we extract the heterogeneous information network and the meta-path channels based on the selected meta-paths. Second, we propose two meta-path-based proximities to measure the neighboring and structural similarities between nodes. Third, the embeddings of the various node types are sampled and jointly updated according to the extracted meta-path channels. Finally, the disambiguation task is completed by applying an effective clustering method to the generated paper-related vector space. Experimental results on the well-known AMiner dataset show that the proposed method obtains better results than state-of-the-art author name disambiguation methods.
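To make the first step more concrete, the following is a minimal sketch of how meta-path instances could be grouped into meta-path channels. It is an illustration under stated assumptions, not the implementation used in the paper: the toy network, the two meta-paths (paper-author-paper, "PAP", and paper-venue-paper, "PVP"), and the helper names `extract_channel` and `extract_channels` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy heterogeneous academic network (hypothetical data).
# Each paper is linked to typed neighbours: authors ("A") and venues ("V").
paper_links = {
    "p1": {"A": {"a1", "a2"}, "V": {"v1"}},
    "p2": {"A": {"a2"}, "V": {"v2"}},
    "p3": {"A": {"a3"}, "V": {"v1"}},
}


def extract_channel(links, middle_type):
    """Return all paper-X-paper meta-path instances for one middle node type."""
    # Invert the paper -> neighbour relation for the chosen node type.
    papers_by_node = defaultdict(set)
    for paper, typed_neighbours in links.items():
        for node in typed_neighbours.get(middle_type, ()):
            papers_by_node[node].add(paper)

    # Every pair of papers sharing a middle node yields one instance.
    # Papers are sorted so each undirected instance is stored once.
    instances = set()
    for node, papers in papers_by_node.items():
        for left, right in combinations(sorted(papers), 2):
            instances.add((left, node, right))
    return instances


def extract_channels(links, meta_paths):
    """Group meta-path instances by meta-path: one channel per meta-path."""
    return {
        name: extract_channel(links, middle_type)
        for name, middle_type in meta_paths.items()
    }


# Assumed meta-paths: PAP (shared author) and PVP (shared venue).
channels = extract_channels(paper_links, {"PAP": "A", "PVP": "V"})
for name, instances in sorted(channels.items()):
    print(name, sorted(instances))
```

In this sketch each channel holds only the instances of one meta-path, so the full network can be viewed as the combination of its channels. How the paper samples from these channels and updates embeddings is not specified in the abstract and is not modeled here.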
Notes
- 1.
The concept of a meta-path channel is similar to that of a color channel in image processing. For example, an RGB picture can be viewed as a combination of three color channels, i.e., red, green, and blue. Similarly, a heterogeneous information network can be considered a combination of meta-path channels, each of which contains the meta-path instances of one meta-path.
- 2.
Acknowledgements
This research is funded by the National Natural Science Foundation of China under grants No. 61802440 and No. 61702553. It is also supported by the Fundamental Research Funds for the Central Universities, ZUEL: 2722019JCT037, and by the Opening Project of State Key Laboratory of Digital Publishing Technology.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, X., Wang, R., Zhang, Y. (2019). Author Name Disambiguation in Heterogeneous Academic Networks. In: Ni, W., Wang, X., Song, W., Li, Y. (eds) Web Information Systems and Applications. WISA 2019. Lecture Notes in Computer Science, vol 11817. Springer, Cham. https://doi.org/10.1007/978-3-030-30952-7_15
DOI: https://doi.org/10.1007/978-3-030-30952-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30951-0
Online ISBN: 978-3-030-30952-7
eBook Packages: Computer Science, Computer Science (R0)