A Unified Semi-supervised Framework for Author Disambiguation in Academic Social Network

Wang, Peng; Zhao, Jianyu; Huang, Kai; Xu, Baowen

doi:10.1007/978-3-319-10085-2_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8645)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

This paper addresses the author disambiguation problem in academic social networks, that is, it resolves both the synonym problem (“multiple names refer to one person”) and the polysemy problem (“one name refers to multiple persons”). A unified semi-supervised framework is proposed to handle both problems. First, the framework uses a semi-supervised approach to solve the cold-start problem in author disambiguation. Second, a robust method for generating training data, based on a multi-aspect similarity indicator, is used, and a support vector machine-based method is employed to model different kinds of feature combinations. Third, a self-taught procedure is proposed to resolve ambiguity in coauthor information and boost the performance of the other models. The proposed framework is evaluated on a large-scale real-world dataset and obtains promising results.
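The abstract does not give the exact similarity indicator, features, or thresholds, so the following is only a minimal sketch of how training pairs might be generated without manual labels in a cold-start setting. The aspect names (`coauthors`, `venues`, `keywords`), the averaging of Jaccard scores, the thresholds `hi` and `lo`, and the toy records are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def multi_aspect_similarity(r1, r2):
    """Average Jaccard similarity over several aspects (hypothetical aspect choice)."""
    aspects = ("coauthors", "venues", "keywords")
    return sum(jaccard(r1[k], r2[k]) for k in aspects) / len(aspects)

def pseudo_label_pairs(records, hi=0.6, lo=0.1):
    """Label only confident pairs: 1 = same person, 0 = different person.

    Pairs whose similarity falls strictly between lo and hi are left unlabeled.
    The thresholds are illustrative assumptions.
    """
    labeled = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        s = multi_aspect_similarity(r1, r2)
        if s >= hi:
            labeled.append((i, j, 1))
        elif s <= lo:
            labeled.append((i, j, 0))
    return labeled

# Toy publication records that share one ambiguous author name (made-up data).
records = [
    {"coauthors": {"Coauthor A", "Coauthor B"}, "venues": {"Venue X"},
     "keywords": {"disambiguation", "svm"}},
    {"coauthors": {"Coauthor A", "Coauthor B"}, "venues": {"Venue X"},
     "keywords": {"disambiguation"}},
    {"coauthors": {"Coauthor C"}, "venues": {"Venue Y"}, "keywords": {"vision"}},
]
pairs = pseudo_label_pairs(records)
```

Pairs labeled this way could then serve as training data for a pairwise classifier such as a support vector machine, with the unlabeled middle band left for the trained model to decide.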

Author information

Authors and Affiliations

School of Computer Science and Engineering, Southeast University, China
Peng Wang, Jianyu Zhao & Kai Huang
State Key Laboratory for Novel Software Technology, Nanjing University, China
Baowen Xu
State Key Laboratory of Software Engineering, Wuhan University, China
Peng Wang & Baowen Xu
Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, China
Peng Wang

Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, 46022, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Knowledge Management, LMU University of Munich, Leopoldstraße 13, 80802, Munich, Germany
Marcus Spies
FAW, University of Linz, Altenbergerstrasse 69, 4040, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Wang, P., Zhao, J., Huang, K., Xu, B. (2014). A Unified Semi-supervised Framework for Author Disambiguation in Academic Social Network. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_1

DOI: https://doi.org/10.1007/978-3-319-10085-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10084-5
Online ISBN: 978-3-319-10085-2
eBook Packages: Computer Science, Computer Science (R0)
