A Unified Semi-supervised Framework for Author Disambiguation in Academic Social Network

  • Conference paper
Database and Expert Systems Applications (DEXA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8645)

Abstract

This paper addresses the author disambiguation problem in academic social networks, namely resolving both the synonym problem, "multiple names refer to one person", and the polysemy problem, "one name refers to multiple persons". A unified semi-supervised framework is proposed to deal with both problems. First, the framework uses a semi-supervised approach to solve the cold-start problem in author disambiguation. Second, a robust method for generating training data, based on multi-aspect similarity indicators, is used, and support vector machines are employed to model different kinds of feature combinations. Third, a self-taught procedure is proposed to resolve ambiguity in coauthor information and boost the performance of the other models. The proposed framework is evaluated on a large-scale real-world dataset and obtains promising results.
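
The cold-start step described in the abstract, generating training data from similarity indicators when no labels exist, can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields (`coauthors`, `venues`, `title_words`), the use of Jaccard overlap, the equal weighting of aspects, and the thresholds `hi` and `lo` are all assumptions chosen for illustration. Pairs with very high similarity are taken as pseudo-positive examples, pairs with very low similarity as pseudo-negative, and the uncertain middle is left unlabeled.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def multi_aspect_similarity(p, q):
    """Average the per-aspect overlaps (hypothetical aspects, equal weights)."""
    aspects = ("coauthors", "venues", "title_words")
    return sum(jaccard(p[k], q[k]) for k in aspects) / len(aspects)


def generate_pseudo_labels(records, hi=0.6, lo=0.05):
    """Label only confident pairs: 1 = same person, 0 = different persons.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    labeled = []
    for (i, p), (j, q) in combinations(enumerate(records), 2):
        s = multi_aspect_similarity(p, q)
        if s >= hi:
            labeled.append((i, j, 1))
        elif s <= lo:
            labeled.append((i, j, 0))
        # pairs between lo and hi stay unlabeled
    return labeled


# Toy records that share one ambiguous author name (made-up data).
records = [
    {"coauthors": {"X", "Y"}, "venues": {"V1"},
     "title_words": {"author", "disambiguation"}},
    {"coauthors": {"X", "Y"}, "venues": {"V1"},
     "title_words": {"author", "disambiguation", "framework"}},
    {"coauthors": {"Z"}, "venues": {"V2"},
     "title_words": {"protein", "folding"}},
]
pairs = generate_pseudo_labels(records)
print(pairs)
```

The resulting labeled pairs could then serve as training data for a pairwise classifier such as a support vector machine, which is the role the abstract assigns to the second step.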

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, P., Zhao, J., Huang, K., Xu, B. (2014). A Unified Semi-supervised Framework for Author Disambiguation in Academic Social Network. In: Decker, H., Lhotská, L., Link, S., Spies, M., Wagner, R.R. (eds) Database and Expert Systems Applications. DEXA 2014. Lecture Notes in Computer Science, vol 8645. Springer, Cham. https://doi.org/10.1007/978-3-319-10085-2_1

  • DOI: https://doi.org/10.1007/978-3-319-10085-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10084-5

  • Online ISBN: 978-3-319-10085-2

  • eBook Packages: Computer Science, Computer Science (R0)