ANDMC: An Algorithm for Author Name Disambiguation Based on Molecular Cross Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11448)

Abstract

With the rapid development of information technology, name ambiguity has become one of the main problems in the fields of information retrieval, data mining and scientific measurement. It degrades the accuracy of information calculations, reduces the credibility of literature retrieval systems, and affects the quality of information. To deal with this, name disambiguation technology has been proposed, which maps virtual relational networks to real social networks. However, most existing work considers neither name coreference nor the failure to match two identical names that are written in different formats. This paper proposes an algorithm for Author Name Disambiguation based on Molecular Cross Clustering (ANDMC) that takes name coreference into account. We also explore a string matching algorithm called Improved Levenshtein Distance (ILD), which solves the problem of matching two identical strings written in different formats. The experimental results show that our algorithm outperforms the baseline methods: its F1-score is 9.48% and 21.45% higher than those of SC and HAC, respectively.
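To make the string-matching problem concrete, the following is a minimal sketch of the classic Levenshtein distance applied after a simple name normalization. It is not the paper's ILD, whose details are not given in the abstract; the helper names `normalize_name` and `name_distance`, and the choice to sort name tokens, are assumptions made only for illustration.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical normalization: lowercase, strip accents and punctuation, sort tokens."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat anything that is not an ASCII letter or digit as a separator.
    tokens = re.findall(r"[a-z0-9]+", stripped.lower())
    # Sorting makes "Zhang, Siyang" and "Siyang Zhang" compare equal.
    return " ".join(sorted(tokens))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def name_distance(a: str, b: str) -> int:
    """Edit distance between two names after the assumed normalization."""
    return levenshtein(normalize_name(a), normalize_name(b))
```

Under these assumptions, `name_distance("Zhang, Siyang", "Siyang Zhang")` is 0 even though the raw strings differ, which is the kind of format mismatch the abstract describes. Abbreviated forms such as "S. Zhang" still yield a nonzero distance, so a real system would need additional rules for initials.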

This work is supported by National Natural Science Foundation of China (NSFC) (61702049).

Author information

Corresponding author

Correspondence to Siyang Zhang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, S., E, X., Huang, T., Yang, F. (2019). ANDMC: An Algorithm for Author Name Disambiguation Based on Molecular Cross Clustering. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_12

  • DOI: https://doi.org/10.1007/978-3-030-18590-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18589-3

  • Online ISBN: 978-3-030-18590-9

  • eBook Packages: Computer Science, Computer Science (R0)