Abstract
This paper introduces a new method of fuzzy semisupervised hierarchical clustering based on fuzzy instance-level constraints. It defines fuzzy must-link and fuzzy cannot-link constraints and uses them to find the optimum α-cut of a dendrogram. The method is applied to the problem of classifying scientific publications in web digital libraries, and it is tested on real data from that problem against classical methods and crisp semisupervised hierarchical clustering.
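To make the idea of choosing an α-cut from fuzzy constraints concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a single-linkage dendrogram, so that the partition at level α is the set of connected components of the graph linking items with similarity at least α. The scoring rule (add a constraint's degree when it is satisfied, subtract it when it is violated), the function names, and the toy similarity matrix are all hypothetical choices made for illustration; the abstract does not specify them.

```python
def partition_at(sim, alpha):
    """Cluster labels for the alpha-cut: connected components of the graph
    whose edges are the pairs with similarity >= alpha (single linkage)."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= alpha:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def constraint_score(labels, must_link, cannot_link):
    """Assumed scoring rule: each constraint (i, j, degree) adds its degree
    when satisfied by the partition and subtracts it when violated."""
    score = 0.0
    for i, j, degree in must_link:
        score += degree if labels[i] == labels[j] else -degree
    for i, j, degree in cannot_link:
        score += degree if labels[i] != labels[j] else -degree
    return score


def best_alpha_cut(sim, must_link, cannot_link):
    """Try every distinct similarity level as a candidate alpha and keep the
    best-scoring one; ties go to the highest alpha (the finest partition)."""
    n = len(sim)
    levels = sorted(
        {sim[i][j] for i in range(n) for j in range(i + 1, n)}, reverse=True
    )
    best = max(
        levels,
        key=lambda a: constraint_score(partition_at(sim, a), must_link, cannot_link),
    )
    return best, partition_at(sim, best)


# Toy data (hypothetical): four publications with pairwise similarities.
sim = [
    [1.0, 0.9, 0.4, 0.2],
    [0.9, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.8],
    [0.2, 0.3, 0.8, 1.0],
]
must_link = [(0, 1, 0.9), (2, 3, 0.7)]  # (item, item, degree of belief)
cannot_link = [(1, 2, 0.6)]

alpha, labels = best_alpha_cut(sim, must_link, cannot_link)
print(alpha, labels)
```

On this toy input the selected cut groups items 0 and 1 together and items 2 and 3 together, which satisfies all three constraints. A crisp variant would correspond to fixing every degree at 1.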
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Diaz-Valenzuela, I., Martin-Bautista, M.J., Vila, M.A. (2014). A Fuzzy Semisupervised Clustering Method: Application to the Classification of Scientific Publications. In: Laurent, A., Strauss, O., Bouchon-Meunier, B., Yager, R.R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2014. Communications in Computer and Information Science, vol 442. Springer, Cham. https://doi.org/10.1007/978-3-319-08795-5_19
DOI: https://doi.org/10.1007/978-3-319-08795-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08794-8
Online ISBN: 978-3-319-08795-5
eBook Packages: Computer Science (R0)