Abstract
The comprehensive analysis of multi-domain texts has had an important effect on text mining. Although the objects described by such texts belong to different fields, they sometimes partially overlap, and linking text fragments that overlap or complement each other is a necessary step for many tasks, such as entity resolution, information retrieval, and text clustering. Previous work on computing text similarity mainly focuses on string-based, corpus-based, and knowledge-based approaches. However, cross-domain texts exhibit distinctive features compared with texts from the same domain: (1) entity ambiguity: texts from different domains may contain various references to the same entity; (2) content skewness: cross-domain texts overlap partially. In this paper, we propose a novel fine-grained approach based on a text graph for evaluating the semantic similarity of cross-domain texts in order to link their similar parts. The experimental results show that our approach is an effective solution for discovering the semantic relationships between cross-domain text fragments.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, Y., Nie, T., Shen, D., Kou, Y. (2015). Graph-Based Approach for Cross Domain Text Linking. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science(), vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_14
DOI: https://doi.org/10.1007/978-3-319-28121-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28120-9
Online ISBN: 978-3-319-28121-6
eBook Packages: Computer Science, Computer Science (R0)