Graph-Based Approach for Cross Domain Text Linking

Hu, Yu; Nie, Tiezheng; Shen, Derong; Kou, Yue

doi:10.1007/978-3-319-28121-6_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9461)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Comprehensive analysis of multi-domain texts has had an important effect on text mining. Although the objects described by these multi-domain texts belong to different fields, they sometimes partially overlap, and linking the text fragments that overlap or complement each other is a necessary step for many tasks, such as entity resolution, information retrieval and text clustering. Previous work on computing text similarity mainly focuses on string-based, corpus-based and knowledge-based approaches. However, cross-domain texts exhibit features that texts from a single domain do not: (1) entity ambiguity: texts from different domains may contain various references to the same entity; (2) content skewness: cross-domain texts overlap only partially. In this paper, we propose a novel fine-grained approach based on a text graph for evaluating the semantic similarity of cross-domain texts in order to link their similar parts. The experimental results show that our approach is an effective way to discover the semantic relationships between cross-domain text fragments.
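As an illustration of the content skewness described above, the following minimal sketch compares a string-based measure (Jaccard similarity over word tokens) computed on whole texts with the same measure computed on sentence-level fragments. It is not the graph-based method proposed in the paper; the function names, the sentence-splitting rule, and the two sample texts are assumptions made purely for illustration.

```python
import re

def tokens(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fragments(text):
    """Split a text into sentence-like fragments (naive rule, assumed here)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def best_fragment_pair(text_a, text_b):
    """Return (score, fragment_a, fragment_b) for the most similar fragment pair."""
    return max(
        ((jaccard(tokens(fa), tokens(fb)), fa, fb)
         for fa in fragments(text_a) for fb in fragments(text_b)),
        key=lambda triple: triple[0],
    )

# Hypothetical texts from two domains that overlap in only one fragment.
news = ("The company released a new phone. Its camera uses a large sensor. "
        "Shares rose after the launch.")
review = ("I hike every weekend. The phone camera uses a large sensor. "
          "Battery life is fine on the trail.")

whole_score = jaccard(tokens(news), tokens(review))
best_score, frag_news, frag_review = best_fragment_pair(news, review)
```

In this sketch the whole-text score is diluted by the non-overlapping sentences, while the fragment-level comparison isolates the shared part. A purely string-based measure like this one still cannot resolve entity ambiguity, which is one reason a semantic approach is needed.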

Author information

Authors and Affiliations

College of Information Science and Engineering, Northeastern University, Shenyang, China
Yu Hu, Tiezheng Nie, Derong Shen & Yue Kou

Corresponding author

Correspondence to Tiezheng Nie.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, Guangdong, China
Ruichu Cai
Research Institute of China Telecom Co., Guangzhou, China
Kang Chen
Wuhan University, Wuhan, China
Liang Hong
Advanced Digital Sciences Center, Singapore, Singapore
Xiaoyan Yang
East China Normal University, Shanghai, China
Rong Zhang
Peking University, Beijing, China
Lei Zou

About this paper

Cite this paper

Hu, Y., Nie, T., Shen, D., Kou, Y. (2015). Graph-Based Approach for Cross Domain Text Linking. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_14

DOI: https://doi.org/10.1007/978-3-319-28121-6_14
Published: 30 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28120-9
Online ISBN: 978-3-319-28121-6
eBook Packages: Computer Science, Computer Science (R0)
