Graph-Based Approach for Cross Domain Text Linking

  • Conference paper
Web Technologies and Applications (APWeb 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9461)

Abstract

Comprehensive analysis of multi-domain texts has had an important effect on text mining. Although the objects described by multi-domain texts belong to different fields, they sometimes overlap partially, and linking the text fragments that overlap or complement each other is a necessary step for many tasks, such as entity resolution, information retrieval and text clustering. Previous work on computing text similarity mainly focuses on string-based, corpus-based and knowledge-based approaches. However, cross-domain texts exhibit distinctive features compared with texts from the same domain: (1) entity ambiguity: texts from different domains may contain various references to the same entity; (2) content skewness: cross-domain texts overlap only partially. In this paper, we propose a novel fine-grained approach based on a text graph for evaluating the semantic similarity of cross-domain texts in order to link their similar parts. The experimental results show that our approach is an effective solution for discovering the semantic relationships between cross-domain text fragments.
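
The content skewness described in the abstract can be illustrated with a minimal sketch. This is not the text-graph method proposed in the paper. It uses a plain string-based measure (Jaccard similarity over word sets) and assumes two hypothetical example texts, a naive sentence split on periods, and helper names (`tokens`, `jaccard`, `best_fragment_pair`) invented for illustration. It only shows why a whole-text score can understate a strong fragment-level match when texts overlap partially.

```python
import re

def tokens(text):
    """Lowercase word tokens as a set (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_fragment_pair(text_a, text_b):
    """Return (score, fragment_a, fragment_b) for the most similar sentence pair."""
    # Naive fragmenting: split on periods and drop empty pieces (an assumption).
    frags_a = [s.strip() for s in text_a.split(".") if s.strip()]
    frags_b = [s.strip() for s in text_b.split(".") if s.strip()]
    return max(
        ((jaccard(tokens(fa), tokens(fb)), fa, fb) for fa in frags_a for fb in frags_b),
        key=lambda t: t[0],
    )

# Hypothetical texts from two domains that overlap in only one sentence.
news = "The company opened a new plant in Shenyang. Shares rose after the announcement."
review = "The battery lasts two days. The company opened a plant in Shenyang."

whole = jaccard(tokens(news), tokens(review))
score, frag_news, frag_review = best_fragment_pair(news, review)

print(f"whole-text Jaccard:    {whole:.4f}")   # 0.4375
print(f"best fragment Jaccard: {score:.4f}")   # 0.8750
print(f"linked fragments: {frag_news!r} <-> {frag_review!r}")
```

Comparing at fragment granularity surfaces the shared sentence that the whole-text score dilutes. A purely string-based measure like this one still cannot resolve entity ambiguity, where different surface forms refer to the same entity.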

Corresponding author

Correspondence to Tiezheng Nie.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hu, Y., Nie, T., Shen, D., Kou, Y. (2015). Graph-Based Approach for Cross Domain Text Linking. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_14

  • DOI: https://doi.org/10.1007/978-3-319-28121-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28120-9

  • Online ISBN: 978-3-319-28121-6

  • eBook Packages: Computer Science, Computer Science (R0)
