A New Approach for Improving Cross-Document Knowledge Discovery Using Wikipedia

  • Peng Yan
  • Wei Jin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7934)


In this paper, we present a new model that incorporates the extensive knowledge derived from Wikipedia into cross-document knowledge discovery. The model builds on our previously introduced Concept Chain Queries (CCQ), a special case of text mining that focuses on detecting semantic relationships between two concepts across multiple documents. We attempt to overcome the limitations of CCQ by building a semantic kernel for computing concept closeness, which complements the knowledge already present in the text corpus. The experimental evaluation demonstrates that the kernel-based approach improves the ranking of important chains retrieved in the search results.
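The abstract does not spell out how the kernel is constructed, so the following is only a minimal sketch of the general semantic-kernel idea, not the authors' method. It assumes a term-by-concept proximity matrix (here called `S`, with made-up values standing in for Wikipedia-derived weights) and scores the closeness of two term vectors by their cosine similarity after mapping them into concept space. All function names, the toy vocabulary, and the matrix entries are hypothetical.

```python
from math import sqrt


def apply_kernel(vec, proximity):
    """Map a term-weight vector into concept space (vec multiplied by S)."""
    n_concepts = len(proximity[0])
    return [
        sum(vec[t] * proximity[t][c] for t in range(len(vec)))
        for c in range(n_concepts)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kernel_closeness(x, y, proximity):
    """Closeness of two term vectors measured in concept space."""
    return cosine(apply_kernel(x, proximity), apply_kernel(y, proximity))


# Hypothetical toy data: rows are the terms "puma", "cougar", "stock";
# columns are the background concepts "Felidae" and "Finance".
S = [
    [1.0, 0.0],  # puma
    [0.9, 0.0],  # cougar
    [0.0, 1.0],  # stock
]

puma = [1.0, 0.0, 0.0]
cougar = [0.0, 1.0, 0.0]
stock = [0.0, 0.0, 1.0]

if __name__ == "__main__":
    print("plain cosine, puma vs cougar:", cosine(puma, cougar))
    print("kernel closeness, puma vs cougar:", kernel_closeness(puma, cougar, S))
    print("kernel closeness, puma vs stock:", kernel_closeness(puma, stock, S))
```

In this toy setting the two synonyms share no surface terms, so a plain bag-of-words cosine treats them as unrelated, while the concept-space comparison treats them as close. That is the kind of gap background knowledge is meant to fill.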


Keywords: Knowledge Discovery, Semantic Relatedness, Cross-Document Knowledge Discovery, Document Representation





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Peng Yan (1)
  • Wei Jin (1)
  1. Department of Computer Science, North Dakota State University, Fargo, USA
