A Linguistic Graph-Based Approach for Web News Sentence Searching

  • Kim Schouten
  • Flavius Frasincar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8056)

Abstract

With an ever-increasing amount of news being published every day, being able to effectively search these vast amounts of information is of primary interest to many Web ventures. As word-based approaches are limited in that they ignore much of the information in texts, we present Destiny, a linguistic approach in which news item sentences are represented as graphs, with disambiguated words as nodes and grammatical relations between words as edges. Searching is then reminiscent of finding an approximate sub-graph isomorphism between the query sentence graph and the graphs representing the news item sentences, exploiting word synonymy, word hypernymy, and sentence grammar. Using a custom corpus of user-rated queries and sentences, the search algorithm is evaluated using Mean Average Precision, Spearman’s Rho, and normalized Discounted Cumulative Gain. Compared to the TF-IDF baseline, the Destiny algorithm performs significantly better on these metrics.
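
One of the evaluation metrics named above, normalized Discounted Cumulative Gain, can be illustrated with a minimal sketch. It assumes the common formulation in which the graded relevance at each rank is divided by log2(rank + 1); the abstract does not state which variant the authors used, and the function names `dcg` and `ndcg` are illustrative only.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain, assuming a log2(rank + 1) discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    # A ranking with no relevant items has no meaningful ideal; return 0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# User ratings of the returned sentences, listed in the order the
# search algorithm ranked them (hypothetical values).
ratings_in_ranked_order = [3, 1, 2, 0]
score = ndcg(ratings_in_ranked_order)
```

A score of 1 means the algorithm returned sentences in exactly the order of their user ratings; lower values indicate that highly rated sentences were ranked too low.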

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kim Schouten (1)
  • Flavius Frasincar (1)
  1. Erasmus University Rotterdam, Rotterdam, The Netherlands
