Multimedia Tools and Applications, Volume 74, Issue 20, pp 8729–8743

Finding hidden relevant documents buried in scientific documents by terminological paraphrases

  • Sung-Pil Choi
  • Sung-Ho Shin
  • Hanmin Jung
  • Daesung Lee (corresponding author)


Technical terms serve as effective queries for many users searching scientific databases. However, authors of scientific literature often employ alternative expressions, that is, Terminological Paraphrases (TPs), to convey the meanings of specific terms. As a result, some relevant documents contain only these paraphrases and are missed by conventional term-based queries. In this paper, we propose an effective way to retrieve such "de facto relevant documents" (documents that contain only TPs and therefore cannot be found by conventional retrieval models restricted to controlled vocabularies) by adapting the Predicate Argument Tuple (PAT). Our experiments confirm that PAT-based document retrieval is an effective and promising method for discovering these documents and for improving the recall of terminology-based scientific information access models.
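The core idea, matching a term's predicate-argument structure against paraphrased expressions in documents, can be sketched roughly as follows. The tuple format, the crude normalization, and the matching rule below are simplifying assumptions made for illustration; they are not the paper's exact model.

```python
# Rough sketch of PAT (Predicate Argument Tuple) matching for terminological
# paraphrases. Tuple format, normalization, and matching rule are illustrative
# assumptions, not the paper's actual method.

def normalize(token: str) -> str:
    """Crude lemmatization stand-in: lowercase and strip a trailing plural 's'."""
    t = token.lower()
    return t[:-1] if t.endswith("s") and len(t) > 3 else t

def pat(predicate: str, *arguments: str):
    """Represent a phrase as a (predicate, argument-set) tuple."""
    return (normalize(predicate), frozenset(normalize(a) for a in arguments))

def pat_match(query_pat, doc_pat) -> bool:
    """Two PATs match when their predicates agree and their arguments overlap."""
    q_pred, q_args = query_pat
    d_pred, d_args = doc_pat
    return q_pred == d_pred and bool(q_args & d_args)

# The term "image segmentation" and the paraphrase "segments noisy images"
# reduce to matching tuples even though their surface strings differ.
term_pat = pat("segment", "image")
paraphrase_pat = pat("segments", "images")
print(pat_match(term_pat, paraphrase_pat))  # True
```

In this toy version, a document containing only the paraphrase would still be retrieved for the term-based query, which is the recall gain the paper attributes to PAT-based retrieval.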


Keywords: De facto relevant documents · Terminological paraphrase · Scientific information retrieval · Terminology · Text mining · Predicate argument tuple



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Sung-Pil Choi (1, 2)
  • Sung-Ho Shin (1, 2)
  • Hanmin Jung (1)
  • Daesung Lee (2), corresponding author

  1. Korea Institute of Science and Technology Information (KISTI), Daejeon, South Korea
  2. School of Applied Science, Department of Computer Engineering, Catholic University of Pusan, Pusan, South Korea
