Towards Knowledge-Based Life Science Publication Repositories

Part of the book series: Annals of Information Systems (AOIS, volume 11)

Abstract

Despite being a flourishing field, contemporary online scientific publishing exploits mostly raw publication data (rather meaningless bags of words) and shallow metadata (authors, keywords, citations, etc.) for search. The much-needed economical mass exploitation of the knowledge implicitly contained in publication texts is still largely uncharted territory. Filling this gap requires (1) extraction of asserted publication metadata together with the knowledge implicitly present in the respective text; (2) integration, refinement and extension of the emergent content; and (3) release of the processed content via a meaning-sensitive search-and-browse interface offering services complementary to current full-text search. This chapter addresses the scientific and engineering challenges related to the suggested approach and introduces a particular solution that tackles them: CORAAL, a prototype for knowledge-based life science publication search.

Notes

  1. CORAAL stands for COntent extended by emeRgent and Asserted Annotations of Linked publication data.

  2. ACE stands for Addition, Closure, Extension. See Section 9.4.2 for details.

  3. Cf. http://www.simile-widgets.org/exhibit/. Details on how to use the CORAAL user interface are given in Section 9.5.

  4. Note that, without loss of generality, URIs may serve as concept indices in the statements. Consequently, \(\textrm{ind}^{-1}\) de facto plays the role of URI dereferencing. For readability, however, the examples throughout the chapter use plain lexical terms instead of indices or URIs.

  5. Defined in [28] as \(F(a_1, \dots,a_n) = \sum_{j=1}^n w_jb_j\), where \(b_j\) is the \(j\)th largest of the \(a_i\) and the \(w_j\) are a collection of weights (also called a weight vector) such that \(w_j \in [0,1]\) and \(\sum_{j=1}^n w_j = 1\). Note that we use the additional \(u, v\) weights in order to explicitly capture the relative relevance of the first and second argument of \(\varDelta_{u,v}\) independently of their relative sizes.

  6. Essentially, only one value \(w\), fully dependent on \(u, v, x, y\), needs to be derived, since the remaining element of the OWA weight vector of size 2 is equal to \(1-w\).

  7. The duality w.r.t. the distance is ensured by conformance to two intuitive conditions: inverse proportionality and equality to 1 when the distance is 0.

  8. By iterating through a respective knowledge base and/or by similar concept retrieval.

  9. Computed as \(|\mathbf{A}| = |\{(i,j) \mid a_{i,j} \ne 0\}|\) for a concept \(\mathbf{A}\).

  10. "Very large" means hundreds or thousands of concepts and millions of respective statements, or more.

  11. Each of the journals was associated with a specific context identifier to maintain the sub-domain provenance of the respective extracted information and reflect it later on in the CORAAL user interface.

  12. The NCI and EMTREE thesauri; see http://www.cancer.gov/cancertopics/terminologyresources and http://www.embase.com/emtree/, respectively.

  13. These results were achieved on a single server machine (which is not exclusively dedicated to CORAAL). The current implementation still has some headroom regarding scalability; however, for processing data two or more orders of magnitude larger, a distributed solution would be preferable.

  14. See http://www.elseviergrandchallenge.com/.

  15. See http://salt.semanticauthoring.org/onto/. An extracted RDF file example is given at http://resources.smile.deri.ie/coraal/2008/11/ee7c3ec2536e6754ad424c9f95a0d8dce7059a4e.rdf.

  16. The heuristic is quite similar to the technique described in [39]. We use the Python NLTK library for NLP (see http://nltk.sourceforge.net). We also experimented with state-of-the-art ontology learning solutions (such as Text2Onto, see http://ontoware.org/projects/text2onto/). The respective tools scaled rather poorly, though, and did not improve quality significantly over our simple approach. However, we do plan to include more sophisticated as well as domain-specific methods of knowledge extraction (cf. [4, 7, 40]) in our lightweight implementation at some stage.

  17. The symbols \(^{*}\), \(^{+}\) and ? mean zero or more, one or more, and zero or one repetitions of the preceding expression, respectively.

  18. We exclude the by far most common predicate, "is a", from the set considered for the \(f_P(t)\) computation (the value of \(\nu_1f_P(t_{\textrm{p}})\) was set to 0.8 for the "is a" statements). We also omit statements with \(f_P(t_{\textrm{p}}) = 1\) altogether. Note that \(f(t_{\textrm s})f_D(t_{\textrm s})\) and \(f(t_{\textrm o})f_D(t_{\textrm o})\) are the relevance scores of the particular s and o terms, respectively.

  19. See http://en.wikipedia.org/wiki/SHA_hash_functions.

  20. We use the Sesame repository (see http://www.openrdf.org/).

  21. A comprehensive Java search engine library (see http://lucene.apache.org).

  22. Modulo mapping the terms to indices and neglecting the infinite number of columns and rows with zero-only elements. We consider "not a" as a negation of "is a".

  23. See http://www.embase.com/emtree/ and http://www.cancer.gov/cancertopics/terminologyresources, respectively. EMTREE terms and relations were used in case of conflicts, since they cover a more general domain. Synonyms defined in the thesauri were reflected in the lexicon data structure accordingly.

  24. Note that the pipeline can even be executed as (ACE)+, i.e. as a search for a global fixed point of the respective operations; however, for the CORAAL prototype we employed only a single iteration, since the results were already sufficient for the presented proof of concept.

  25. A detailed description of the Lucene syntax can be found at http://lucene.apache.org/java/2_3_2/queryparsersyntax.html. Note that even though the meaning of the AND and NOT keywords is intuitively similar for both types of search in CORAAL, the knowledge and full-text variants are based on completely different principles. For instance, NOT indicates documents not containing the query expression in the full-text search, while in the knowledge search it leads to documents containing a negation of the respective query statement; similarly for the AND keyword.

  26. Note that you can watch a video comprehensively illustrating the essential CORAAL capabilities at http://resources.smile.deri.ie/coraal/videos/coraal_web.mp4 before starting to play with the tool itself.

  27. Note that the HAS PART relation has rather general semantics in the knowledge extracted by CORAAL, i.e. its meaning is not strictly mereological in the physical sense; it can also refer to, e.g., conceptual parts or possession of entities. The same holds for the PART OF relation.

  28. For instance, the users were asked to find all authors who support the fact that the acute granulocytic leukemia and T-cell leukemia concepts are disjoint, or to find which process is used as a complementary method, while being different from the polymerase chain reaction, and identify publications that support their findings.
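
The ordered weighted averaging (OWA) operator defined in note 5 can be illustrated with a short Python sketch. It only implements the generic definition from [28] as quoted in that note; the function names `owa` and `owa_pair` are illustrative assumptions, and the way the chapter derives the single weight \(w\) from \(u, v, x, y\) (note 6) is not reproduced here, so \(w\) is simply passed in.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: sum_j w_j * b_j, where b_j is the
    j-th largest of the input values (generic definition, note 5)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # The weights attach to rank positions, not to particular arguments,
    # so the values are sorted in descending order first.
    ordered = sorted(values, reverse=True)
    return sum(w * b for w, b in zip(weights, ordered))


def owa_pair(x: float, y: float, w: float) -> float:
    """Size-2 case: the weight vector is (w, 1 - w), so a single
    value w determines the whole operator (hypothetical helper)."""
    return owa([x, y], [w, 1.0 - w])


if __name__ == "__main__":
    print(owa_pair(0.2, 0.9, 1.0))  # all weight on the larger value (maximum)
    print(owa_pair(0.2, 0.9, 0.0))  # all weight on the smaller value (minimum)
    print(owa_pair(0.2, 0.9, 0.5))  # equal weights (arithmetic mean)
```

With \(w = 1\) the operator behaves as the maximum, with \(w = 0\) as the minimum, and with \(w = 0.5\) as the arithmetic mean, which is why a single weight suffices in the two-argument case.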

References

  1. Bechhofer, S., Gangemi, A., Guarino, N., van Harmelen, F., Horrocks, I., Klein, M., Masolo, C., Oberle, D., Staab, S., Stuckenschmidt, H., Volz, R.: Tackling the ontology acquisition bottleneck: An experiment in ontology re-engineering (2003) Retrieved at http://tinyurl.com/96w7ms, Apr’08. 13 Jul 2010

  2. Gomez-Perez, A., Fernandez-Lopez, M., Corcho, O.: Ontological Engineering. Advanced Information and Knowledge Processing. Springer, New York (2004)

  3. Aberer, K., Cudré-Mauroux, P., Ouksel, A.M.: Emergent semantics principles and issues. In: Proceedings of Database Systems for Advanced Applications, 9th International Conference, DASFAA 2004, Jeju Island, Korea (2004)

  4. Maedche, A., Staab, S.: Ontology learning. In: Staab, S., Studer, R. (eds.) Handbook on Ontologies. Springer, New York (2004) 173–190

  5. Maedche, A.: Emergent semantics for ontologies. In: Emergent Semantics. IEEE Intelligent Systems. IEEE Press, NYC, USA (2002) 85–86

  6. Ottens, K., Aussenac-Gilles, N., Gleizes, M.P., Camps, V.: Dynamic ontology coevolution from texts: Principles and case study. In: Proceedings of ESOE 2007 Workshop, CEUR-WS, Busan, Korea (2007) 70–83

  7. Buitelaar, P., Cimiano, P.: Ontology Learning and Population: Bridging the Gap Between Text and Knowledge. IOS Press, Amsterdam, Netherlands (2008)

  8. Haase, P., Völker, J.: Ontology learning and reasoning – dealing with uncertainty and inconsistency. In: Proceedings of the URSW2005 Workshop, Galway, Ireland (Nov 2005) 45–55

  9. Heflin, J., Hendler, J.: Dynamic ontologies on the web. In: Proceedings of AAAI 2000, AAAI Press, Menlo Park, California, USA (2000)

  10. Haase, P., van Harmelen, F., Huang, Z., Stuckenschmidt, H., Sure, Y.: A framework for handling inconsistency in changing ontologies. In: Proceedings of ISWC’05. Volume 3792 of LNCS. Springer, New York (2005) 353–367

  11. Straccia, U.: A fuzzy description logic for the semantic web. In: Sanchez, E. (ed.) Fuzzy Logic and the Semantic Web. Capturing Intelligence. Elsevier, Amsterdam (2006) 73–90

  12. Flouris, G., Huang, Z., Pan, J.Z., Plexousakis, D., Wache, H.: Inconsistencies, negations and changes in ontologies. In: Proceedings of AAAI 2006, AAAI Press, Menlo Park, California, USA (2006)

  13. Sheth, A., Ramakrishnan, C., Thomas, C.: Semantics for the semantic web: The implicit, the formal and the powerful. International Journal on Semantic Web & Information Systems 1(1) (2005) 1–18

  14. Frith, C.: Making Up the Mind: How the Brain Creates Our Mental World. Blackwell, Oxford, UK (2007)

  15. Gentner, D., Holyoak, K.J., Kokinov, B.N. (eds.): The Analogical Mind: Perspectives from Cognitive Science. MIT Press, Cambridge, MA (2001)

  16. McGuinness, D.L.: Ontology-enhanced search for primary care medical literature. In: Proceedings of the Medical Concept Representation and Natural Language Processing Conference, Phoenix, Arizona, USA (1999) 16–19

  17. Abasolo, J.M., Gómez, M.: Melisa: An ontology-based agent for information retrieval in medicine. In: Proceedings of the First International Workshop on the Semantic Web (SemWeb2000), Lisbon, Portugal (2000) 73–82

  18. Dietze, H., et al.: GoPubMed: Exploring PubMed with ontological background knowledge. In: Ontologies and Text Mining for Life Sciences, IBFI (2008)

  19. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.: The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, Cambridge, USA (2003)

  20. Müller, H.M., Kenny, E.E., Sternberg, P.W.: Textpresso: An ontology-based information retrieval and extraction system for biological literature. PLoS Biology 2(11) (2004) 1984–1998

  21. Groza, T., Handschuh, S., Moeller, K., Decker, S.: KonneXSALT: First steps towards a semantic claim federation infrastructure. In: The Semantic Web: Research and Applications (Proceedings of ESWC 2008), Springer, New York (2008) 80–94

  22. Hulpus, I.: Design and implementation of a semantic claim federation infrastructure. Master’s Thesis, Technical University of Cluj-Napoca (2008)

  23. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American 5 (2001)

  24. Zadeh, L.A.: Fuzzy sets. Journal of Information and Control 8 (1965) 338–353

  25. Ogden, C.K., Richards, I.A.: The Meaning of Meaning. Mariner Books (1989)

  26. Brickley, D., Guha, R.V.: RDF Vocabulary Description Language 1.0: RDF Schema. (2004) Available at (Feb 2006): http://www.w3.org/TR/rdf-schema/. 13 Jul 2010

  27. Deschrijver, G., Cornelis, C., Kerre, E.E.: On the representation of intuitionistic fuzzy t-norms and t-conorms. In: Transactions on Fuzzy Systems. IEEE (2004)

  28. Yager, R.R.: On ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Transactions on Systems, Man and Cybernetics 18 (1988) 183–190

  29. Greenwald, A.G.: Cognitive learning, cognitive response to persuasion, and attitude change. In: Psychological Foundations of Attitudes, Academic Press Inc., New York (1968) 147–169

  30. Grimm, S., Motik, B.: Closed world reasoning in the semantic web through epistemic operators. In: Proceedings of the Workshop OWL – Experiences and Directions, CEUR-WS (2005)

  31. Patel-Schneider, P.F., Horrocks, I.: Position paper: A comparison of two modelling paradigms in the semantic web. In: Proceedings of WWW2006, ACM Press, NYC, USA (2006) 3–12

  32. Stanfill, C., Waltz, D.: Toward memory-based reasoning. Communications of the ACM 29(12) (1986) 1213–1228

  33. Kokinov, B.N., Petrov, A.: Integrating memory and reasoning in analogy-making: The AMBR model. In: The Analogical Mind: Perspectives from Cognitive Science, MIT Press, Cambridge, MA (2001) 59–124

  34. Kleinberg, J.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5) (1999) 604–632

  35. Zilberstein, S.: Using anytime algorithms in intelligent systems. AI Magazine 17(3) (1996) 73–83

  36. Nováček, V.: Towards an efficient knowledge-based publication data exploitation: An oncological literature search scenario. Technical Report DERI-TR-2009-03-23, DERI, NUIG (2009) Available at http://tinyurl.com/csh3rf. 13 Jul 2010

  37. Manola, F., Miller, E.: RDF Primer. (2004) Available at (November 2008): http://www.w3.org/TR/rdf-primer/. 13 Jul 2010

  38. Groza, T., Möller, K., Handschuh, S., Trif, D., Decker, S.: SALT: Weaving the claim web. In: ISWC 2007, Busan, Korea (2007)

  39. Maedche, A., Staab, S.: Discovering conceptual relations from text. In: Proceedings of ECAI 2000, IOS Press, Amsterdam, Netherlands (2000)

  40. Blaschke, C., Andrade, M., Ouzounis, C., Valencia, A.: Automatic extraction of biological information from scientific text: Protein-protein interactions. In: Proc. Int Conf Intell Syst Mol Biol, Protein Design Group, CNB-CSIC, Madrid, Spain (1999) 60–67

  41. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA (1998)

  42. Cimiano, P., Pivk, A., Schmidt-Thieme, L., Staab, S.: Learning taxonomic relations from heterogeneous sources of evidence. In: Buitelaar, P., Cimiano, P., Magnini, B. (eds.) Ontology Learning from Text: Methods, Evaluation and Applications. IOS Press, Amsterdam, Netherlands (2005) 59–73

  43. Voelker, J., Vrandecic, D., Sure, Y., Hotho, A.: Learning disjointness. In: Proceedings of ESWC’07, Springer, New York (2007)

  44. Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge, MA (2000)

  45. Aisbett, J., Gibbon, G.: A general formulation of conceptual spaces as a meso level representation. Artificial Intelligence 133(1–2) (2001) 189–232

  46. Smolensky, P., Legendre, G.: The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar. MIT Press, Cambridge, MA (2006)

  47. Sowa, J.F., Majumdar, A.K.: Analogical reasoning. In: Proceedings of ICCS’03. Springer, Berlin, Heidelberg (2003)

  48. Sowa, J.F.: A dynamic theory of ontology. In: Proceedings of FOIS’06, IOS Press, Amsterdam, Netherlands (2006)

  49. Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D.L., Patel-Schneider, P.F., Stein, L.A.: OWL Web Ontology Language Reference. (2004) Available at (February 2006): http://www.w3.org/TR/owl-ref/. 13 Jul 2010

  50. ter Horst, H.J.: Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. Journal of Web Semantics 3(2–3) (2005) 79–115

  51. Motik, B., Grau, B.C., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: OWL 2 Web Ontology Language: Profiles. Working draft, available at http://www.w3.org/TR/owl2-profiles as of Dec 11 (2008). 13 Jul 2010

  52. Noy, N., Rector, A.: Defining N-ary Relations on the Semantic Web (2006). Available at (June 2008): http://www.w3.org/TR/swbp-n-aryRelations/. 13 Jul 2010

  53. Laskey, K.J., Laskey, K.B., Costa, P.C.G., Kokar, M.M., Martin, T., Lukasiewicz, T.: Uncertainty Reasoning for the World Wide Web. (2008) W3C Incubator Group final report, available at http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/ as of Dec 11, 2008. 13 Jul 2010

Acknowledgments

This work has been supported by the EU IST 6th Framework project “Nepomuk” (FP6-027705) and the “Líon” and “Líon II” projects funded by Science Foundation Ireland under Grant Nos. SFI/02/CE1/I131 and SFI/08/CE/I1380, respectively. We would like to thank the employees of Masaryk Oncology Institute for their feedback and Ioana Hulpus for her work on the former CORAAL user interface. Very special thanks go to the people who have actively participated in the continuous prototype evaluation and testing, namely (in alphabetical order) Doug Foxvog, Peter Gréll, MD, Miloš Holánek, MD, Matthias Samwald, Holger Stenzhorn and Jiří Vyskočil, MD. We also acknowledge the valuable comments from the anonymous reviewers, which helped to improve the final shape of the chapter.

Corresponding author

Correspondence to Vít Nováček.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Nováček, V., Groza, T., Handschuh, S. (2010). Towards Knowledge-Based Life Science Publication Repositories. In: Chen, H., Wang, Y., Cheung, KH. (eds) Semantic e-Science. Annals of Information Systems, vol 11. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5908-9_9
