Comparing Small Graph Retrieval Performance for Ontology Concepts in Medical Texts

Schlegel, Daniel R.; Bona, Jonathan P.; Elkin, Peter L.

doi:10.1007/978-3-319-41576-5_3

Daniel R. Schlegel, Jonathan P. Bona & Peter L. Elkin

Conference paper
First Online: 24 June 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9579)

Abstract

Some terminologies and ontologies, such as SNOMED CT, allow for post-coordinated as well as pre-coordinated expressions. Post-coordinated expressions are, essentially, small segments of the terminology graphs. Compositional expressions add logical and linguistic relations to the standard technique of post-coordination. In indexing medical text, many instances of compositional expressions must be stored, and in performing retrieval on that index, entire compositional expressions and sub-parts of those expressions must be searched. The problem becomes a small graph query against a large collection of small graphs. This is further complicated by the need to also find sub-graphs from a collection of small graphs. In previous systems using compositional expressions, such as iNLP, the index was stored in a relational database. We compare retrieval characteristics of relational databases, triplestores, and general graph databases to determine which is most efficient for the task at hand.
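The retrieval problem described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and uses none of the storage systems compared in the paper. It assumes each compositional expression is a small set of labeled edges (source concept, relation, target concept), and that a query matches an indexed expression when every query edge appears in it. The concept names, relation names, expression identifiers, and function names below are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# One labeled edge: (source concept, relation, target concept).
Triple = Tuple[str, str, str]
# A small graph is modeled as an immutable set of labeled edges (assumption).
Graph = FrozenSet[Triple]


def make_graph(*triples: Triple) -> Graph:
    """Build a small graph from its labeled edges."""
    return frozenset(triples)


# Hypothetical index: expression identifier -> small graph.
INDEX: Dict[str, Graph] = {
    "doc1:e1": make_graph(
        ("fracture", "finding_site", "femur"),
        ("femur", "laterality", "left"),
    ),
    "doc1:e2": make_graph(("pain", "finding_site", "knee")),
    "doc2:e1": make_graph(("fracture", "finding_site", "femur")),
}


def find_exact(query: Graph, index: Dict[str, Graph]) -> List[str]:
    """Identifiers of expressions whose edge set equals the query."""
    return sorted(eid for eid, graph in index.items() if graph == query)


def find_containing(query: Graph, index: Dict[str, Graph]) -> List[str]:
    """Identifiers of expressions that contain the query as a sub-graph.

    Because nodes are identified by concept name in this sketch,
    sub-graph containment reduces to a subset test on edge sets.
    """
    return sorted(eid for eid, graph in index.items() if query <= graph)


if __name__ == "__main__":
    query = make_graph(("fracture", "finding_site", "femur"))
    print("exact matches:", find_exact(query, INDEX))
    print("sub-graph matches:", find_containing(query, INDEX))
```

This linear scan only fixes what a whole-expression query and a sub-part query are expected to return; the paper's comparison concerns how efficiently relational databases, triplestores, and graph databases answer such queries at scale.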

Notes

1. Appropriate table indexes were created to speed execution as much as possible.
2. This is a simplified, impure version of a propositional graph, as used in the SNePS family [17] of knowledge representation and reasoning systems, and for which a formal mapping is defined between logical expressions and the graph structure [15, 16].
3. All tests were run on a laptop with a Core i7 4600U CPU, 16 GB of RAM, and an SSD. Evaluation code was run in a VirtualBox VM running Ubuntu 14.04.

References

1. Andrš, J.: Metadata repository benchmark: PostgreSQL vs. Neo4j (2014). http://mantatools.com/metadata-repository-benchmark-postgresql-vs-neo4j
2. Angles, R.: A comparison of current graph database models. In: 2012 IEEE 28th International Conference on Data Engineering Workshops (ICDEW), pp. 171–177. IEEE (2012)
3. Ciglan, M., Averbuch, A., Hluchy, L.: Benchmarking traversal operations over graph databases. In: 2012 IEEE 28th International Conference on Data Engineering Workshops (ICDEW), pp. 186–189. IEEE (2012)
4. Dominguez-Sal, D., Urbón-Bayes, P., Giménez-Vañó, A., Gómez-Villamor, S., Martínez-Bazán, N., Larriba-Pey, J.L.: Survey of graph database performance on the HPC scalable graph analysis benchmark. In: Shen, H.T. (ed.) WAIM 2010. LNCS, vol. 6185, pp. 37–48. Springer, Heidelberg (2010)
5. Elkin, P.L., Brown, S.H., Husser, C.S., Bauer, B.A., Wahner-Roedler, D., Rosenbloom, S.T., Speroff, T.: Evaluation of the content coverage of SNOMED CT: ability of SNOMED clinical terms to represent clinical problem lists. In: Mayo Clinic Proceedings, vol. 81, pp. 741–748. Elsevier (2006)
6. Elkin, P.L., Froehling, D.A., Wahner-Roedler, D.L., Brown, S.H., Bailey, K.R.: Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes. Ann. Intern. Med. 156(1 Part 1), 11–18 (2012)
7. Elkin, P.L., Trusko, B.E., Koppel, R., Speroff, T., Mohrer, D., Sakji, S., Gurewitz, I., Tuttle, M., Brown, S.H.: Secondary use of clinical data. Stud. Health Technol. Inform. 155, 14–29 (2010)
8. Microsoft: SQL Server 2014 (2015). http://www.microsoft.com/en-us/server-cloud/products/sql-server/
9. Murff, H.J., FitzHenry, F., Matheny, M.E., Gentry, N., Kotter, K.L., Crimin, K., Dittus, R.S., Rosen, A.K., Elkin, P.L., Brown, S.H., et al.: Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA 306(8), 848–855 (2011)
10. Neo Technology Inc: Neo4j, the world’s leading graph database (2015). http://neo4j.com/
11. Ontotext: Ontotext GraphDB (2015). http://ontotext.com/products/ontotext-graphdb/
12. Oracle: Database 11g R2 (2015). http://www.oracle.com/technetwork/database/index.html
13. Partner, J., Vukotic, A., Watt, N., Abedrabbo, T., Fox, D.: Neo4j in Action. Manning Publications Company, Greenwich (2014)
14. Rodriguez, M.: MySQL vs. Neo4j on a large-scale graph traversal (2011). https://dzone.com/articles/mysql-vs-neo4j-large-scale
15. Schlegel, D.R.: Concurrent Inference Graphs. Ph.D. thesis, State University of New York at Buffalo (2015)
16. Schlegel, D.R., Shapiro, S.C.: Visually interacting with a knowledge base using frames, logic, and propositional graphs. In: Croitoru, M., Rudolph, S., Wilson, N., Howse, J., Corby, O. (eds.) GKR 2011. LNCS, vol. 7205, pp. 188–207. Springer, Heidelberg (2012)
17. Shapiro, S.C., Rapaport, W.J.: The SNePS family. Comput. Math. Appl. 23(2–5), 243–275 (1992)
18. The International Health Terminology Standards Development Organisation: SNOMED CT technical implementation guide (July 2014)
19. W3C OWL Working Group: OWL 2 web ontology language document overview (2nd edn.) (2012). http://www.w3.org/TR/owl2-overview/
20. W3C RDF Working Group: RDF 1.1 semantics (2014). http://www.w3.org/TR/rdf11-mt/
21. W3C RDF Working Group: RDF schema 1.1 (2014). http://www.w3.org/TR/rdf-schema/
22. Zhao, F., Tung, A.K.: Large scale cohesive subgraphs discovery for social network visual analysis. Proc. VLDB Endowment 6(2), 85–96 (2012)

Author information

Authors and Affiliations

Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, 14260, USA
Daniel R. Schlegel, Jonathan P. Bona & Peter L. Elkin

Corresponding author

Correspondence to Daniel R. Schlegel.

Editor information

Editors and Affiliations

Stony Brook University, Stony Brook, New York, USA
Fusheng Wang
University of Utah, Salt Lake City, Utah, USA
Gang Luo
Columbia University, New York, New York, USA
Chunhua Weng
Nanyang Technological University, Singapore, Singapore
Arijit Khan
Qatar Computing Research Institute, Doha, Qatar
Prasenjit Mitra
Google Research, New York, New York, USA
Cong Yu

About this paper

Cite this paper

Schlegel, D.R., Bona, J.P., Elkin, P.L. (2016). Comparing Small Graph Retrieval Performance for Ontology Concepts in Medical Texts. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) and DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_3

DOI: https://doi.org/10.1007/978-3-319-41576-5_3
Published: 24 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41575-8
Online ISBN: 978-3-319-41576-5
eBook Packages: Computer Science, Computer Science (R0)
