Abstract
Biobanks are important resources for medical research: they collect biological material (samples) together with data describing that material, and they provide medical researchers with the material and data needed for their studies. Because data availability varies greatly among samples, retrieving data and identifying relevant samples is a strenuous task. We show the challenges and limitations of using pure SQL statements to query a relational database. To tackle the problem of locating interesting material, we present a novel approach that automatically generates approximate queries with ranking capabilities. Medical researchers use a Query-by-Example interface to specify desired attributes and restrictions, and assign weights to them to influence the ranking function.
The work reported here was partially supported by the European Commission 7th Framework Programme (project BBMRI) and by the Austrian Ministry of Science and Research within the GEN-AU program (project GATIB II).
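The idea of weighted, approximate matching sketched in the abstract can be illustrated with a small example. This is only a minimal sketch under stated assumptions, not the authors' ranking function: the names `score` and `rank`, the sample attributes, the normalized weighted-sum formula, and the treatment of missing values (`missing_score`) are all hypothetical choices made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Sample = Dict[str, Any]
# Hypothetical query format: attribute name -> (restriction predicate, weight).
Query = Dict[str, Tuple[Callable[[Any], bool], float]]


def score(sample: Sample, query: Query, missing_score: float = 0.0) -> float:
    """Return a normalized weighted match score in [0, 1].

    Assumption: a satisfied restriction contributes its full weight, a
    violated one contributes nothing, and a missing value contributes
    weight * missing_score instead of disqualifying the sample.
    """
    total_weight = sum(weight for _, weight in query.values())
    if total_weight == 0:
        return 0.0
    achieved = 0.0
    for attribute, (predicate, weight) in query.items():
        value = sample.get(attribute)
        if value is None:
            achieved += weight * missing_score
        elif predicate(value):
            achieved += weight
    return achieved / total_weight


def rank(samples: List[Sample], query: Query,
         top_k: Optional[int] = None) -> List[Tuple[float, Sample]]:
    """Score every sample and return (score, sample) pairs, best first."""
    scored = [(score(s, query), s) for s in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Illustrative data only; attribute names and values are made up.
samples = [
    {"id": "S1", "diagnosis": "C50", "age": 54, "bmi": 27.1},
    {"id": "S2", "diagnosis": "C50", "age": 58},  # bmi not recorded
    {"id": "S3", "diagnosis": "C18", "age": 49, "bmi": 24.0},
]
query = {
    "diagnosis": (lambda v: v == "C50", 3.0),
    "age": (lambda v: 40 <= v <= 60, 2.0),
    "bmi": (lambda v: v < 30, 1.0),
}
ranked = rank(samples, query)
for value, sample in ranked:
    print(sample["id"], round(value, 3))
```

A strict SQL `WHERE` clause that combines the same three restrictions with `AND` would return only S1, because the comparison on the missing `bmi` of S2 does not evaluate to true and S3 fails the diagnosis restriction. In the sketch, S2 and S3 remain in the result with lower scores, and changing the weights changes their order.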
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dabringer, C., Eder, J. (2010). Retrieving Samples from Biobanks. In: Khuri, S., Lhotská, L., Pisanti, N. (eds) Information Technology in Bio- and Medical Informatics, ITBAM 2010. Lecture Notes in Computer Science, vol 6266. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15020-3_16
DOI: https://doi.org/10.1007/978-3-642-15020-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15019-7
Online ISBN: 978-3-642-15020-3
eBook Packages: Computer Science, Computer Science (R0)