Abstract
Life sciences research extensively and routinely uses external online databases, tools, and applications to implement computational pipelines. These applications are among the truly distributed and highly collaborative global systems in existence. Because the resources these applications use are designed to serve individual users, they adopt an all-or-nothing model in which users must accept the entire response even when only a fraction of it is relevant. In computational pipelines involving several databases and complex repeated operations, the cost of unnecessary data transmission and computation can be significant enough to reduce productivity and make the applications sluggish. Because these resources are autonomous and do not accept user instructions or queries, users cannot customize their behavior to reduce network latency and wasteful computation or data transmission. Such a resource utilization and sharing model is wasteful and expensive. In this paper, we propose a new collaborative data integration and computational pipeline execution model for systems biology research. We show that in the envisioned model, arbitrary sites are able to accept user constraints and limited processing instructions to avoid wasteful computation, resulting in improved overall efficiency. We also demonstrate that the proposed collaborative model does not breach site security or infringe upon site autonomy.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Jamil, H. (2011). A Secured Collaborative Model for Data Integration in Life Sciences. In: Hameurlain, A., Küng, J., Wagner, R., Böhm, C., Eder, J., Plant, C. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems IV. Lecture Notes in Computer Science, vol 6990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23740-9_8
DOI: https://doi.org/10.1007/978-3-642-23740-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23739-3
Online ISBN: 978-3-642-23740-9
eBook Packages: Computer Science, Computer Science (R0)