Optimizing Monitoring Queries over Distributed Data

  • Conference paper
Advances in Database Technology - EDBT 2006 (EDBT 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)

Abstract

Scientific data in the life sciences is distributed over various independent multi-format databases and is constantly expanding. We discuss a scenario in which a life science research lab monitors, over time, the results of queries to remote databases beyond its control. Queries are registered at a local system and are executed daily in batch mode. The goal of the paper is to study evaluation strategies that minimize the total number of database accesses when evaluating all queries in bulk. We use an abstraction based on the relational model with fan-out constraints and conjunctive queries. We show that this problem remains NP-hard in two restricted settings: queries of bounded depth, and the scenario with a fixed schema. We further show that both restrictions taken together result in a tractable problem. As the constant for the latter algorithm is too high to be feasible in practice, we present four heuristic methods that are experimentally compared on randomly generated and biologically motivated schemas. Our algorithms are based on a greedy method and on approximations for the shortest common supersequence problem.
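
To make the link to the shortest common supersequence problem concrete, here is a minimal sketch of Majority-Merge, a well-known greedy heuristic for that problem. It is not taken from the paper and is not claimed to be one of its four heuristics. It assumes, purely for illustration, that each query can be reduced to an ordered list of database names and that every element of the merged sequence costs one access. The function names and the example access sequences are hypothetical.

```python
from collections import Counter


def majority_merge(sequences):
    """Greedy Majority-Merge heuristic for a common supersequence.

    Illustrative only: repeatedly emits the symbol that appears most often
    at the front of the remaining sequences, then consumes it from every
    sequence that starts with it. The result is a common supersequence,
    but not necessarily a shortest one.
    """
    remaining = [list(seq) for seq in sequences if seq]
    merged = []
    while remaining:
        # Count how often each symbol is the next one needed.
        fronts = Counter(seq[0] for seq in remaining)
        # Most frequent front symbol; ties go to the smallest symbol.
        symbol = max(sorted(fronts), key=lambda s: fronts[s])
        merged.append(symbol)
        # Consume the chosen symbol wherever it is at the front.
        remaining = [seq[1:] if seq[0] == symbol else seq for seq in remaining]
        remaining = [seq for seq in remaining if seq]
    return merged


def is_supersequence(candidate, seq):
    """Return True if seq occurs in candidate as a (scattered) subsequence."""
    it = iter(candidate)
    return all(symbol in it for symbol in seq)


# Hypothetical access sequences, one per registered query.
queries = [
    ["GenBank", "SwissProt", "KEGG"],
    ["SwissProt", "KEGG"],
    ["GenBank", "KEGG"],
]
plan = majority_merge(queries)
print(plan)
```

Under these assumptions, evaluating the three queries separately would take seven accesses, whereas the merged sequence serves all of them with three. The heuristic runs in polynomial time but gives no optimality guarantee.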

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neven, F., Van de Craen, D. (2006). Optimizing Monitoring Queries over Distributed Data. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_49

  • DOI: https://doi.org/10.1007/11687238_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
