Data Merging in Life Science Data Integration Systems

  • Conference paper

Part of the book series: Advances in Soft Computing (AINSC, volume 31)

Abstract

An index-driven integration system provides access to a multitude of data sources by using pre-compiled indexes that cover the content of these sources. Such a scenario is especially attractive in life science applications, which integrate data from hundreds of valuable, carefully maintained databases. A key bottleneck in building such systems is data merging, where partial answers obtained from different data sources must be merged and the problem of overlapping data must be solved. In response to a query, the most informative redundancy-free answer should be constructed. In this paper we propose a formal foundation for merging XML-like data and discuss indexing support for data merging.
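
The abstract does not spell out the merging formalism, so the following is only an illustrative sketch of what merging overlapping XML-like partial answers into one redundancy-free answer can look like. It is not the authors' method. It assumes that elements carrying a hypothetical `id` attribute denote the same object when tag and `id` agree, and that elements without an `id` are duplicates when tag and text agree. The function names `signature`, `merge_into`, and `merge_answers` are likewise made up for this example.

```python
import copy
import xml.etree.ElementTree as ET


def signature(elem):
    """Identity of an element (assumption): tag + 'id' attribute if present,
    otherwise tag + stripped text content."""
    if "id" in elem.attrib:
        return (elem.tag, "id", elem.attrib["id"])
    return (elem.tag, "text", (elem.text or "").strip())


def merge_into(target, source):
    """Merge the children of `source` into `target`, skipping overlapping data."""
    existing = {signature(child): child for child in target}
    for child in source:
        sig = signature(child)
        if sig in existing:
            # Same object seen before: merge its content recursively.
            merge_into(existing[sig], child)
        else:
            # New information: copy it so the source answer is left untouched.
            new_child = copy.deepcopy(child)
            target.append(new_child)
            existing[sig] = new_child


def merge_answers(answers):
    """Combine a non-empty list of partial answers (root elements) into one tree."""
    merged = ET.Element(answers[0].tag)
    for answer in answers:
        merge_into(merged, answer)
    return merged


# Two partial answers, as if returned by two different sources (made-up data).
source_a = ET.fromstring(
    '<result><protein id="P1"><name>kinase</name>'
    "<organism>human</organism></protein></result>"
)
source_b = ET.fromstring(
    '<result><protein id="P1"><name>kinase</name>'
    "<function>signalling</function></protein>"
    '<protein id="P2"><name>ligase</name></protein></result>'
)

merged = merge_answers([source_a, source_b])
print(ET.tostring(merged, encoding="unicode"))
```

In this sketch the two descriptions of protein `P1` collapse into one element that keeps the shared `name` once and gains both `organism` and `function`, while `P2` is carried over unchanged. A real system would need a principled notion of keys and of conflicting values, which this sketch ignores.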




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pankowski, T., Hunt, E. (2005). Data Merging in Life Science Data Integration Systems. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_29

  • DOI: https://doi.org/10.1007/3-540-32392-9_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25056-2

  • Online ISBN: 978-3-540-32392-1

  • eBook Packages: Engineering, Engineering (R0)
