
Precision in Processing Data from Heterogeneous Resources

  • Conference paper
Advances in Databases (BNCOD 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1832)


Abstract

Much information is becoming available on the World Wide Web, on intranets, and on publicly accessible databases. The benefits of integrating related data from distinct sources are great, since integration allows the discovery or validation of relationships among events and trends in many areas of science and commerce. But most sources are established autonomously, and hence are heterogeneous in form and content. Resolution of heterogeneity of form has been an exciting research topic for many years now. We can access information from diverse computers, alternate data representations, varied operating systems, and multiple database models, and can deal with a variety of transmission protocols. But progress in these areas is raising a new problem: semantic heterogeneity. Semantic heterogeneity comes about because the meaning of words depends on context, and autonomous sources are developed and maintained within their own contexts. Types of semantic heterogeneity include spelling variations, use of synonyms, and the use of identically spelled words to refer to different objects. The effect of semantic heterogeneity is not only failure to find desired material, but also lack of precision in selection, aggregation, comparison, etc., when trying to integrate information. While browsing we may complain of ‘information overload’. But when trying to automate these processes, an essential aspect of business-oriented operations, the imprecision due to semantic heterogeneity can become fatal. Manual resolution of the problem does work today, but it forces businesses to limit the scope of their partnering. In expanding supply chains and globalized commerce we have to deal with many more contexts, but cannot afford manual, case-by-case resolution. In business we become efficient by rapidly carrying out processes on regular schedules. XML is touted as the new universal medium for electronic commerce, but the meaning of the tags identifying data fields remains context dependent.
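The three kinds of semantic heterogeneity named above (spelling variations, synonyms, and identically spelled words with different referents) can be illustrated with a minimal sketch. It is not taken from the paper: the two contexts, the vocabulary tables, and the function names are hypothetical and serve only to show why a term has to be resolved within its own context before terms from different sources are compared.

```python
# Hypothetical vocabularies, one per autonomous source context.
# Each maps a local term to a concept label; none of this comes from the paper.
CONTEXT_VOCAB = {
    "hardware": {"nail": "fastener", "nails": "fastener", "brad": "fastener"},
    "cosmetics": {"nail": "fingernail", "nails": "fingernail"},
}


def normalize(term):
    """Reduce trivial spelling variation (case, surrounding whitespace)."""
    return term.strip().lower()


def resolve(term, context):
    """Return the concept a term denotes in a context, or None if unknown."""
    return CONTEXT_VOCAB.get(context, {}).get(normalize(term))


def same_concept(term_a, context_a, term_b, context_b):
    """Compare two terms by concept rather than by spelling."""
    concept_a = resolve(term_a, context_a)
    concept_b = resolve(term_b, context_b)
    return concept_a is not None and concept_a == concept_b


# Synonyms: different spelling, same concept within one context.
print(same_concept("brad", "hardware", "Nails ", "hardware"))   # True
# Homonyms: same spelling, different concepts across contexts.
print(same_concept("nail", "hardware", "nail", "cosmetics"))    # False
```

Matching on the raw string alone would treat the two uses of "nail" as equal, which is the loss of precision the abstract describes.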
Attempting a global resolution of the semantic mismatch is futile. The number of participants is immense, growing, and dynamic. Terminology changes, and must be able to change as our knowledge grows. Using precise, finely differentiated terms and abbreviations is important for efficiency within a domain, but frustrating to outsiders. In this paper we indicate research directions to resolve inconsistencies incrementally, so that we may be able to interoperate effectively in the presence of inter-domain inconsistencies. This work is at an early stage, and will provide research opportunities for a range of disciplines, including databases, artificial intelligence, and formal linguistics. We also sketch an information systems architecture which is suitable for such services and their infrastructure. Research issues in managing the complexity of multiple services arise here as well. The conclusion of this paper can be summarized as stating that today, and even more in the future, precision and relevance will be more valuable than completeness and recall. Solutions are best composed from many small-scale efforts rather than by overbearing attempts at standardization. This observation will, in turn, affect research directions in information sciences.
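For readers less familiar with the two measures contrasted above, the following sketch uses the standard information-retrieval definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The item identifiers and the two result sets are invented for illustration and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four items are truly relevant to the request.
relevant = {"h1", "h2", "h3", "h4"}

# A broad, spelling-only match also pulls in items from another context.
broad = {"h1", "h2", "h3", "h4", "c1", "c2", "c3", "c4"}

# A context-aware match returns fewer items and misses one relevant item.
narrow = {"h1", "h2", "h3"}

print(precision_recall(broad, relevant))   # (0.5, 1.0)
print(precision_recall(narrow, relevant))  # (1.0, 0.75)
```

In an automated process the broad result would pass four wrong items downstream, while the narrow result passes none; this is the sense in which precision can matter more than completeness.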

“The certitude that any book exists on the shelves of the library first led to elation, but soon the realization that it was unlikely to be found converted the feelings to a great depression”, Luis Borges: The Infinite Library, 1964.




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wiederhold, G. (2000). Precision in Processing Data from Heterogeneous Resources. In: Lings, B., Jeffery, K. (eds) Advances in Databases. BNCOD 2000. Lecture Notes in Computer Science, vol 1832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45033-5_1


  • DOI: https://doi.org/10.1007/3-540-45033-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67743-7

  • Online ISBN: 978-3-540-45033-7

  • eBook Packages: Springer Book Archive
