Precision in Processing Data from Heterogeneous Resources

Wiederhold, Gio

doi:10.1007/3-540-45033-5_1

Gio Wiederhold

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1832)

Included in the following conference series:

British National Conference on Databases

Abstract

Much information is becoming available on the World Wide Web, on intranets, and on publicly accessible databases. The benefits of integrating related data from distinct sources are great, since it allows the discovery or validation of relationships among events and trends in many areas of science and commerce. But most sources are established autonomously, and hence are heterogeneous in form and content. Resolution of heterogeneity of form has been an exciting research topic for many years now. We can access information from diverse computers, alternate data representations, varied operating systems, and multiple database models, and can deal with a variety of transmission protocols. But progress in these areas is raising a new problem: semantic heterogeneity. Semantic heterogeneity comes about because the meaning of words depends on context, and autonomous sources are developed and maintained within their own contexts. Types of semantic heterogeneity include spelling variations, use of synonyms, and the use of identically spelled words to refer to different objects. The effect of semantic heterogeneity is not only failure to find desired material, but also lack of precision in selection, aggregation, comparison, etc., when trying to integrate information. While browsing we may complain of ‘information overload’. But when trying to automate these processes, an essential aspect of business-oriented operations, the imprecision due to semantic heterogeneity can become fatal. Manual resolution of the problem does work today, but it forces businesses to limit the scope of their partnering. In expanding supply chains and globalized commerce we have to deal with many more contexts, but cannot afford manual, case-by-case resolution. In business we become efficient by rapidly carrying out processes on regular schedules. XML is touted as the new universal medium for electronic commerce, but the meaning of the tags identifying data fields remains context dependent.
Attempting a global resolution of the semantic mismatch is futile. The number of participants is immense, growing, and dynamic. Terminology changes, and must be able to change as our knowledge grows. Using precise, finely differentiated terms and abbreviations is important for efficiency within a domain, but frustrating to outsiders. In this paper we indicate research directions to resolve inconsistencies incrementally, so that we may be able to interoperate effectively in the presence of inter-domain inconsistencies. This work is at an early stage, and will provide research opportunities for a range of disciplines, including databases, artificial intelligence, and formal linguistics. We also sketch an information systems architecture which is suitable for such services and their infrastructure. Research issues in managing the complexity of multiple services arise here as well. The conclusion of this paper can be summarized as stating that today, and even more in the future, precision and relevance will be more valuable than completeness and recall. Solutions are best composed from many small-scale efforts rather than by overbearing attempts at standardization. This observation will, in turn, affect research directions in information sciences.

“The certitude that any book exists on the shelves of the library first led to elation, but soon the realization that it was unlikely to be found converted the feelings to a great depression”, Luis Borges: The Infinite Library, 1964.

Author information

Authors and Affiliations

Computer Science Department, Stanford University, Stanford, CA, 94305-9040
Gio Wiederhold

Editor information

Editors and Affiliations

Department of Computer Science, University of Exeter, Prince of Wales Road, Exeter, EX4 4PT, UK
Brian Lings
Department for Computation and Information, CLRC Rutherford Appleton Laboratory, Chilton-Didcot, Oxon, OX11 0QX, UK
Keith Jeffery

Cite this paper

Wiederhold, G. (2000). Precision in Processing Data from Heterogeneous Resources. In: Lings, B., Jeffery, K. (eds) Advances in Databases. BNCOD 2000. Lecture Notes in Computer Science, vol 1832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45033-5_1

DOI: https://doi.org/10.1007/3-540-45033-5_1
Published: 11 November 2000
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67743-7
Online ISBN: 978-3-540-45033-7
eBook Packages: Springer Book Archive
