Abstract
Much information is becoming available on the World Wide Web, on intranets, and on publicly accessible databases. The benefits of integrating related data from distinct sources are great, since integration allows the discovery or validation of relationships among events and trends in many areas of science and commerce. But most sources are established autonomously, and hence are heterogeneous in form and content. Resolution of heterogeneity of form has been an exciting research topic for many years now. We can access information from diverse computers, alternate data representations, varied operating systems, and multiple database models, and deal with a variety of transmission protocols. But progress in these areas is raising a new problem: semantic heterogeneity. Semantic heterogeneity comes about because the meaning of words depends on context, and autonomous sources are developed and maintained within their own contexts. Types of semantic heterogeneity include spelling variations, use of synonyms, and the use of identically spelled words to refer to different objects. The effect of semantic heterogeneity is not only failure to find desired material, but also lack of precision in selection, aggregation, comparison, etc., when trying to integrate information. While browsing we may complain of ‘information overload’. But when trying to automate these processes, an essential aspect of business-oriented operations, the imprecision due to semantic heterogeneity can become fatal. Manual resolution of the problem does work today, but it forces businesses to limit the scope of their partnering. In expanding supply chains and globalized commerce we have to deal with many more contexts, but cannot afford manual, case-by-case resolution. In business we become efficient by rapidly carrying out processes on regular schedules. XML is touted as the new universal medium for electronic commerce, but the meaning of the tags identifying data fields remains context-dependent.
Attempting a global resolution of the semantic mismatch is futile. The number of participants is immense, growing, and dynamic. Terminology changes, and must be able to change as our knowledge grows. Using precise, finely differentiated terms and abbreviations is important for efficiency within a domain, but frustrating to outsiders. In this paper we indicate research directions to resolve inconsistencies incrementally, so that we may be able to interoperate effectively in the presence of inter-domain inconsistencies. This work is at an early stage, and will provide research opportunities for a range of disciplines, including databases, artificial intelligence, and formal linguistics. We also sketch an information systems architecture which is suitable for such services and their infrastructure. Research issues in managing the complexity of multiple services arise here as well. The conclusion of this paper is that today, and even more so in the future, precision and relevance will be more valuable than completeness and recall. Solutions are best composed from many small-scale efforts rather than by overbearing attempts at standardization. This observation will, in turn, affect research directions in information sciences.
“The certitude that any book exists on the shelves of the library first led to elation, but soon the realization that it was unlikely to be found converted the feelings to a great depression”, Luis Borges: The Infinite Library, 1964.
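The effect described in the abstract can be made concrete with a small sketch. The two sources, the terms, and the mapping table below are hypothetical and are not taken from the paper; they only illustrate how spelling variations and synonyms reduce recall, how identically spelled words reduce precision, and how rules scoped to each source's context (rather than one global vocabulary) could restore both.

```python
# Illustrative sketch only: sources, terms, and rules are hypothetical.

# Each record is (source, term used by that source, object actually described).
RECORDS = [
    ("hardware_store", "nail", "fastener"),
    ("hardware_store", "nails", "fastener"),    # spelling variation
    ("hardware_store", "brad", "fastener"),     # synonym
    ("cosmetics_shop", "nail", "fingernail"),   # same spelling, different object
]

# Context-scoped rules: a term is interpreted only together with its source.
ARTICULATION = {
    ("hardware_store", "nail"): "fastener",
    ("hardware_store", "nails"): "fastener",
    ("hardware_store", "brad"): "fastener",
    ("cosmetics_shop", "nail"): "fingernail",
}


def naive_search(term):
    """Match on the literal term, ignoring the context of each source."""
    return [r for r in RECORDS if r[1] == term]


def articulated_search(concept):
    """Match on the concept each (source, term) pair maps to."""
    return [r for r in RECORDS if ARTICULATION.get((r[0], r[1])) == concept]


def precision_recall(retrieved, wanted):
    """Precision = hits / retrieved; recall = hits / all relevant records."""
    relevant = [r for r in RECORDS if r[2] == wanted]
    hits = [r for r in retrieved if r[2] == wanted]
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    print(precision_recall(naive_search("nail"), "fastener"))
    print(precision_recall(articulated_search("fastener"), "fastener"))
```

In this toy setting the literal search for "nail" retrieves one irrelevant record and misses two relevant ones, while the context-scoped lookup retrieves exactly the three fastener records. The mapping table is written by hand here; how such rules might be obtained and maintained incrementally is the research question the abstract raises.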
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Wiederhold, G. (2000). Precision in Processing Data from Heterogeneous Resources. In: Lings, B., Jeffery, K. (eds) Advances in Databases. BNCOD 2000. Lecture Notes in Computer Science, vol 1832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45033-5_1
DOI: https://doi.org/10.1007/3-540-45033-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67743-7
Online ISBN: 978-3-540-45033-7
eBook Packages: Springer Book Archive