Context Oriented Information Integration

Mohania, Mukesh; Bhide, Manish; Roy, Prasan; Chakaravarthy, Venkatesan T.; Gupta, Himanshu

doi:10.1007/978-3-642-03722-1_12

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 5740)

Abstract

Faced with growing knowledge management needs, enterprises increasingly recognize the importance of seamlessly integrating critical business information distributed across structured and unstructured data sources. Researchers have studied this problem, but many obstacles to its widespread use in practice remain. A key obstacle is the absence of a schema in unstructured text. In this paper we present a new paradigm for information integration that overcomes this problem: Context Oriented Information Integration. The goal is to integrate unstructured data with the structured data present in the enterprise and to use the extracted information to generate actionable insights. We present two techniques that enable context oriented information integration and show how they can be used to solve real-world problems.

Author information

Authors and Affiliations

IBM India Research Lab, Plot-4, Block-C, Institutional Area, Vasant Kunj, New Delhi, India, 110070
Mukesh Mohania, Manish Bhide, Venkatesan T. Chakaravarthy & Himanshu Gupta
Aster Data Systems, CA, USA
Prasan Roy

Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
University of Linz, Altenbergerstraße 69, 4040, Linz, Austria
Josef Küng & Roland Wagner

About this chapter

Cite this chapter

Mohania, M., Bhide, M., Roy, P., Chakaravarthy, V.T., Gupta, H. (2009). Context Oriented Information Integration. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems I. Lecture Notes in Computer Science, vol 5740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03722-1_12

DOI: https://doi.org/10.1007/978-3-642-03722-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03721-4
Online ISBN: 978-3-642-03722-1
eBook Packages: Computer Science, Computer Science (R0)
