Context Oriented Information Integration

  • Mukesh Mohania
  • Manish Bhide
  • Prasan Roy
  • Venkatesan T. Chakaravarthy
  • Himanshu Gupta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5740)


Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources. Researchers have studied this problem, but many obstacles still stand in the way of widespread adoption of such integration in practice. One of the key problems is the absence of a schema in unstructured text. In this paper we present a new paradigm for integrating information that overcomes this problem: Context Oriented Information Integration. The goal is to integrate unstructured data with the structured data present in the enterprise and to use the extracted information to generate actionable insights for the enterprise. We present two techniques that enable context oriented information integration and show how they can be used to solve real-world problems.


Information Integration, Unstructured Data Integration, Context Oriented Information Integration, SCORE, EROCS





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Mukesh Mohania (1)
  • Manish Bhide (1)
  • Prasan Roy (2)
  • Venkatesan T. Chakaravarthy (1)
  • Himanshu Gupta (1)
  1. IBM India Research Lab, New Delhi, India
  2. Aster Data Systems, USA
