Uncertainty Representations for Information Retrieval with Missing Data

Chapter in: Fusion Methodologies in Crisis Management

Abstract

Retrieving items such as similar past events, or vessels with a specific characteristic of interest, is a critical task for crisis management support. This paper addresses the problem of information retrieval from incomplete databases. In particular, we assess how the representation of uncertainty about missing data affects the retrieval of the corresponding items. After a brief survey of the missing-data problem, with an emphasis on information retrieval applications, we propose a novel approach for retrieving records with missing data. The general idea of this data-driven approach is to model the uncertainty pertaining to the missing data. We chose the general model of belief functions because it encompasses both classical set and probability models as special cases. Several uncertainty models are then compared based on (1) an expressiveness criterion (non-specificity or randomness) and (2) objective performance measures typical of the Information Retrieval domain. The results are illustrated on a real dataset and with a simulated, controlled missing-data mechanism.
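
The abstract compares representations of a missing value by non-specificity and randomness (called discord in Klir and Yuan 1995, per note 8). As an illustration only, the Python sketch below assumes a single missing categorical attribute over a hypothetical frame of vessel types. The vacuous (set) model, the empirical-frequency (probability) model and the discounted belief model are illustrative choices, not necessarily the models evaluated in the chapter; the measures are the generalized Hartley non-specificity and the Klir-Yuan discord.

```python
from collections import Counter
from math import log2

# Hypothetical representation: a mass function maps focal sets (frozensets of
# attribute values) to masses that sum to 1.
MassFunction = dict[frozenset, float]


def vacuous(frame) -> MassFunction:
    """Set model: the missing value may be any element of the frame."""
    return {frozenset(frame): 1.0}


def bayesian_from_data(observed) -> MassFunction:
    """Probability model: empirical frequencies of the observed (non-missing) values."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {frozenset([v]): c / total for v, c in counts.items()}


def discounted(m: MassFunction, frame, alpha: float) -> MassFunction:
    """Belief-function model: keep (1 - alpha) of m and move alpha onto the whole frame."""
    theta = frozenset(frame)
    out = {a: (1 - alpha) * w for a, w in m.items()}
    out[theta] = out.get(theta, 0.0) + alpha
    return out


def nonspecificity(m: MassFunction) -> float:
    """Generalized Hartley measure: N(m) = sum_A m(A) * log2|A|."""
    return sum(w * log2(len(a)) for a, w in m.items() if w > 0)


def discord(m: MassFunction) -> float:
    """Klir-Yuan discord: D(m) = -sum_A m(A) log2(1 - sum_B m(B) |B - A| / |B|)."""
    total = 0.0
    for a, wa in m.items():
        if wa == 0:
            continue
        # Mass-weighted proportion of each focal set B lying outside A.
        conflict = sum(wb * len(b - a) / len(b) for b, wb in m.items())
        total -= wa * log2(1 - conflict)
    return total


if __name__ == "__main__":
    frame = ["cargo", "tanker", "fishing"]              # hypothetical attribute domain
    observed = ["cargo", "cargo", "tanker", "fishing"]  # non-missing values in the column
    m_set = vacuous(frame)                              # N = log2(3), D = 0
    m_prob = bayesian_from_data(observed)               # N = 0, D = Shannon entropy = 1.5
    m_bel = discounted(m_prob, frame, alpha=0.2)        # in between on both criteria
    for name, m in [("set", m_set), ("probability", m_prob), ("belief", m_bel)]:
        print(f"{name:12s} N={nonspecificity(m):.3f} D={discord(m):.3f}")
```

With these choices the set model is maximally non-specific and has no discord, the probability model has zero non-specificity and its discord reduces to Shannon entropy, and the discounted belief function sits between the two on both criteria.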

This work is an updated version of Jousselme and Maupin (2013).

Notes

  1. http://www.nctc.gov.
  2. http://msi.nga.mil/NGAPortal/MSI.portal?_nfpb=true&_pageLabel=msi_portal_page_64.
  3. http://www.nctc.gov.
  4. http://www.geocommons.com.
  5. Maritime Mobile Service Identity.
  6. International Maritime Organization.
  7. Descriptors are also called terms, features, attributes, etc.
  8. The typology of uncertainty types referred to here is that of Klir and Yuan (1995), in which fuzziness is omitted. Note that randomness is called discord in Klir and Yuan (1995).
  9. We used equal weights here.
  10. http://eturwg.c4i.gmu.edu.
  11. http://archive.ics.uci.edu/ml/index.html.
  12. Among the series of results obtained for different values of τ, we selected these because they were among those showing (1) a clear difference between the models and (2) good performance. Further results will be provided in an extended version of our work.
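
Note 12 refers to results obtained for different values of a parameter τ, which is not defined in this excerpt. Purely as an illustration, the sketch below assumes that τ is a threshold on a per-record retrieval score, here the plausibility that a record matches the queried value, and evaluates the retrieved set with precision and recall, two standard Information Retrieval measures. The scoring rule, the function names and the toy data are hypothetical.

```python
# Hypothetical retrieval sketch: each record holds one categorical value, or None
# when the value is missing; a mass function (dict: frozenset -> mass) stands in
# for the missing value.


def plausibility(m, value):
    """Pl({value}): total mass of the focal sets that contain value."""
    return sum(w for a, w in m.items() if value in a)


def retrieve(records, query, missing_model, tau):
    """Return indices of records whose plausibility of matching `query` is >= tau.

    Observed values get a categorical mass function; missing values (None)
    use `missing_model` instead.
    """
    hits = []
    for i, value in enumerate(records):
        m = missing_model if value is None else {frozenset([value]): 1.0}
        if plausibility(m, query) >= tau:
            hits.append(i)
    return hits


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set; by convention 0.0 for empty denominators."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    frame = frozenset(["cargo", "tanker", "fishing"])
    truth = ["cargo", "tanker", "cargo", "fishing", "cargo"]  # toy ground truth
    records = ["cargo", None, None, "fishing", "cargo"]       # records 1 and 2 are missing
    relevant = [i for i, v in enumerate(truth) if v == "cargo"]

    vacuous = {frame: 1.0}
    # Empirical frequencies of the observed values (2 cargo, 1 fishing).
    probabilistic = {frozenset(["cargo"]): 2 / 3, frozenset(["fishing"]): 1 / 3}
    for name, model in [("vacuous", vacuous), ("probabilistic", probabilistic)]:
        for tau in (0.5, 0.8):
            hits = retrieve(records, "cargo", model, tau)
            print(name, tau, hits, precision_recall(hits, relevant))
```

In this toy run the vacuous model retrieves every record with a missing value regardless of τ, which raises recall at the cost of precision, whereas the probabilistic model drops them once τ exceeds 2/3.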

References

  • Aamodt A, Plaza E (1994) Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Commun 7(1):39–59

  • Ahlgren P, Grönqvist L (2006) Retrieval evaluation with incomplete relevance data: a comparative study of three measures. In: 15th ACM international conference on information and knowledge management, Arlington

  • Bach Tobji MA, Ben Yaghlane B, Mellouli K (2008) A new algorithm for mining frequent itemsets from evidential databases. In: Magdalena JVL, Ojeda-Aciego M (eds) Proceedings of IPMU, pp 1535–1542

  • Brini A, Boughanem M, Dubois D (2005) A model for information retrieval based on possibilistic networks. In: String processing and information retrieval (SPIRE 2005), Buenos Aires. Lecture notes in computer science. Springer, New York, pp 271–282

  • Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR ’04), Sheffield, pp 25–32

  • Burkhard H-D (2004) Case completion and similarity in case-based reasoning. Comput Sci Inf Syst 1(2):27–55

  • Chen LA (1988) On information retrieval and evidential reasoning. Tech. Rep. UCB/CSD-88-429, EECS Department, University of California, Berkeley

  • Chen N, Dahanayake A (2007) Role-based situation-aware information seeking and retrieval for crisis response. Int J Intell Control Syst 12:186–197

  • Chowdhary KR, Bansal VS (2011) Information retrieval using probability and belief theory. In: International conference on emerging trends in networks and computer communications (ETNCC), pp 188–191

  • Costa PCG, Laskey K, Blasch E, Jousselme A-L (2012) Towards unbiased evaluation of uncertainty reasoning: The URREF Ontology. In: Proceedings of the 15th International Conference on Information Fusion, Singapore

  • Crestani F, Lalmas M, Van Rijsbergen CJ, Campbell I (1998) Is this document relevant? … probably: a survey of probabilistic models in information retrieval. ACM Comput Surv 30(4):528–552

  • Dalvi N, Re C, Suciu D (2009) Probabilistic databases: diamonds in the dirt (extended version). Commun ACM 52:86–94

  • da Silva WT, Milidiú RL (1993) Belief function model for information retrieval. J Am Soc Inf Sci 44(2):10–18

  • Farhangfar A, Kurgan L, Pedrycz W (2007) A novel framework for imputation of missing values in databases. IEEE Trans Syst Man Cybern - A: Syst and Humans 37(5):692–708

  • Fuhr N (1992) Probabilistic models in information retrieval. Comput J 35:243–255

  • Hewawasam GK, Premaratne K, Subasingha SP, Shyu M-L (2005) Rule mining and classification in imperfect databases. In: Proceedings of the 7th international conference on information fusion

  • Jousselme A-L, Maupin P (2012) A brief survey of comparative elements for uncertainty calculi and decision procedures assessment. In: Proceedings of the 15th international conference on information fusion, 2012. Panel Uncertainty Evaluation: Current Status and Major Challenges

  • Jousselme A-L, Maupin P (2013) Comparison of uncertainty representations for missing data in information retrieval. In: Proceedings of the international conference of information fusion, Istanbul

  • Jousselme A-L, Grenier D, Bossé E (2001) A new distance between two bodies of evidence. Inf Fusion 2:91–101

  • Kim W, Choi B-J, Hong E-K, Kim S-K, Lee D (2003) A taxonomy of dirty data. Data Min Knowl Discov 7(1):81–99

  • Klir GJ, Yuan B (1995) Fuzzy sets and fuzzy logic: theory and applications. Prentice Hall International, Upper Saddle River

  • Lalmas M (1998) Information retrieval and Dempster-Shafer’s theory of evidence. In: Applications of uncertainty formalisms. Lecture notes in computer science, Chap. B. Springer Berlin/Heidelberg, pp 157–176

  • Lee SK (1992) Imprecise and uncertain information in databases: an evidential approach. In: Proceedings of the 8th international conference on data engineering, pp 614–621

  • Lynch SM (2003) Missing data. http://www.princeton.edu/~slynch/soc504/missingdata.pdf

  • McClean S, Scotney B, Shapcott M (2001) Aggregation of imprecise and uncertain information in databases. IEEE Trans Knowl Data Eng 13:902

  • National Counterterrorism Center (NCTC) (2010) Worldwide Incidents Tracking System (WITS) report on terrorism. http://www.nctc.gov/, April 2011

  • Schafer JL, John WG (2004) Missing data: our view of the state of the art. Psychol Methods 7(2):147–177

  • Schmidt R, Vorobieva O (2007) Applying case-based reasoning for missing medical data in ISOR. In: LWA 07, pp 275–280

  • Telmoudi A, Chakhar S (2004) Data fusion application from evidential databases as a support for decision making. Inf Softw Technol 46:547–555

  • Wu S, McClean S (2006) Evaluation of system measures for incomplete relevance judgment in IR. In: Flexible query answering systems. Lecture notes in computer science, vol 4027. Springer, New York, pp 245–256

  • Yassir A, Nayak S (2012) Issues in data mining and information retrieval. Int J Comput Sci Commun Netw 2:93–98

  • Yi X (2011) Discovering and using implicit data for information retrieval. Ph.D. thesis, University of Massachusetts Amherst

  • Zaffalon M (2002) Exact credal treatment of missing data. J Stat Plann Inference 105(1):105–122

Author information

Correspondence to Anne-Laure Jousselme.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Jousselme, AL., Maupin, P. (2016). Uncertainty Representations for Information Retrieval with Missing Data. In: Rogova, G., Scott, P. (eds) Fusion Methodologies in Crisis Management. Springer, Cham. https://doi.org/10.1007/978-3-319-22527-2_5

  • DOI: https://doi.org/10.1007/978-3-319-22527-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22526-5

  • Online ISBN: 978-3-319-22527-2

  • eBook Packages: Engineering, Engineering (R0)
