Abstract
Retrieving items such as similar past events, or vessels with a specific characteristic of interest, is a critical task for crisis management support. This paper addresses the problem of information retrieval from incomplete databases. In particular, we assess how the representation of uncertainty about missing data affects the retrieval of the corresponding items. After a brief survey of the missing-data problem, with an emphasis on information retrieval applications, we propose a novel approach for retrieving records with missing data. The general idea of this data-driven approach is to model the uncertainty pertaining to the missing data. We chose the general model of belief functions because it encompasses both classical set and probability models as special cases. Several uncertainty models are then compared based on (1) an expressiveness criterion (non-specificity or randomness) and (2) objective performance measures typical of the information retrieval domain. The results are illustrated on a real dataset and a simulation-controlled missing-data mechanism.
This work is an updated version of Jousselme and Maupin (2013).
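As a rough illustration of the idea summarized above, the following minimal sketch represents a missing attribute value by three belief functions (a vacuous one, a uniform Bayesian one, and one built from observed frequencies) and computes the non-specificity of each together with the belief and plausibility that the record matches a query. It is not the chapter's implementation: the vessel-type frame, the counts, and all function names are hypothetical and chosen only for the example.

```python
from math import log2

# A mass function maps focal sets (frozensets of values) to masses summing to 1.

def vacuous(frame):
    """Classical set model: total ignorance, all mass on the whole frame."""
    return {frozenset(frame): 1.0}

def uniform(frame):
    """Probability model: equal mass on every singleton of the frame."""
    return {frozenset([x]): 1.0 / len(frame) for x in frame}

def from_frequencies(counts):
    """Probability model estimated from observed value counts (assumed data)."""
    total = sum(counts.values())
    return {frozenset([x]): c / total for x, c in counts.items() if c > 0}

def belief(m, query):
    """Total mass of focal sets entirely contained in the query set."""
    return sum(mass for focal, mass in m.items() if focal <= query)

def plausibility(m, query):
    """Total mass of focal sets that intersect the query set."""
    return sum(mass for focal, mass in m.items() if focal & query)

def nonspecificity(m):
    """Generalized Hartley measure: sum of m(A) * log2(|A|)."""
    return sum(mass * log2(len(focal)) for focal, mass in m.items())

# Hypothetical frame of vessel types and a query for tankers.
frame = {"cargo", "tanker", "fishing", "passenger"}
query = frozenset({"tanker"})
# Hypothetical counts of the attribute among complete records.
counts = {"cargo": 5, "tanker": 3, "fishing": 2, "passenger": 0}

models = {
    "vacuous": vacuous(frame),
    "uniform": uniform(frame),
    "frequency": from_frequencies(counts),
}

for name, m in models.items():
    print(name, nonspecificity(m), belief(m, query), plausibility(m, query))
```

Under these assumptions, the vacuous model is maximally non-specific and leaves the match fully undetermined (belief 0, plausibility 1), whereas the two probability models have zero non-specificity and give a single matching degree.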
Notes
- 5. Maritime Mobile Service Identity.
- 6. International Maritime Organization.
- 7. Descriptors are also called terms, features, attributes, etc.
- 9. We used equal weights here.
- 12. Among the series of results obtained for different values of τ, we selected these because they were among those with (1) a clear difference between the models and (2) good performance. Further results will be provided in an extended version of our work.
References
Aamodt A, Plaza E (1994) Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Commun 7(1):39–59
Ahlgren P, Grönqvist L (2006) Retrieval evaluation with incomplete relevance data: a comparative study of three measures. In: 15th ACM international conference on information and knowledge management, Arlington
Bach Tobji MA, Ben Yaghlane B, Mellouli K (2008) A new algorithm for mining frequent itemsets from evidential databases. In: Magdalena JVL, Ojeda-Aciego M (eds) Proceedings of IPMU, pp 1535–1542
Brini A, Boughanem M, Dubois D (2005) A model for information retrieval based on possibilistic networks. In: String processing and information retrieval (SPIRE 2005), Buenos Aires. Lecture notes in computer science. Springer, New York, pp 271–282
Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR ’04), Sheffield, pp 25–32
Burkhard H-D (2004) Case completion and similarity in case-based reasoning. Comput Sci Inf Syst 1(2):27–55
Chen LA (1988) On information retrieval and evidential reasoning. Tech. Rep. UCB/CSD-88-429, EECS Department, University of California, Berkeley
Chen N, Dahanayake A (2007) Role-based situation-aware information seeking and retrieval for crisis response. Int J Intell Control Syst 12:186–197
Chowdhary KR, Bansal VS (2011) Information retrieval using probability and belief theory. In: International conference on emerging trends in networks and computer communications (ETNCC), pp 188–191
Costa PCG, Laskey K, Blasch E, Jousselme A-L (2012) Towards unbiased evaluation of uncertainty reasoning: the URREF ontology. In: Proceedings of the 15th international conference on information fusion, Singapore
Crestani F, Lalmas M, Van Rijsbergen CJ, Campbell I (1998) Is this document relevant? … probably: a survey of probabilistic models in information retrieval. ACM Comput Surv 30(4):528–552
Dalvi N, Re C, Suciu D (2009) Probabilistic databases: diamonds in the dirt (extended version). Commun ACM 52:86–94
da Silva WT, Milidiú RL (1993) Belief function model for information retrieval. J Am Soc Inf Sci 44(2):10–18
Farhangfar A, Kurgan L, Pedrycz W (2007) A novel framework for imputation of missing values in databases. IEEE Trans Syst Man Cybern A Syst Hum 37(5):692–708
Fuhr N (1992) Probabilistic models in information retrieval. Comput J 35:243–255
Hewawasam GK, Premaratne K, Subasingha M-L, Shyu SP (2005) Rule mining and classification in imperfect databases. In: Proceedings of the 7th international conference on information fusion
Jousselme A-L, Maupin P (2012) A brief survey of comparative elements for uncertainty calculi and decision procedures assessment. In: Proceedings of the 15th international conference on information fusion, 2012. Panel Uncertainty Evaluation: Current Status and Major Challenges
Jousselme A-L, Maupin P (2013) Comparison of uncertainty representations for missing data in information retrieval. In: Proceedings of the international conference on information fusion, Istanbul
Jousselme A-L, Grenier D, Bossé E (2001) A new distance between two bodies of evidence. Inf Fusion 2:91–101
Kim W, Choi B-J, Hong E-K, Kim S-K, Lee D (2003) A taxonomy of dirty data. Data Min Knowl Discov 7(1):81–99
Klir GJ, Yuan B (1995) Fuzzy sets and fuzzy logic: theory and applications. Prentice Hall International, Upper Saddle River
Lalmas M (1998) Information retrieval and Dempster-Shafer’s theory of evidence. In: Applications of uncertainty formalisms. Lecture notes in computer science, Chap. B. Springer Berlin/Heidelberg, pp 157–176
Lee SK (1992) Imprecise and uncertain information in databases: an evidential approach. In: Proceedings of the 8th international conference data engineering, pp 614–621
Lynch SM (2003) Missing data. http://www.princeton.edu/~slynch/soc504/missingdata.pdf
McClean S, Scotney B, Shapcott M (2001) Aggregation of imprecise and uncertain information in databases. IEEE Trans Knowl Data Eng 13:902
National Counterterrorism Center (NCTC) (2010) Worldwide Incidents Tracking System (WITS) report on terrorism. http://www.nctc.gov/, April 2011
Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Methods 7(2):147–177
Schmidt R, Vorobieva O (2007) Applying case-based reasoning for missing medical data in ISOR. In: LWA 07, pp 275–280
Telmoudi A, Chakhar S (2004) Data fusion application from evidential databases as a support for decision making. Inf Softw Technol 46:547–555
Wu S, McClean S (2006) Evaluation of system measures for incomplete relevance judgment in IR. In: Flexible query answering systems. Lecture notes in computer science, vol 4027. Springer, New York, pp 245–256
Yassir A, Nayak S (2012) Issues in data mining and information retrieval. Int J Comput Sci Commun Netw 2:93–98
Yi X (2011) Discovering and using implicit data for information retrieval. Ph.D. thesis, University of Massachusetts Amherst
Zaffalon M (2002) Exact credal treatment of missing data. J Stat Plann Inference 105(1):105–122
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Jousselme, AL., Maupin, P. (2016). Uncertainty Representations for Information Retrieval with Missing Data. In: Rogova, G., Scott, P. (eds) Fusion Methodologies in Crisis Management. Springer, Cham. https://doi.org/10.1007/978-3-319-22527-2_5
DOI: https://doi.org/10.1007/978-3-319-22527-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22526-5
Online ISBN: 978-3-319-22527-2
eBook Packages: Engineering, Engineering (R0)