Abstract
Systems supporting situation awareness typically deal with a vast stream of information about a large number of real-world objects anchored in time and space, provided by multiple sources. These sources are often characterized by frequent updates, heterogeneous formats and, most crucially, identical, incomplete and often even contradictory information. In this respect, duplicate detection methods are of paramount importance, since they make it possible to determine whether information that has, e.g., different origins or different observation times concerns one and the same real-world object. Although many such duplicate detection methods have been proposed in the literature (each having different origins, pursuing different goals and often, by nature, being heavily domain-specific), the unique characteristics of situation awareness and their implications for the methods' applicability have not been the focus up to now. This paper examines existing duplicate detection methods that appear suitable for the area of situation awareness and identifies their strengths and shortcomings. As a prerequisite, based on a motivating case study in the domain of road traffic management, an evaluation framework is suggested that categorizes the major requirements on duplicate detection methods with regard to situation awareness.
This work has been funded by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) under grant FIT-IT 819577.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baumgartner, N., Gottesheim, W., Mitsch, S., Retschitzegger, W., Schwinger, W. (2009). “Same, Same but Different” A Survey on Duplicate Detection Methods for Situation Awareness. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05151-7_22
DOI: https://doi.org/10.1007/978-3-642-05151-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05150-0
Online ISBN: 978-3-642-05151-7
eBook Packages: Computer Science (R0)