Abstract
Automated and real-time management of customer relationships requires robust and intelligent data matching across widespread and diverse data sources. Simple string matching algorithms, such as those based on dynamic programming, can handle typographical errors in the data but are less able to match records that require contextual and experiential knowledge. Latent Semantic Indexing (LSI) (Berry et al. 1995; Deerwester et al. 1990) is a machine intelligence technique that can match data based upon higher-order structure, and it can handle difficult problems such as words that share a spelling but differ in meaning, words that are synonymous, and words that have multiple meanings. Essentially, the technique matches records based upon context, by mathematically quantifying when terms occur in the same record.
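To make the idea of matching on higher-order structure concrete, the following is a minimal sketch of LSI as a truncated singular value decomposition of a term-by-record matrix. It is not the chapter's implementation: the three toy records, the choice of k = 2 latent dimensions, and the helper names `cosine`, `raw_sim`, and `latent_sim` are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy records; none of these values come from the chapter.
records = [
    "robert smith",
    "bob smith",
    "alice jones",
]

# Vocabulary and a term -> row index lookup.
vocab = sorted({term for rec in records for term in rec.split()})
index = {term: i for i, term in enumerate(vocab)}

# Term-by-record count matrix A (rows: terms, columns: records).
A = np.zeros((len(vocab), len(records)))
for j, rec in enumerate(records):
    for term in rec.split():
        A[index[term], j] += 1.0

# Truncated SVD: keep only the k largest singular values (k is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # each row is one term in the k-dim latent space


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def raw_sim(a, b):
    """Similarity of two terms from direct co-occurrence only."""
    return cosine(A[index[a]], A[index[b]])


def latent_sim(a, b):
    """Similarity of two terms in the reduced LSI space."""
    return cosine(term_vecs[index[a]], term_vecs[index[b]])


if __name__ == "__main__":
    print("raw    robert~bob  :", round(raw_sim("robert", "bob"), 3))
    print("latent robert~bob  :", round(latent_sim("robert", "bob"), 3))
    print("latent robert~alice:", round(latent_sim("robert", "alice"), 3))
```

In this toy data "robert" and "bob" never appear in the same record, so their raw similarity is zero, yet both co-occur with "smith". After the rank reduction they should land on the same latent direction, while "alice" stays unrelated. Real data would need term weighting and a much larger k, both of which are left out of this sketch.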
References
Baeza-Yates R, Ribeiro-Neto B (1999) Modern Information Retrieval. ACM Press, New York.
Berry MW, Dumais ST, O’Brien GW (1995) Using Linear Algebra for Intelligent Information Retrieval. SIAM Review 37 pp 573–595.
Deerwester S, Dumais ST, Landauer TK, Furnas GW, Harshman RA (1990) Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41 pp 391–407.
Fellbaum C (1998) WordNet. MIT Press, Cambridge, MA.
Gibbons A (1985) Algorithmic Graph Theory. Cambridge University Press, Cambridge, England.
Hwa T, Lässig M (1996) Similarity Detection and Localization. Phys. Rev. Lett. 76 pp 2591–2594.
Manning CD, Schütze H (1999) Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
Telcordia (2007) Latent Semantic Indexing. Retrieved from http://lsi.research.telcordia.com.
Watts DJ, Strogatz SH (1998) Collective Dynamics of Small World Networks. Nature 393 p 440.
Wild F, Stahl C, Stermsek G, Neumann G (2005) Parameters Driving Effectiveness of Automated Essay Scoring with LSA. In Danson, M., ed.: Proceedings of the 9th CAA, Loughborough, Professional Development pp 485–494.
Acknowledgments
This work was supported by a grant from the Acxiom Corporation. We also thank Ms. Ameera Jaradat for contributions to the small world model.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Deaton, R., Doan, T., Schweiger, T. (2009). Semantic Data Matching: Principles and Performance. In: Chan, Y., Talburt, J., Talley, T. (eds) Data Engineering. International Series in Operations Research & Management Science, vol 132. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0176-7_4
DOI: https://doi.org/10.1007/978-1-4419-0176-7_4
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0175-0
Online ISBN: 978-1-4419-0176-7
eBook Packages: Computer Science (R0)