Evaluating and Improving Data Fusion Accuracy

  • John R. Talburt
  • Daniel Pullen
  • Melody Penning
Part of the Information Fusion and Data Science book series (IFDS)


Information fusion is the process of combining information from different sources for use in a particular application. Almost every information product incorporates some level of data fusion in its production. Poorly implemented data and information fusion undermines many other key data processes, most notably data quality management, data governance, and data analytics. In this chapter we focus on a particular type of data fusion process called entity-based data fusion (EBDF) and on its use in high-risk applications, where the accuracy of the fusion must be very high. Healthcare is one of the foremost examples: fusing information that belongs to different patients, or failing to bring together all of the information for the same patient, can have dire, even life-threatening, consequences.


Keywords: Entity-based data fusion · Probabilistic matching · Precision · Recall · F-Measure · Data quality management · Quality control · Quality assurance
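The two failure modes named in the abstract map onto the precision and recall keywords: linking records of different patients produces false links (lowering precision), while leaving records of the same patient apart produces missed links (lowering recall). The following is a minimal sketch, assuming the common pairwise formulation in which every pair of records placed in the same cluster counts as one link; the function names `linked_pairs` and `pairwise_prf` are hypothetical, and the chapter itself may define these measures differently.

```python
from itertools import combinations


def linked_pairs(labels):
    """Return the set of unordered record-id pairs that share a cluster.

    `labels` maps each record id to a cluster (entity) label.
    Assumption: pairwise counting, where each same-cluster pair is one link.
    """
    clusters = {}
    for record_id, label in labels.items():
        clusters.setdefault(label, []).append(record_id)
    pairs = set()
    for members in clusters.values():
        # Sort so each unordered pair is stored in one canonical order.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_prf(predicted, truth):
    """Hypothetical helper: pairwise precision, recall, and F-measure."""
    pred_pairs = linked_pairs(predicted)
    true_pairs = linked_pairs(truth)
    tp = len(pred_pairs & true_pairs)  # correctly linked pairs
    fp = len(pred_pairs - true_pairs)  # different entities merged (false links)
    fn = len(true_pairs - pred_pairs)  # same entity left apart (missed links)
    # Convention (assumption): a measure with an empty denominator is 1.0.
    precision = tp / (tp + fp) if pred_pairs else 1.0
    recall = tp / (tp + fn) if true_pairs else 1.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

For example, if the true entities are A = {r1, r2, r3} and B = {r4, r5}, but a process outputs the clusters {r1, r2}, {r3, r4}, and {r5}, then `pairwise_prf` reports a precision of 0.5, a recall of 0.25, and an F-measure of about 0.33: one false link (r3 merged with a record of entity B) and three missed links.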



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • John R. Talburt (1)
  • Daniel Pullen (1)
  • Melody Penning (2)
  1. University of Arkansas at Little Rock, Little Rock, USA
  2. University of Arkansas for Medical Sciences, Little Rock, USA
