Big Data Quality: A Data Quality Profiling Model

  • Ikbal Taleb
  • Mohamed Adel Serhani
  • Rachida Dssouli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11517)


Big Data is becoming a standard data model and is gaining wide adoption in the digital universe. Estimating the quality of Big Data is recognized as essential for data management and data governance. To assess data quality, as represented by its dimensions, quickly and efficiently, the data profiling model must be extended to incorporate quality profiling as well. Quality profiling encompasses value-added quality processes that go beyond the data and its corresponding metadata. In this paper, we propose a Data Quality Profiling Model for Big Data (BDQPM) that comprises several modules: sampling, profiling, exploratory quality profiling, a quality profile repository (QPREPO), and the data quality profile (DQP). The QPREPO plays an important role in managing quality-related elements such as data quality dimensions and their related metrics, predefined quality action scenarios, pre-processing activities (PPA), their related functions (PPAF), and the data quality profile. Our exploratory quality profiling method discovers a set of PPAF from systematic, predefined quality action scenarios in order to reveal the quality trends of any data set and to show the causes and effects of applying those functions to the data. This quality overview serves as a preliminary quality profile of the data. We conducted a series of experiments to test different features of the BDQPM, including sampling and profiling, quality evaluation, and exploratory quality profiling for Big Data quality enhancement. The results show that quality profiling tracks quality at an early stage of the Big Data life cycle, and that the exploratory quality profiling methodology yields insights for quality improvement and enforcement.
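The modules named above (sampling, profiling, and exploratory quality profiling) can be illustrated with a minimal sketch. It is not the BDQPM implementation: the function names, the single completeness metric, and the `drop_missing` pre-processing function are all assumptions chosen for illustration, and the rows are plain Python dictionaries rather than a Big Data store.

```python
import random
from collections import Counter


def sample_rows(rows, k, seed=0):
    """Draw a reproducible random sample of at most k rows (hypothetical helper)."""
    rng = random.Random(seed)
    return list(rows) if len(rows) <= k else rng.sample(rows, k)


def profile(rows):
    """Build a tiny per-column profile: completeness, distinct count, value types."""
    columns = sorted({key for row in rows for key in row})
    result = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and the empty string both count as missing.
        present = [v for v in values if v not in (None, "")]
        result[col] = {
            "completeness": len(present) / len(rows) if rows else 0.0,
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return result


def mean_completeness(prof):
    """Average the completeness score over all profiled columns."""
    if not prof:
        return 0.0
    return sum(col["completeness"] for col in prof.values()) / len(prof)


def drop_missing(rows):
    """Illustrative pre-processing function: keep only rows with no missing value."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def explore(rows, actions):
    """Apply each candidate function and record its effect on the quality score."""
    before = mean_completeness(profile(rows))
    report = {}
    for name, fn in actions.items():
        cleaned = fn(rows)
        report[name] = {
            "before": before,
            "after": mean_completeness(profile(cleaned)),
            "rows_kept": len(cleaned),
        }
    return report


if __name__ == "__main__":
    data = [
        {"id": 1, "age": 30},
        {"id": 2, "age": None},
        {"id": 3, "age": ""},
        {"id": 4, "age": 41},
    ]
    subset = sample_rows(data, k=4)
    print(profile(subset))
    print(explore(subset, {"drop_missing": drop_missing}))
```

The report produced by `explore` pairs each candidate function with the quality score before and after it is applied, which is one simple way to expose the cause and effect of a pre-processing step on a sample before committing to it on the full data set.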


Keywords: Big Data quality, Data Quality Profile, Profile repository, Data quality profiling



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Concordia University, Montreal, Canada
  2. UAE University, Al Ain, Abu Dhabi, United Arab Emirates
