Data Quality Evaluation in Document Oriented Data Stores
Data quality management in document-oriented data stores has not yet been deeply explored and presents many challenges that arise from the lack of a rigid schema associated with the data. Data quality is a critical aspect of this kind of data store, since it cannot be controlled, and is not a priority, at the data storage stage. Additionally, data quality evaluation and improvement are very difficult tasks because the data are schema-less. This paper presents a first step towards data quality management in document-oriented data stores. To address the problem, the paper proposes a strategy for defining data granularities for data quality evaluation and analyses some data quality dimensions relevant to document stores.
Keywords: Document store, Data quality, Schema-less, Data quality dimensions, Data granularities