Dataspace Management for Large Data Sets
Abstract
Ideally, Big Data analysis enables us to learn relevant and interesting facts from large, interconnected data sets. Dataspace support platforms and dataspace management systems have been proposed to help analysts bring together data related to their interests. In this paper, we present an example of such a platform. In addition to storing data and descriptions of its characteristics, the platform supports verifying the compatibility (and eventually the summarizability) of the underlying data. This helps analysts discover mistakes and prevents meaningless aggregations. To illustrate the platform’s use, we present a case involving large data sets (tens of millions of observations), describe how the data sets can be used, and study the platform’s performance.
Keywords
Data model, Dataspace, OLAP, Business intelligence
Acknowledgments
The authors wish to thank COMTRADE for access to their export data and Dr. Leslie Klieb for comments.