Dataspace Management for Large Data Sets

  • Marko Niinimaki
  • Peter Thanisch
Chapter
Part of the EAI/Springer Innovations in Communication and Computing book series (EAISICC)

Abstract

Ideally, Big Data analysis enables us to learn relevant and interesting facts from large, interconnected data sets. Dataspace support platforms and dataspace management systems have been proposed to help analysts bring together data related to their interests. In this paper, we provide an example of such a platform. In addition to storing data and a description of its characteristics, the platform supports verifying the compatibility (and eventually the summarizability) of the underlying data. This helps analysts discover mistakes and avoid meaningless aggregations. As an example of using the platform, we present a case involving large data sets (tens of millions of observations), describe how the data sets can be used, and study the platform’s performance.

Keywords

Data model, Dataspace, OLAP, Business intelligence

Notes

Acknowledgments

The authors wish to thank COMTRADE for access to its export data and Dr. Leslie Klieb for comments.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Business and Technology, Webster University Thailand, Bangkok, Thailand
  2. School of Information Sciences, University of Tampere, Tampere, Finland
