Variety Management for Big Data

  • Wolfgang Mayer
  • Georg Grossmann
  • Matt Selway
  • Jan Stanek
  • Markus Stumptner


Of the core challenges originally associated with Big Data, namely Volume, Velocity, and Variety, Variety is the one least addressed by standard analytics architectures. In this chapter, we analyze types and sources of variety and describe data and metadata management principles for organizing data lakes. We discuss how semantic metadata can help describe and manage variety in structure, provenance, visibility, and permitted use. Moreover, ontologies and metadata catalogs can aid discovery, navigation, exploration, and interpretation of heterogeneous data lakes, and can lift data quality and simplify the integration of multiple data sets. We present an application of these principles in a data architecture for the Law Enforcement domain in Australia.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Wolfgang Mayer (1)
  • Georg Grossmann (1)
  • Matt Selway (1)
  • Jan Stanek (1)
  • Markus Stumptner (1)

  1. University of South Australia, Mawson Lakes, Australia
