From Metadata Catalogs to Distributed Data Processing for Smart City Platforms and Services: A Study on the Interplay of CKAN and Hadoop

  • Robert Scholz
  • Nikolay Tcholtchev
  • Philipp Lämmel
  • Ina Schieferdecker
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 864)


Smart Cities are emerging based on the idea of provisioning and processing large amounts of urban data for various use cases. Urban Data Platforms are typically employed to accumulate and expose large amounts of governmental (i.e. public sector), sensor, static and real-time data, enabling the community to create valuable applications and services for future Smart Cities. Hitherto, the Open Data initiative has been seen as the key driver for providing large amounts of data within a city. Open Data platforms employ so-called data registries to keep track of the datasets available at various sources spread throughout the city, with CKAN currently being among the most popular data catalog software worldwide. With the emergence of frameworks for large-scale distributed computing and storage, such as Hadoop and its associated distributed file system (HDFS), there is an inherent need to bridge the worlds of metadata catalogs and distributed data processing in order to provide sophisticated urban ICT services. This paper constitutes a first attempt in this new field, prototyping and evaluating components that enable the collaboration and interplay between CKAN and Hadoop/HDFS. This interplay is realized through extensions to CKAN and its harvesting process, and its benefits are demonstrated by accompanying case studies.


Smart Cities, Open Data, Distributed processing, Hadoop, CKAN



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Robert Scholz (1)
  • Nikolay Tcholtchev (1)
  • Philipp Lämmel (1)
  • Ina Schieferdecker (1)
  1. Fraunhofer Institute for Open Communication Systems (FOKUS), Berlin, Germany
