Journal of Intelligent Information Systems, Volume 51, Issue 2, pp 367–388

REMI: A framework of reusable elements for mining heterogeneous data with missing information

A Tale of Congestion in Two Smart Cities
  • Avigdor Gal
  • Dimitrios Gunopulos
  • Nikolaos Panagiotou
  • Nicolo Rivetti
  • Arik Senderovich
  • Nikolas Zygouras


Applications targeting smart cities tackle common challenges; however, solutions are seldom portable from one city to another due to the heterogeneity of smart city ecosystems. A major obstacle is the difference in the levels of available information. In this work, we present REMI, a mining framework that handles varying degrees of information availability by providing a meta-solution to missing data. The framework's core concept is the REMI layered stack architecture, which offers two complementary approaches to dealing with missing information, namely data enrichment (DARE) and graceful degradation (GRADE). DARE aims to infer the missing information levels, while GRADE attempts to mine the patterns using only the existing data. We show that REMI provides multiple ways for reusability, while being fault tolerant and enabling incremental development. One may apply the architecture to different problem instantiations within the same domain, or deploy it across various domains. Furthermore, we introduce the other three components of the REMI framework that back the layered stack. To support decision making in this framework, we show a mapping of REMI into an optimization problem (OTP) that balances the trade-off between three costs: inaccuracies in the inference of missing data (DARE), errors when using less information (GRADE), and the gathering of additional data. Finally, we provide an experimental evaluation of REMI using real-world transportation data from two European smart cities, namely Dublin and Warsaw.
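The three-way cost trade-off can be illustrated with a toy sketch. This is not the paper's OTP formulation, which the abstract does not spell out. It assumes, purely for illustration, that each missing information level is handled by exactly one of three options (infer it via DARE, mine without it via GRADE, or gather the data), that each option has a single scalar cost, and that costs are additive and independent across levels. The level names, cost values, and the `LevelCosts` and `choose_strategies` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelCosts:
    """Hypothetical per-level costs; names and values are illustrative only."""
    dare: float    # cost of inaccuracy when the level is inferred (DARE)
    grade: float   # cost of error when mining without the level (GRADE)
    gather: float  # cost of acquiring the missing data


def choose_strategies(costs):
    """Pick the cheapest option per missing level; return (plan, total_cost).

    Assumes additive, independent costs, so taking the minimum per level
    also minimizes the total. A real formulation may couple the levels.
    """
    plan = {}
    total = 0.0
    for level, c in costs.items():
        options = {"DARE": c.dare, "GRADE": c.grade, "gather": c.gather}
        best = min(options, key=options.get)
        plan[level] = best
        total += options[best]
    return plan, total


# Made-up information levels and costs, not taken from the paper.
example = {
    "road_network": LevelCosts(dare=2.0, grade=5.0, gather=9.0),
    "bus_schedule": LevelCosts(dare=6.0, grade=1.5, gather=4.0),
    "traffic_sensors": LevelCosts(dare=7.0, grade=8.0, gather=3.0),
}
plan, total = choose_strategies(example)
print(plan, total)
```

Under the independence assumption a greedy per-level choice suffices; if the costs interact (for example, inferring one level depends on another being present), the problem would need a joint search or a proper solver instead.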


Keywords: Reusable elements, Missing information, Mining, Complex patterns, Enrichment, Graceful degradation



This project received funding from the European Union Horizon 2020 Programme (Horizon2020/2014-2020), under grant agreement 688380.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Technion – Israel Institute of Technology, Technion City, Haifa, Israel
  2. University of Athens, Athens, Greece
  3. University of Toronto, Toronto, Canada
