A Best of Both Worlds Approach to Complex, Efficient, Time Series Data Delivery

  • Benjamin Leighton
  • Simon J. D. Cox
  • Nicholas J. Car
  • Matthew P. Stenson
  • Jamie Vleeshouwer
  • Jonathan Hodge
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 448)

Abstract

Point time series are a key data type for describing real or modelled environmental phenomena. Delivering this data in useful ways can be challenging when the data volume is large, when computational work (such as aggregation, subsetting, or re-sampling) needs to be performed, or when complex metadata is needed to place data in context. Some aspects of these problems are especially relevant to the environmental domain: large sensor networks that sample continuous environmental phenomena frequently over long periods generate very large datasets, and rich metadata is often required to understand the context of observations. Nevertheless, time series data, and most of these challenges, are prevalent beyond the environmental domain, for example in the financial and industrial domains.

A review of recent technologies illustrates an emerging trend toward high-performance, lightweight databases specialized for time series data. These databases tend to have minimal or non-existent formal metadata capabilities. In contrast, the environmental domain boasts standards such as the Sensor Observation Service (SOS) that have mature and comprehensive metadata models, but existing implementations have suffered from slow performance.

In this paper we describe our hybrid approach to efficiently delivering large time series datasets with complex metadata. We use three subsystems within a single system-of-systems: a proxy (Python), an efficient time series database (InfluxDB) and an SOS implementation (52 North SOS). Together these present a regular SOS interface. The proxy processes standard SOS queries and forwards each one to either 52 North SOS or InfluxDB for processing. Responses are returned directly from 52 North SOS, or indirectly from InfluxDB via the Python proxy, which encodes them as WaterML. This marries the scalability and performance advantages of the time series database with the sophisticated metadata handling of SOS. Testing indicates that a recent version of 52 North SOS configured with a Postgres/PostGIS database performs well, but a hybrid implementation incorporating InfluxDB and 52 North SOS performs approximately 12 times faster.
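The proxy's two jobs described above (deciding where an SOS query goes, and turning InfluxDB results into WaterML) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the routing rule (only `GetObservation` goes to InfluxDB, while capabilities, sensor descriptions and features of interest stay with 52 North SOS), the function names `route_request` and `to_waterml`, and the series identifier are all hypothetical. Querying InfluxDB itself is omitted, and the output is a bare WaterML 2.0 `MeasurementTimeseries` fragment rather than a complete observation document.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# WaterML 2.0 and GML 3.2 namespaces used by the MeasurementTimeseries encoding.
WML2 = "http://www.opengis.net/waterml/2.0"
GML = "http://www.opengis.net/gml/3.2"
ET.register_namespace("wml2", WML2)
ET.register_namespace("gml", GML)

# Hypothetical routing rule: only observation data requests go to the time series database.
DATA_REQUESTS = {"GetObservation"}


def route_request(params: dict) -> str:
    """Return 'influxdb' or 'sos' for an SOS KVP request (illustrative rule only)."""
    # OGC KVP parameter names are case-insensitive, so normalise the keys first.
    normalised = {key.lower(): value for key, value in params.items()}
    if normalised.get("service") != "SOS":
        raise ValueError("not an SOS request")
    if normalised.get("request") in DATA_REQUESTS:
        return "influxdb"
    # Everything else (GetCapabilities, DescribeSensor, ...) is passed to 52 North SOS.
    return "sos"


def to_waterml(series_id: str, points) -> str:
    """Encode (timestamp, value) pairs as a WaterML 2.0 MeasurementTimeseries fragment."""
    series = ET.Element(f"{{{WML2}}}MeasurementTimeseries", {f"{{{GML}}}id": series_id})
    for when, value in points:
        # Each point becomes wml2:point/wml2:MeasurementTVP with a time and a value.
        point = ET.SubElement(series, f"{{{WML2}}}point")
        tvp = ET.SubElement(point, f"{{{WML2}}}MeasurementTVP")
        ET.SubElement(tvp, f"{{{WML2}}}time").text = when.isoformat()
        ET.SubElement(tvp, f"{{{WML2}}}value").text = str(float(value))
    return ET.tostring(series, encoding="unicode")


if __name__ == "__main__":
    request = {"service": "SOS", "version": "2.0.0", "request": "GetObservation"}
    print(route_request(request))
    # Stand-in for rows returned by a time series database query.
    rows = [(datetime(2014, 1, 1, tzinfo=timezone.utc), 1.5)]
    print(to_waterml("ts-1", rows))
```

Keeping the routing decision separate from the encoding step mirrors the division of labour in the abstract: metadata-heavy requests never touch the time series database, and only the high-volume data path pays the cost of re-encoding results as WaterML.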

Keywords

time series, timeseries, SOS, OGC, sensor, database

Copyright information

© IFIP International Federation for Information Processing 2015

Authors and Affiliations

  • Benjamin Leighton (1, 2)
  • Simon J. D. Cox (1)
  • Nicholas J. Car (1, 2)
  • Matthew P. Stenson (1, 2)
  • Jamie Vleeshouwer (1, 2)
  • Jonathan Hodge (3)
  1. Land & Water Flagship, CSIRO, Melbourne, Australia
  2. Land & Water Flagship, CSIRO, Brisbane, Australia
  3. Oceans and Atmosphere Flagship, CSIRO, Brisbane, Australia