Supporting Sustainable Publishing and Consuming of Live Linked Time Series Streams

Rojas Melendez, Julian Andres; Sedrakyan, Gayane; Colpaert, Pieter; Vander Sande, Miel; Verborgh, Ruben

doi:10.1007/978-3-319-98192-5_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11155)

Included in the following conference series:

European Semantic Web Conference

Abstract

The road to publishing public streaming data on the Web is paved with trade-offs that determine its viability. The cost of unrestricted query answering on top of data streams may not be affordable for all data publishers. Therefore, public streams need to be funded in a sustainable fashion to remain online. In this paper we present an overview of possible query answering features for live time series in the form of multidimensional interfaces. For example, from a live parking availability data stream, pre-calculated time-constrained statistical indicators or geographically classified data can be provided to clients on demand. Furthermore, we demonstrate the initial developments of a Linked Time Series server that supports such features through an extensible modular architecture. Benchmarking the costs associated with each of these features makes it possible to weigh the trade-offs inherent to publishing live time series and establishes the foundations for a decentralized and sustainable ecosystem for live data streams on the Web.

1 Introduction

The development of Internet of Things technologies has fostered the creation of live data streams in multiple domains. In the public domain specifically, such data streams include sensor observations of air quality, noise level, street occupancy, vacant parking spaces, temperature, river water level, wind speed, the state of public lighting systems, and traffic lights, among others. Furthermore, in Europe, the European Public Sector Information directive^{Footnote 1} requires public authorities to publish such data in an open fashion on the Web. This raises new challenges for data publishers, as they cannot anticipate the number of users or the types of queries to expect on the Web, and might not be able to afford the expensive infrastructure required to maintain availability and scalability.

Studying the trade-offs introduced by Linked Data Fragments [5] for publishing data streams on the Web helps to understand possible ways to reduce server costs by transferring query answering tasks to clients. This requires clients to implement the logic to answer a given query, which increases their complexity and the time required to process the data on the client side. Anticipating this, data publishers may provide multidimensional interfaces [3] containing pre-processed data that is relevant for answering common queries, and offer them as a service that benefits clients by reducing query response times and implementation complexity without limiting query flexibility. The types of interfaces to offer depend directly on the type of data and the related use cases. Such an approach may help to create revenue sources for data publishers while contributing to the sustainability of Open Data streams on the Web.

In this demo paper we present an overview of possible multidimensional interfaces, containing query answering features, that may be implemented and offered as a service for different Open Stream data use cases. Furthermore, we introduce the initial developments of a Live Time Series server that supports the creation of such interfaces through an extensible modular architecture.

2 Related Work

RDF Stream Processing (RSP) [1] defines a framework for continuous query answering over data streams. RSP engines can take into account one or more RDF streams to answer queries whose results are computed at several time instants in order to consider newly available data on the streams. Triple Pattern Fragments Query Streamer (TPF-QS) [4] was introduced as an alternative to server-side RSP engines, with the goal of making server-side publishing of RDF streams possible at a low cost by relying on a client-side RSP engine. In this approach, several time-annotation techniques were investigated, of which annotation using named graphs caused the least overhead. The results, however, showed that this approach has scalability issues when querying historic data.

The work presented in [2] raises the fundamental question of the sustainability of the Web of data and introduces a marketplace for federated query answering, giving clients the option to decide from which sources they want to retrieve the data needed to answer a certain query and who will process the data to obtain that answer. The cost of the answer(s) to a given query can be derived from the cost of hosting the related data. However, determining the cost of computing such answer(s) is still an open issue. In this direction, benchmarking mechanisms could be used to determine it in terms of computational costs. For instance, the solution proposed in the HOBBIT project, which provides a generic platform for benchmarking question answering processes centered around the challenges of data heterogeneity and scalability, could be used to determine the costs associated with answering a certain query.

3 Multidimensional Interfaces

Multidimensional Interfaces [3] were introduced for generically fragmenting data with a specific order and publishing these fragments in an interface-level index. These interfaces can make multidimensional ordinal data automatically discoverable and consumable by clients using hypermedia controls. The goal of these interfaces is to raise the server expressivity while maintaining low server costs. A vocabulary^{Footnote 2} to formally describe multidimensional interfaces was introduced. It defines the concepts of Range Fragments and Range Gates. A Range Fragment is a Linked Data Fragment that specifies an ordinal interval for a predefined fragmentation strategy. A Range Gate is a Linked Data interface which exposes a set of Range Fragments. Using these concepts it is possible to define different fragmentation strategies that can be exposed as multidimensional interfaces.
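The relation between a Range Gate and its Range Fragments can be illustrated with a minimal in-memory sketch. It is not the vocabulary itself nor the server's implementation: the class and field names, the example.org URLs and the choice of half-open intervals are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RangeFragment:
    """A fragment covering the half-open ordinal interval [initial, final).

    The half-open convention is an assumption of this sketch.
    """
    url: str            # hypothetical fragment URL
    initial: datetime
    final: datetime

    def covers(self, value: datetime) -> bool:
        return self.initial <= value < self.final


@dataclass
class RangeGate:
    """An interface-level index that exposes a set of Range Fragments."""
    fragments: List[RangeFragment] = field(default_factory=list)

    def find(self, value: datetime) -> Optional[RangeFragment]:
        # A client would follow the link of the fragment whose interval
        # contains the value it is looking for.
        for fragment in self.fragments:
            if fragment.covers(value):
                return fragment
        return None


gate = RangeGate([
    RangeFragment("http://example.org/parking/2018-06-01",
                  datetime(2018, 6, 1), datetime(2018, 6, 2)),
    RangeFragment("http://example.org/parking/2018-06-02",
                  datetime(2018, 6, 2), datetime(2018, 6, 3)),
])
match = gate.find(datetime(2018, 6, 2, 9, 30))
print(match.url if match else "no fragment")
```

In this sketch the gate plays the role of the index a client consults first, and each fragment only has to declare the interval it covers, so different fragmentation strategies can reuse the same lookup.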

In a general sense, for live time series originating from sensor observations, it is possible to define ranges as follows:

Time Ranges: Time-constrained intervals can be used to create Range Fragments or summaries that compute statistical variables, for example to expose average values of measurements at hour, day, week, month and year level.
Geospatial Ranges: Sensor locations can be used to create Range Fragments that comprise predefined geographical areas. For example, street occupation can be given at a neighborhood, city or country level.

Depending on the type of data and the specific use case, other types of fragmentation, and even combinations of them, can be defined.
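As an illustration of both kinds of ranges, the following sketch groups observations by an arbitrary range key and averages each group. The `Observation` fields, the neighbourhood labels and the sample values are hypothetical; a real deployment would derive them from the RDF stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable


@dataclass(frozen=True)
class Observation:
    timestamp: datetime
    area: str      # hypothetical neighbourhood label of the sensor
    value: float   # e.g. number of vacant parking spaces


def average_by_range(
    observations: Iterable[Observation],
    range_key: Callable[[Observation], Hashable],
) -> Dict[Hashable, float]:
    """Group observations by the range they fall in and average each group."""
    buckets = defaultdict(list)
    for obs in observations:
        buckets[range_key(obs)].append(obs.value)
    return {key: mean(values) for key, values in buckets.items()}


def hour_of(obs: Observation) -> datetime:
    # Time range: truncate the timestamp to the start of its hour.
    return obs.timestamp.replace(minute=0, second=0, microsecond=0)


def area_of(obs: Observation) -> str:
    # Geospatial range: use the predefined area the sensor belongs to.
    return obs.area


sample = [
    Observation(datetime(2018, 6, 1, 9, 5), "centre", 120.0),
    Observation(datetime(2018, 6, 1, 9, 35), "station", 100.0),
    Observation(datetime(2018, 6, 1, 10, 10), "centre", 80.0),
]
per_hour = average_by_range(sample, hour_of)
per_area = average_by_range(sample, area_of)
```

Passing the range as a key function keeps the aggregation independent of the fragmentation strategy, so a combined range (for instance hour and area together) only needs a key that returns a tuple.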

4 Live Time Series Server

The Live Time Series Server is an ongoing implementation that aims to provide a cost-efficient interface for publishing Open Stream data. Through an extensible modular architecture, we allow data publishers to define multidimensional interfaces that provide query answering functionalities on top of their data. The code is available in a GitHub repository^{Footnote 3}, along with instructions on how to test it.

As shown in Fig. 1, the server is composed of three main modules:

Data Event Manager: This module receives RDF stream updates and fires an event to notify the availability of new data.
Communications Manager: This module handles the communication between the Multidimensional Interfaces and the clients. It can expose the data as Range Fragments, created by each interface, through HTTP endpoints or through WebSocket channels for publish/subscribe communication.
Multidimensional Interfaces: The interfaces expose the data stream according to their predefined logic. Each interface subscribes to a data event with the Data Event Manager and performs a new calculation with each update, with the exception of the Raw Interface, which exposes the data as it is received. The data can be exposed as Range Fragments through HTTP or pushed to subscribed clients through WebSocket channels.
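The interplay between the Data Event Manager and the interfaces can be sketched as a small publish/subscribe arrangement. This is a simplified illustration under several assumptions, not the code of the actual server: updates are plain numbers rather than RDF, the class and method names are invented, `RunningAverageInterface` is a hypothetical example of an interface, and the Communications Manager (HTTP and WebSocket delivery) is left out.

```python
from typing import Callable, List, Optional


class DataEventManager:
    """Receives stream updates and notifies the subscribed interfaces."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[float], None]] = []

    def subscribe(self, listener: Callable[[float], None]) -> None:
        self._listeners.append(listener)

    def push(self, update: float) -> None:
        # Fire a "new data" event towards every subscribed interface.
        for listener in self._listeners:
            listener(update)


class RawInterface:
    """Exposes each update exactly as it was received."""

    def __init__(self, events: DataEventManager) -> None:
        self.latest: Optional[float] = None
        events.subscribe(self.on_data)

    def on_data(self, update: float) -> None:
        self.latest = update


class RunningAverageInterface:
    """Hypothetical interface that recalculates an average on every update."""

    def __init__(self, events: DataEventManager) -> None:
        self.count = 0
        self.total = 0.0
        events.subscribe(self.on_data)

    def on_data(self, update: float) -> None:
        self.count += 1
        self.total += update

    @property
    def average(self) -> Optional[float]:
        return self.total / self.count if self.count else None


events = DataEventManager()
raw = RawInterface(events)
avg = RunningAverageInterface(events)
for value in (10.0, 20.0, 30.0):
    events.push(value)
```

Because each interface only registers a callback, a publisher could enable or disable an interface by adding or omitting a single subscription, which matches the idea of turning features on and off according to budget.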

5 Conclusion and Future Work

We introduced the Live Time Series server, which provides a Linked Data Fragments based interface for publishing live time series on the Web. By integrating the concept of multidimensional interfaces, data publishers can define modules that perform predefined calculations over the data to suit a given use case. This increases the expressivity of the server while keeping its costs low. It also reduces the implementation complexity of clients and the data processing time needed to obtain a query answer. Allowing features to be turned on and off on the time series server helps data owners to decide which features they want to support and to weigh that choice against their budget. Data reusers may still implement some of these features as a third party, yet in that case a revenue model should be thought of [2].

In future work we plan to extend this approach by defining a mechanism to calculate the computational cost of multidimensional interfaces through benchmarking processes, in order to help determine their economic cost. Integrating mapping capabilities in order to work with non-RDF data streams constitutes another line of future work.

Notes

References

1. Dell’Aglio, D., Della Valle, E., van Harmelen, F.: Stream reasoning: a survey and outlook. Data Sci. 1(1–2), 59–83 (2017). https://doi.org/10.3233/DS-170006
2. Grubenmann, T., Dell’Aglio, D., Bernstein, A., Moor, D., Seuken, S.: Decentralizing the semantic web: who will pay to realize it? In: ISWC2017 Workshop on Decentralizing the Semantic Web, October 2017
3. Taelman, R., Colpaert, P., Verborgh, R., Mannens, E.: Multidimensional interfaces for selecting data within ordinal ranges. In: Proceedings of the 7th International Workshop on Consuming Linked Data, October 2016
4. Taelman, R., Verborgh, R., Colpaert, P., Mannens, E.: Continuous client-side query evaluation over dynamic linked data. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9989, pp. 273–289. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47602-5_44
5. Verborgh, R.: Triple pattern fragments: a low-cost knowledge graph interface for the web. J. Web Semant. 37–38, 184–206 (2016). https://doi.org/10.1016/j.websem.2016.03.003

Acknowledgements

This work has been supported by the HOBBIT H2020 project (GA no. 688227) and by the Smart Flanders Programme (https://smart.flanders.be).

Author information

Authors and Affiliations

IDLab, Department of Electronics and Information Systems, Ghent University - IMEC, Ghent, Belgium
Julian Andres Rojas Melendez, Gayane Sedrakyan, Pieter Colpaert, Miel Vander Sande & Ruben Verborgh

Corresponding author

Correspondence to Julian Andres Rojas Melendez.

Editor information

Editors and Affiliations

University of Bologna, Bologna, Italy
Aldo Gangemi
IBM Research - Almaden, San Jose, CA, USA
Anna Lisa Gentile
CNR-ISTC, Rome, Italy
Andrea Giovanni Nuzzolese
Technische Universität Dresden, Dresden, Germany
Sebastian Rudolph
Karlsruhe Institute of Technology, Karlsruhe, Germany
Maria Maleshkova
University of Mannheim, Mannheim, Germany
Heiko Paulheim
University of Aberdeen, Aberdeen, UK
Jeff Z Pan
CNR-ISTC, Rome, Italy
Mehwish Alam

About this paper

Cite this paper

Rojas Melendez, J.A., Sedrakyan, G., Colpaert, P., Vander Sande, M., Verborgh, R. (2018). Supporting Sustainable Publishing and Consuming of Live Linked Time Series Streams. In: Gangemi, A., et al. The Semantic Web: ESWC 2018 Satellite Events. ESWC 2018. Lecture Notes in Computer Science(), vol 11155. Springer, Cham. https://doi.org/10.1007/978-3-319-98192-5_28

DOI: https://doi.org/10.1007/978-3-319-98192-5_28
Published: 02 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98191-8
Online ISBN: 978-3-319-98192-5
eBook Packages: Computer Science, Computer Science (R0)
