Improvement of the LPWAN AMI backhaul’s latency thanks to reinforcement learning algorithms
Abstract
Low power wide area networks (LPWANs) have been recently deployed for long-range machine-to-machine (M2M) communications. These networks have been proposed for many applications and in particular for the communications of the advanced metering infrastructure (AMI) backhaul of the smart grid. However, they rely on simple access schemes that may suffer from significant latency, which is one of the main performance indicators in smart grid communications. In this article, we apply reinforcement learning (RL) algorithms to reduce the latency of AMI communications in LPWANs. For that purpose, we first study the collision probability in an unslotted ALOHA-based LPWAN AMI backhaul which uses the LoRaWAN acknowledgement procedure. Then, we analyse the effect of collisions on the latency for different frequency access schemes. We finally show that RL algorithms can be used for the purpose of frequency selection in these networks and reduce the latency of the AMI backhaul in LPWANs. Numerical results show that non-coordinated algorithms featuring a very low complexity reduce the collision probability by 14% and the mean latency by 40%.
Keywords
Advanced metering infrastructure (AMI), Low power wide area networks (LPWANs), Internet of Things (IoT), LoRaWAN, ALOHA, Smart grid communications
Abbreviations
3GPP: 3rd Generation Partnership Project
AMI: Advanced metering infrastructure
CR: Cognitive radio
DA: Distribution automation
DER: Distributed energy resources
DSA: Dynamic spectrum access
GPRS: General packet radio service
IoT: Internet of Things
LPWAN: Low power wide area network
M2M: Machine-to-machine
MAB: Multi-armed bandit
NB-IoT: NarrowBand Internet of Things
PLC: Power line communications
RL: Reinforcement learning
SF: Spreading factor
TS: Thompson sampling
UCB: Upper confidence bound
UNB: Ultra narrow band
1 Introduction
The increasing development of renewable energy production and the high cost associated with power failures have been driving electricity operators towards the development of new functions enabling the real-time management of the electrical grid. Thanks to these improvements, traditional electrical grids have morphed progressively into the so-called smart grids.
The transformation of the electrical grid into the smart grid mainly impacts the distribution grid. Three functions are necessary to manage the smart distribution grid: the advanced metering infrastructure (AMI), the distribution automation (DA) and the management of distributed energy resources (DER) [1]. Furthermore, the management of a smart grid relies on a network of smart sensors and actuators deployed all along the grid. One of the main roles of these devices is to provide an overall view of the state of the grid, in a way that must be as continuous as possible. This cannot be done without an efficient communication system. Each of the functions developed for the management of the grid has its own constraints in terms of throughput, latency, security and reliability [2]. As a consequence, the design of an efficient smart grid communication infrastructure is one of the key challenges in the smart grid deployment.
In AMI, smart meters measure and report the electricity consumption to a control center. The information received by the control center is then used to manage both the electricity production and consumption. In particular, the control center is in charge of computing the new electricity price which is applied to consumers.
Many communication standards and protocols are envisioned for the smart grid and in particular for AMI communications [3], and the use of both wired and wireless technologies has been investigated. AMI communications are done through two networks: the neighbourhood network, which links smart meters and local aggregators, and the AMI backhaul linking aggregators and the control center [4]. As an example, in France, power line communications (PLC) are used for the neighbourhood network and the General Packet Radio Service (GPRS) network is used for the AMI backhaul [5].
Besides, LPWANs rely on wireless telecommunication standards recently designed to handle a large number of long-range uplink communications and have been identified as potential networks for AMI communications [6, 7]. In these networks, a large number of low-power end-devices send short packets to a base station or gateway. Moreover, in LPWANs, the band is divided into narrowband channels, which are continuously monitored by the base station in order to collect all the uplink packets sent by end-devices.
A wide range of LPWAN standards have recently been proposed [8]. These standards can be sorted into two categories. On the one hand, there are slotted protocols such as the NarrowBand IoT (NB-IoT) standard [9], designed by the 3rd Generation Partnership Project (3GPP), and the Weightless standard [10]. On the other hand, there are unslotted (or pure) ALOHA-based protocols [11] such as the LoRaWAN standard [12] and the protocol used by Sigfox^{1}, which is based on ultra narrow band (UNB) [13] communications. In these unslotted protocols, the signalling is reduced so as to limit the end-device energy consumption, and transmissions are asynchronous and event-driven. Moreover, in some of these LPWANs, an acknowledgement is used to avoid unnecessary retransmissions. Furthermore, in order to limit the impact of the acknowledgement on the end-device’s energy consumption, the receive window, during which the end-device waits for an acknowledgement, is shortened. To do so, the acknowledgement is always sent at the same time. In other words, the receive delay between the end of the uplink packet and the transmission of the acknowledgement in the downlink is constant. Thanks to this simple mechanism, the end-device is able to sense the channel during a very short time and detect the preamble of the downlink packet. This solution is used in the LoRaWAN standard [12, 14]. More precisely, in this standard, and that is the case in several regions (e.g. Europe, China) [15], an acknowledgement is sent into the channel being used for the uplink transmission 1 s after the end of the reception of the uplink packet by the base station [12].
In [16], the authors analyse the capacity limits of LoRaWAN in an AMI scenario. In the present article, we also consider a LoRaWAN AMI backhaul but we focus our analysis on latency. LPWANs operate in unlicensed bands which are shared by many end-devices which use different standards and have different behaviours or capabilities, which depend on their manufacturers and on the requirements of their applications (temperature sensing, smart grid monitoring, etc.). These heterogeneous devices can create a heavy traffic which is unevenly distributed in channels. This causes packet collisions and consequently increases the latency, which is one of the key performance indicators of AMI communications [2]. In this article, we show that reinforcement learning algorithms, and more precisely multi-armed bandit (MAB) learning, reduce the latency of the AMI backhaul in LPWANs.
In the first part, we propose to analyse the collision probability in an LPWAN which uses the acknowledgement mechanism of the LoRaWAN standard and the effect of collisions on the latency for several access schemes. The analysis of collisions in ALOHA networks is an old topic [11, 17]. However, recent LPWAN standards implement new solutions which have not been previously considered. As an example, the protocol used by Sigfox is the first time/frequency unslotted standard and its performance has been recently evaluated in [13, 18]. Moreover, one of the specificities of the LoRaWAN standard is its acknowledgement mechanism. Indeed, in this standard, an acknowledgement can be sent into the channel used for the purpose of uplink communications after a fixed receive delay. The probability of collisions and other performance indicators of the LoRaWAN standard can be evaluated using either numerical simulations or analytical derivations. Numerical simulations have been used in [19, 20] to evaluate the capacity and coverage of the LoRaWAN standard. Moreover, analytical derivations of the throughput in a LoRaWAN network have been conducted in [21, 22]. However, in these two papers, the acknowledgement is not considered. In the present article, we derive a closed-form expression for the probability of collision in a LoRaWAN-like network, in which uplink packets are acknowledged. To the best of the authors’ knowledge, the collision probability in a pure ALOHA-based protocol, in which the acknowledgement is sent after a fixed receive delay in the same channel, as in the LoRaWAN standard, has never been analysed in the literature.
In the second part, we will show that the channel selection in LPWANs can be modeled as a MAB problem [23] and that this problem can be solved using simple learning algorithms such as the upper confidence bound (UCB) algorithm [24] or Thompson sampling (TS) [25]. These algorithms have already been proposed for dynamic spectrum access (DSA) [26, 27] in a cognitive radio (CR) [28] context. In [29, 30], the authors propose to use MAB learning algorithms in a time-slotted IoT network and in [31] these algorithms have been proposed for Wi-Fi networks. In the present article, we introduce MAB learning algorithms for LPWANs and in particular for unslotted LPWANs in unlicensed bands.

The main contributions of this article are the following:

We first derive closed-form expressions for the probability of a successful transmission into one channel, in an LPWAN featuring a simple acknowledgement, which is similar to the one used by the LoRaWAN standard in Europe.

Then, we use these probabilities to derive the expression of the latency of AMI communications for different frequency access schemes.

Finally, we show that the channel selection in an LPWAN can be modeled as a MAB problem and that learning algorithms such as the UCB and TS algorithms can be used by aggregators to provide an efficient channel access scheme, which reduces the collision probability and the latency in ALOHA-based LPWANs.
The rest of this paper is organized as follows: the system model is introduced in Section 2. The probability of a successful transmission in a channel is calculated in Section 3. The average latency of the AMI backhaul for different access schemes is analysed in Section 4. The multi-armed bandit theory and various learning algorithms are introduced in Section 5. In Section 6, numerical simulations are used to assess the performance of the UCB algorithm in the proposed LPWAN, and Section 7 concludes this paper.
2 System model
This hypothesis is valid in a LoRaWAN network. Indeed, in this standard, end-devices can use several spreading factors (SFs) depending on their path loss [19]. Furthermore, two packets that use different SFs are orthogonal [22] and cannot collide. Consequently, if all the devices use the LoRaWAN standard, a packet sent by an end-device only interferes with packets that use the same spreading factor. These packets consequently have the same length and a comparable received power. Please note that in the LoRaWAN standard, only six SFs are available (the value of the SF is an integer between 7 and 12). As a consequence, a large number of end-devices use the same SF. This can cause a large number of collisions in the network.
When a base station successfully receives a packet, it waits for T_{ d } and, if the channel is free, sends the acknowledgement to the end-device. Since the base station can analyse the presence of a packet in the channel, we can suppose that the base station has a perfect knowledge of the state of the channel (busy or free) and we can thus neglect the sensing time.
3 Probability of successful transmission
In this section, we derive two probabilities, which allow us to assess the performance of a LoRaWAN-like LPWAN. The first one is the probability of a successful uplink transmission. It is the probability that a packet sent by an end-device into a channel is received by the base station, i.e. the sent packet did not collide with another packet. This probability is denoted P(su) in this section. The second probability is the probability of a successful transmission, which is the probability that the end-device receives the acknowledgement. We denote this probability by P(sd). Please note that when a packet is sent by an end-device, it can collide either with an acknowledgement sent by the base station or with an uplink packet sent by another end-device.
Description of the events considered for the computation of the probability of successful transmission
Notation  Event
su  Successful uplink: there is no collision between the uplink packet and another packet. The packet has been received by the base station.
sd  Successful downlink: the end-device receives the acknowledgement.
sa  Successful acknowledgement: the downlink transmission is successful, the acknowledgement is sent and does not collide with another packet.
ca  Collision after: the uplink packet has a collision with another uplink packet sent after it.
cb  Collision before: the uplink packet has a collision with an uplink packet sent before it or with an acknowledgement.
cub  Collision uplink before: the uplink packet has a collision with an uplink packet sent before it.
cd  Collision downlink: the uplink packet has a collision with an acknowledgement.
pss  Packet successfully sent: a packet is successfully sent in the interval [−T_{ d }−T_{ m }−T_{ a };−T_{ d }−T_{ m }].
pb  Packet between: there are packets between the considered packet and its acknowledgement. These packets do not collide with the considered packet or prevent the transmission of the acknowledgement.
Where P(sa) is the probability of having a successful transmission of the acknowledgement. Furthermore, P(su) and P(sd) depend on the values of T_{ d } and T_{ m }. Indeed, these two probabilities do not have the same expression when T_{ d }≤T_{ m } and when T_{ d }≥T_{ m }.
3.1 Case 1: T_{d}≤T_{m}
Where P(cb) and P(ca) are respectively the probabilities of having a collision with a packet sent before and after packet 1. As the uplink traffic follows a Poisson process, the events cb and ca are independent.
Where P(cd) is the probability of having a collision with an acknowledgement.
We can finally compute the probability of having no collision:
Proposition 1
And P(sd) can be computed with Eq. (1):
Proposition 2
3.2 Case 2: T_{d}≥T_{m}
 Packet 2 is the last uplink packet sent before packet 1 and does not collide with a packet sent before it (this is the situation studied where T_{ d }≤T_{ m }).

Packet 2 is successfully sent in I_{ a }, and other uplink packets are transmitted between this packet and its acknowledgement but do not prevent the transmission of the acknowledgement.
Where P(pb) is the probability of having at least one packet between a given packet (e.g. packet 2) and its acknowledgement. The first term of this expression has been previously computed. If we do not have any packet between packet 2 and its acknowledgement, this packet is the last uplink packet transmitted before packet 1. We are, consequently, in the case previously studied. Strictly speaking, the expression of \(P({\text {cd}},\overline {{\text {cub}}},\overline {{\text {pb}}})\) if T_{ d }≥T_{ m } is equal to the expression of \(P({\text {cd}},\overline {{\text {cub}}})\) if T_{ d }≤T_{ m }. As a consequence, the expression of \(P({\text {cd}},\overline {{\text {cub}}},\overline {{\text {pb}}})\) is given in Eq. (4).

A packet is successfully sent in I_{ a } (packet 2 is successfully sent).

The last packet transmitted before packet 1 is sent between the packet successfully sent in I_{ a } and its acknowledgement. In other words, packet 3 is sent between packet 2 and its acknowledgement.
We finally derive P(su) from Eq. (7).
Proposition 3
Where f(λ_{ T },T_{ m },T_{ d },T_{ a }) is defined in Eq. (18).
Eq. (20) allows us to derive the expression of P(sd).
Proposition 4
Where f(λ_{ T },T_{ m },T_{ d },T_{ a }) is defined in Eq. (18).
3.3 Analysis of the probability of success
Proposition 5
Proof
Which proves that \(P({\text {sd}}) \approx P_{T_{m}}({\text {sd}})\approx P_{\infty }({\text {sd}})\). This finally proves proposition 5. □
We have computed the expression of the probability of a successful transmission in a LoRaWAN-like LPWAN. In the following, we analyse the latency of AMI communications in this network for different access schemes as a function of the probability of successful transmission P(su).
4 Latency in an LPWAN
 1.
The aggregator randomly selects the channel for each transmission.
 2.
The aggregator uses the channel with the highest probability of successful transmission for all its transmissions. Please note that this policy requires the aggregator to have perfect knowledge of the probability of success in the channels. We present some learning algorithms which allow the aggregator to acquire this knowledge in Section 5.
4.1 Case 1: random channel selection
Where T_{ s } is the time during which the end-device senses the channel so as to detect the preamble of the acknowledgement. This time is short in the LoRaWAN standard. Please note that, after a failed transmission, the acknowledgement is not transmitted by the base station. In that case, in the LoRaWAN standard, the device does not wait for the acknowledgement during T_{ a } but during T_{ s }, a shorter time which is long enough to detect the presence or absence of an acknowledgement in the channel [14]. In the following, we will denote T_{ l }=T_{ m }+T_{ d }+T_{ s }≈T_{ m }+T_{ d }.
4.2 Case 2: best channel selection
4.3 Comparison of the two strategies
Where \(\mathbb {E}[\mathcal {L}]_{\text {rand}}\) is the expected latency with a random channel selection and \(\mathbb {E}[\mathcal {L}]_{{\text {BC}}}\) is the expected latency with a best channel selection. Eq. (36) shows that the gain in latency provided by the selection of the best channel only depends on the difference between the inverse of the average probability of a successful transmission in the random channel selection case and the inverse of this probability in the best channel case.
The selection of the best channel requires the knowledge of the probability of collision in the channels. In the following, we introduce two reinforcement learning algorithms to acquire this knowledge.
5 Reinforcement learning algorithms in LPWAN
5.1 MAB learning
The equations derived in the previous section show that the selection of the best channel can significantly reduce the latency of AMI communications when the traffic is unevenly distributed in the channels. This can occur either if some devices use another LPWAN or base station or if all the devices do not use the same set of channels. In this section, we will show that the channel selection can be viewed as a multi-armed bandit (MAB) problem [23], which can be solved thanks to simple reinforcement learning algorithms. This modelling has already been used in dynamic spectrum access (DSA) [26, 27]. In such a scenario, spectrum sensing is used as a feedback for channel selection. However, spectrum sensing has a poor performance in LPWANs [6]. That is why we use the acknowledgement as a reward for learning. With this acknowledgement, machine learning algorithms can be used by end-devices for the purpose of channel selection.
Please note that, with the proposed MAB learning algorithms, each end-device optimises its own energy consumption without exchanging information with other end-devices. This solution is, consequently, a non-coordinated solution. One of the main advantages of such a solution is its energy consumption. Indeed, the algorithms proposed here have a low complexity and consequently consume little energy. This energy is negligible compared to the energy that would be consumed to exchange information between end-devices.
If we now consider the problem as a MAB problem, each channel is viewed as a gambling machine (bandit). All bandits lead to the same reward (a successful transmission) but with different probabilities. Indeed, P^{ j }(su) and P^{ j }(sd) change from one channel to another. We denote t the number of transmissions realised by the aggregator, where T_{ j }(t) denotes the number of selections of channel j.
In LPWANs, the reward can be provided by the acknowledgement, and an end-device considers that the reward is 1 if the acknowledgement is received, and 0 otherwise. With this solution, the proposed algorithms do not require any extra signalling. In the studied problem, an aggregator that uses a reinforcement learning algorithm begins without any information about the probabilities of successful transmission in the N_{ c } channels. The device first explores all the channels and uses the reward to learn about the channels’ probability of successful transmission. On the basis of the acquired knowledge, the device increasingly uses the channels that provided the highest reward. It consequently improves its probability of having a successful transmission. After several transmissions, the end-device has enough knowledge to send almost all its packets into the channel featuring the highest probability of successful transmission and consequently the lowest latency.
Furthermore, two types of reinforcement learning algorithms have been proposed to solve MAB problems: frequentist algorithms where the channel is deterministically chosen on the basis of past experience, and Bayesian algorithms where the decision is drawn from a prior distribution [35]. In this paper, with no loss of generality, we analyse the performance of two algorithms, the upper confidence bound (UCB) algorithm [26] which is frequentist and the Thompson sampling (TS) algorithm [25] which is Bayesian. The main advantages of these two algorithms are their low computational complexity and their low memory requirements, which allow them to be implemented in any end-device and in particular in aggregators.
5.2 UCB_{1} algorithm
In Eq. (41), α is the exploration coefficient. The UCB_{1} is proven to be order optimal for α> 0.5 [24] and has good performance for lower values of α> 0 [36]. The larger this coefficient is, the longer the exploration is. During the initial transmissions, the empirical mean is low compared to the bias and the aggregator explores all the channels. Progressively, the value of the bias decreases and the empirical mean becomes predominant. With this algorithm, the aggregator learns at each transmission. Once it has learned enough, it starts mostly using a single channel, the one that guarantees the highest empirical mean for the reward. Consequently, in the UCB_{1} algorithm case, and after exploration, the latency of AMI communications will be equal to the one studied in Section 4.2.
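As an illustration, the index-based selection described above can be sketched in Python. Since Eq. (41) is not reproduced here, the sketch assumes the usual UCB_{1} form, i.e. the empirical mean of the reward plus a bias equal to the square root of α ln(t)/T_{ j }(t), which may differ in detail from the expression used in this article. The names `UCB1Selector` and `run_ucb1`, as well as the Bernoulli test channels, are hypothetical and only serve the example.

```python
import math
import random


class UCB1Selector:
    """Channel selector based on an assumed standard UCB1 index (illustrative)."""

    def __init__(self, n_channels, alpha=0.5):
        self.alpha = alpha                  # exploration coefficient
        self.counts = [0] * n_channels      # T_j(t): number of selections of channel j
        self.rewards = [0.0] * n_channels   # acknowledgements received on channel j
        self.t = 0                          # number of transmissions so far

    def select(self):
        # Use every channel once before relying on the index.
        for j, n in enumerate(self.counts):
            if n == 0:
                return j
        indexes = [
            self.rewards[j] / self.counts[j]                              # empirical mean
            + math.sqrt(self.alpha * math.log(self.t) / self.counts[j])   # bias
            for j in range(len(self.counts))
        ]
        return max(range(len(indexes)), key=indexes.__getitem__)

    def update(self, channel, ack_received):
        # The reward is 1 if the acknowledgement is received, 0 otherwise.
        self.t += 1
        self.counts[channel] += 1
        self.rewards[channel] += 1.0 if ack_received else 0.0


def run_ucb1(success_probs, n_tx, alpha=0.5, seed=0):
    """Simulate n_tx transmissions over channels with fixed (assumed) success probabilities."""
    rng = random.Random(seed)
    selector = UCB1Selector(len(success_probs), alpha)
    for _ in range(n_tx):
        j = selector.select()
        selector.update(j, rng.random() < success_probs[j])
    return selector.counts
```

Calling `run_ucb1([0.45, 0.96], 2000)` returns the number of selections per channel; under the stated assumptions, most selections should end up on the second channel. The only state kept per channel is a counter and a sum of rewards, which is in line with the low memory requirement mentioned in the text.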
In the UCB_{1} algorithm, the computation of indexes is deterministic. It is, consequently, a frequentist algorithm. In the following section, we introduce the Thompson sampling algorithm which is a Bayesian algorithm. With this algorithm, the indexes are sampled from a random distribution.
5.3 Thompson sampling
We can see in Eq. (46) that the higher T_{ j }(t) is, the lower the variance of the distribution of B_{ j }(t) is. Furthermore, as shown in Eq. (45), the expectation of the index B_{ j }(t) tends towards P_{ j }(sd) when T_{ j }(t) tends to infinity.
Please note that, for each transmission, the TS algorithm only requires drawing N_{ c } values from beta distributions.
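The sampling step can be sketched as follows in Python. As the expressions of the indexes are not reproduced here, the sketch assumes the common choice for Bernoulli rewards, namely a Beta(successes + 1, failures + 1) distribution per channel (uniform prior); the names `ThompsonSelector` and `run_thompson` and the test probabilities are hypothetical.

```python
import random


class ThompsonSelector:
    """Thompson sampling over channels with 0/1 rewards (illustrative sketch)."""

    def __init__(self, n_channels, seed=None):
        self.successes = [0] * n_channels   # acknowledgements received per channel
        self.failures = [0] * n_channels    # missing acknowledgements per channel
        self.rng = random.Random(seed)

    def select(self):
        # Draw one index per channel from its beta distribution (N_c draws in total)
        # and transmit in the channel with the largest draw.
        samples = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, channel, ack_received):
        if ack_received:
            self.successes[channel] += 1
        else:
            self.failures[channel] += 1


def run_thompson(success_probs, n_tx, seed=0):
    """Simulate n_tx transmissions over channels with fixed (assumed) success probabilities."""
    env = random.Random(seed)
    selector = ThompsonSelector(len(success_probs), seed=seed + 1)
    for _ in range(n_tx):
        j = selector.select()
        selector.update(j, env.random() < success_probs[j])
    return [s + f for s, f in zip(selector.successes, selector.failures)]
```

With this parametrisation, the mean of the distribution of channel j is (successes + 1)/(T_{ j }(t) + 2), which approaches the empirical success rate as T_{ j }(t) grows, and its variance shrinks accordingly, consistent with the behaviour described above.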
6 Numerical evaluation of MAB learning in LPWANs
In this section, we use numerical simulations to assess the performance of the MAB learning algorithms, introduced in the previous section, in a pure ALOHA-based LPWAN.
6.1 Simulation scenario
For simulations, we consider an LPWAN comprising N_{ c }=10 channels. All the devices in the network use the same SF and transmit an uplink packet during T_{ m }=0.7 s (this corresponds to SF 8 in a LoRaWAN network [19]). Moreover, we suppose that T_{ d }=1 s and T_{ a }=0.1 s. We suppose that T_{ s } is short enough to be neglected. When a device does not receive an acknowledgement, it selects a random time T_{ r } between 0 and T_{bo}=10 s. Then, it waits for T_{ r } and resends the packet. The maximum number of repetitions is equal to 5 throughout this section.
In order to generate the interfering traffic, we consider a set of non-intelligent devices that use the network. Each of these devices (e.g. temperature sensors, humidity sensors or smart appliances) uses only one channel. The traffic generated by these non-intelligent devices is an interfering traffic for the AMI backhaul. In this article, we suppose that interfering end-devices and aggregators use the same standard; however, similar performance can be obtained when the interfering traffic is generated by devices using different standards. Each of these devices sends packets following a Poisson process. The intensity of the Poisson process verifies λ_{ s }T_{ m }=10^{−4} for all non-intelligent devices. This intensity does not take into account the traffic generated by retransmissions. With this intensity, each device sends approximately one packet every 2 h.
We suppose that there are 1000 non-intelligent end-devices in the first channel, 900 in the second one, 800 in the third one, and so on until 100 in the tenth channel. We simulate the network made of non-intelligent devices so as to estimate the probabilities of a successful transmission in each channel. With this distribution of non-intelligent devices, these probabilities are equal to (0.45, 0.53, 0.57, 0.64, 0.70, 0.77, 0.82, 0.87, 0.92, 0.96).
We suppose that 50 aggregators that have learning capabilities begin to use the LPWAN. These aggregators have the same characteristics as the other devices, but can use channel selection algorithms. We suppose that each aggregator transmits its packets following a Poisson process whose intensity verifies λ_{ a }T_{ m }=4×10^{−4} (on average an aggregator sends a packet every 30 min). We simulate the network during 14 days, and we analyse the evolution of the probability of a successful transmission P(sd) and that of the mean latency.
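The retransmission procedure described above can be sketched in Python as follows. This is only an illustration under assumptions that are not taken from the article: each attempt succeeds independently with a fixed probability, a failed attempt lasts T_{ m }+T_{ d } (T_{ s } neglected), the elapsed time is counted until the end of the acknowledgement, and "5 repetitions" is read as five retransmissions after the first attempt. The function names are hypothetical, and the resulting values are not expected to reproduce the figures of this article.

```python
import random

# Timing values of the simulation scenario, in seconds.
T_M, T_D, T_A, T_BO = 0.7, 1.0, 0.1, 10.0
MAX_RETX = 5  # assumed meaning: retransmissions allowed after the first attempt


def send_packet(p_success, rng):
    """Return (delivered, elapsed_seconds, attempts) for one packet."""
    elapsed = 0.0
    for attempt in range(1, MAX_RETX + 2):
        if rng.random() < p_success:
            # Uplink packet, receive delay, then the acknowledgement.
            return True, elapsed + T_M + T_D + T_A, attempt
        # No acknowledgement: uplink packet and receive delay are lost.
        elapsed += T_M + T_D
        if attempt <= MAX_RETX:
            # Random waiting time T_r between 0 and T_bo before resending.
            elapsed += rng.uniform(0.0, T_BO)
    return False, elapsed, MAX_RETX + 1


def mean_latency(p_success, n_packets=10000, seed=0):
    """Average elapsed time over delivered packets only (assumed latency definition)."""
    rng = random.Random(seed)
    results = (send_packet(p_success, rng) for _ in range(n_packets))
    times = [elapsed for delivered, elapsed, _ in results if delivered]
    return sum(times) / len(times) if times else float("nan")
```

For instance, `mean_latency(0.96)` and `mean_latency(0.45)` can be compared to see how strongly the random waiting time after each failure penalises a channel with a low probability of success.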
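As a rough worked example of the comparison made in Section 4.3, the probabilities listed above can be used to compare the inverse of the average probability of success (random channel selection) with the inverse of the best probability (best channel selection). The Python sketch below assumes independent attempts and an unlimited number of retransmissions, so that the mean number of attempts is the inverse of the probability of success; it ignores the traffic added by the aggregators themselves. The function names are hypothetical.

```python
# Probabilities of successful transmission quoted in the text, channels 1 to 10.
P_SUCCESS = (0.45, 0.53, 0.57, 0.64, 0.70, 0.77, 0.82, 0.87, 0.92, 0.96)


def expected_attempts(p):
    """Mean number of attempts until success for independent attempts (geometric law)."""
    return 1.0 / p


def compare_strategies(probs):
    p_rand = sum(probs) / len(probs)   # uniformly random channel at each attempt
    p_best = max(probs)                # always the best channel
    return {
        "p_rand": p_rand,
        "p_best": p_best,
        "attempts_rand": expected_attempts(p_rand),
        "attempts_best": expected_attempts(p_best),
        "extra_attempts": expected_attempts(p_rand) - expected_attempts(p_best),
    }


if __name__ == "__main__":
    for name, value in compare_strategies(P_SUCCESS).items():
        print(f"{name}: {value:.3f}")
```

With these values, the average probability of success is 0.723 against 0.96 for the best channel, i.e. about 1.38 attempts per packet against about 1.04 under the stated assumptions.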
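The orders of magnitude of this scenario can be checked with a few lines of Python. The sketch below only uses the values given above (T_{ m }, the normalised intensities and the number of devices); the helper names are hypothetical, and retransmissions and acknowledgements are not counted in the loads.

```python
T_M = 0.7  # uplink packet duration in seconds


def mean_interval_s(normalised_intensity, t_m=T_M):
    """Mean time between two packets of one device, given lambda * T_m."""
    return t_m / normalised_intensity


def offered_load(n_devices, normalised_intensity):
    """Total normalised load (n * lambda * T_m) generated by n identical devices."""
    return n_devices * normalised_intensity


# Non-intelligent devices: lambda_s * T_m = 1e-4.
static_interval_h = mean_interval_s(1e-4) / 3600
# Aggregators: lambda_a * T_m = 4e-4.
aggregator_interval_min = mean_interval_s(4e-4) / 60

# 1000 devices in channel 1, 900 in channel 2, ..., 100 in channel 10.
static_loads = [offered_load(n, 1e-4) for n in range(1000, 0, -100)]
# 50 aggregators, load spread over all the channels.
aggregator_load = offered_load(50, 4e-4)

if __name__ == "__main__":
    print(f"non-intelligent device: one packet every {static_interval_h:.2f} h")
    print(f"aggregator: one packet every {aggregator_interval_min:.1f} min")
    print(f"static load per channel: {[round(g, 2) for g in static_loads]}")
    print(f"total aggregator load: {aggregator_load:.2f}")
```

The intervals come out at about 1.94 h and 29.2 min, in agreement with the rounded values given in the text. The total load of the 50 aggregators (0.02, over all channels) is small compared to the total load of the non-intelligent devices (0.55).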
6.2 Simulation results in a LoRaWAN network
In the studied network, we evaluate the performance of several learning algorithms. We consider that either the UCB_{1} or the Thompson sampling algorithm is implemented in aggregators.
An increase in the probability of successful transmission is beneficial for the latency of AMI communications. As seen in Fig. 14, learning algorithms reduce the latency of aggregators’ communications by 0.8 s. This represents a 40% gain compared to the random channel selection.
We now compare the performance of the studied learning algorithms. We can see in Fig. 14 that the Thompson sampling algorithm reduces latency more quickly. This result is in line with the theoretical studies. Indeed, the Thompson sampling has been proven to converge more quickly than the UCB algorithm in the case where the interfering traffic follows a Bernoulli process [35]. However, the computation of the TS indexes requires slightly more computation than that of the UCB ones. It is important to note that, in the present article, the interfering traffic is generated by both the static interfering traffic and the traffic generated by other aggregators. The static interfering traffic follows a Bernoulli process. However, other aggregators also use learning algorithms and the traffic they generate is not stochastic [30]. In the simulated scenarios, the traffic generated by other aggregators is small compared to the traffic generated by static devices. The interfering traffic can, consequently, be approximated by a Bernoulli process.
Furthermore, the TS and the UCB_{1} algorithm with α = 0.3 provide similar results after 14 days of exploration. For such a low value of α (i.e. below α=0.5), we do not have any theoretical proof of convergence. However, the algorithm has good performance in our simulation scenarios. On the basis of the comparison of the performances of the UCB_{1} algorithm for different values of α, we can see that the reduction of the latency is faster with a small α (e.g. α=0.3). Figure 14 shows that, in the proposed scenario, the reduction of the latency becomes increasingly slow as the α coefficient increases. The analysis of the α coefficient is done here empirically. A comprehensive empirical study of the impact of the α coefficient in the MAB problem has been conducted in [37].
6.3 Extension to different packet sizes
In the previous section, we analysed the performance of MAB learning algorithms in a network in which all devices use the same standard, and in particular the same SF in a LoRaWAN network. In this section, we confirm the ability of MAB algorithms to reduce the latency of communications and we highlight the ability of the proposed algorithms to cope with different packet sizes.
In this second scenario, after 14 days of transmission, reinforcement learning algorithms provide a gain of 8 to 11% in probability of successful transmission. This increase in the probability of successful transmission reduces the average latency from 1.95 to around 1.65 s, i.e. a decrease in latency of 15%. These results show that learning algorithms can reduce the latency of communications even when the interfering traffic is generated by devices which use dissimilar packet sizes, i.e. different standards.
7 Conclusions
Unslotted ALOHA-based LPWAN standards such as LoRaWAN are perfect candidates for AMI backhaul. In this paper, we first derive a closed-form expression for, and analyse, the probability of successful transmission in a channel of a LoRaWAN-like LPWAN with acknowledgement. Then, we use these probabilities to analyse the latency in the network. Furthermore, we propose to use MAB learning algorithms as simple and efficient solutions to tackle the spectrum contention issue in unlicensed bands. We use the acknowledgement as a reward for online learning algorithms. The UCB_{1} and TS algorithms have a low cost in processing and energy consumption and do not require any extra signalling. Furthermore, in the studied scenario, these algorithms increase the probability of successful transmission by 14% and reduce the latency in the network by 40%. In our future work, we will either analyse other learning algorithms to tackle spectrum contention issues in IoT networks or consider a more realistic model, e.g. by considering the fading of wireless communications. We can also analyse the potential of MAB learning algorithms in different standards.
Footnotes
 1.
Sigfox is a French LPWAN operator whose network covers a large part of western Europe and is under deployment in the US. – www.sigfox.com.
Notes
Acknowledgements
The authors want to thank Lilian Besson for useful comments and discussions.
Funding
Part of this work is supported by the project SOGREEN (Smart pOwer GRid for Energy Efficient small cell Networks), which is funded by the French national research agency, under the grant agreement coded: N ANR14CE28002502 and by Région Bretagne, France.
Availability of data and materials
The MATLAB code used for generating our simulation results can be found at https://bitbucket.org/scee_ietr/reinforcementlearninginunslottedlpwan/.
Authors’ contributions
RB is the main author of the current paper. CM and JP contributed to the conception and design of the study and to the structuring and reviewing of the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.