1 Introduction

Many industries today prefer a chain topology for networks whose nodes are connected consecutively along long, narrow deployment areas such as railways [1], highways [2], and underground mines [3, 4], as well as for some special types of wireless sensor and mesh networks [3–7] and the backbones of telecommunication systems [8]. Similarly, connecting Wi-Fi routers in a daisy chain is a well-known way to provide internet access on each floor of towers and other tall buildings.

A major disadvantage of chain networks is their high Average Path Length (APL) relative to network size, unlike many other types of networks that have “small-world” properties [9]. A small APL is generally desirable, and APL has been investigated analytically and numerically in many studies related to network design and optimization [10–18], social networks [15, 17, 18], computing [19], and logic design [20].

In this paper, we examine APL for chain networks and propose an optimization model based on deploying one additional link in the network, with the objective of minimizing APL. We derive analytical formulations of APL before and after the optimization process, and obtain numerical results that agree precisely with the analytical results.

2 Network Model

Suppose we have a chain-topology network with n nodes, containing bidirectional links between consecutive nodes. This network can be represented as a path graph, \(P_{n}\), with undirected edges, as depicted in Fig. 1.

Fig. 1. A path graph representing a chain-topology network with n nodes

Suppose we aim to augment this network by adding a new performance-enhancing link between a certain pair of nodes, as illustrated in Fig. 2. Note that the augmentation is intentionally limited to a single new link in order to keep the cost of optimization to a minimum, and that the implementation cost of such a link can be assumed fixed regardless of the distance between the connected nodes, which holds especially for leased lines obtained from ISPs.

Fig. 2. Adding a new enhancement link (i.e. edge) connecting \(v_{x}\) and \(v_{y}\)

We are now ready to state our optimization problem:

Main Problem: Which nodes should be connected to reach the objective of minimizing average path length (APL) on the network?

The proposed optimization model not only minimizes APL but also improves the robustness of chain networks by creating alternative routes, and reduces the cost of packet transmissions.

3 Related Work

3.1 Chain Networks

Given the side effects of unbalanced energy consumption at nodes in chain networks used in underground mines or on trains, the studies in [1, 3, 4, 6] proposed different protocols or node deployment strategies that aim to balance energy consumption across nodes in order to increase network lifetime.

Agbinya [2] discussed a specific application of chain networks on highways and addressed certain characteristics of the network, such as interference level, coverage area and path loss. In a more recent study, Zhou et al. [5] considered chain-type wireless sensor networks (CWSN) deployed in coal mines and proposed a source-aware redundant packet forwarding scheme for emergency information delivery in CWSN.

Leu and Huang [7] proposed a mathematical model that calculates the maximum throughput of a Wireless Mesh Network in chain-topology, dealing with signal interference, hidden nodes and STDMA time slots among nodes.

Flammini et al. [8] considered the construction of wireless ATM layouts for a chain of base stations, and showed that the problem studied was NP-complete for special instances, and provided optimal solutions for certain cases.

3.2 Average Path Length (APL)

Several researchers have derived analytical formulations of APL for different types of networks. For instance, Kleinrock and Silvester [21] considered random graphs; Fronczak et al. [18] and Guo et al. [14] studied a large class of uncorrelated random networks with hidden variables; Zhang et al. [17] examined Apollonian networks; Peng [16] dealt with the Sierpinski pentagon; Gulyás et al. [13] focused on networks with given size and density; Chen et al. [11] investigated the Barabási–Albert scale-free model; Zhi-guang et al. [10] discussed belt-type networks; and Gao et al. [22] analysed the Sierpinski gasket in a recent article.

In the field of logic design, Butler et al. [20] studied APL of binary decision diagrams by deriving the APL for various functions, and showed that the APL for benchmark functions is typically much smaller than for random functions.

Mao and Zhang [19] considered the computation problem of APL for large scale-free networks, and presented a dynamic programming model to solve the load-balancing problem for coarse-grained parallelization. Yen et al. [12] presented an efficient method for updating the closeness centrality of each vertex and the APL of a network, where edges change dynamically as in the case of social networks. In a recent study, Reppas et al. [15] introduced rewiring rules to tune APL on a network while keeping the degree and clustering coefficient distribution unchanged.

To the best of our knowledge, ours is the first study to propose an optimization model aiming to minimize APL for chain networks by optimal deployment of an incremental link.

4 Mathematical Model

4.1 Pure Path

The average path length (APL) of a network is an important parameter that indicates the efficiency of information transmission on the network. It is calculated by finding the shortest path between every pair of nodes, adding up their lengths, and dividing by the total number of pairs.

To find the mathematical expression for the APL of a chain network, let \(P_{n}\) be a path graph with n vertices indexed in sequence from 1 to n, i.e. \(v_{1},v_{2},...,v_{n}\), as depicted in Fig. 1. The shortest path between any pair of nodes on a path graph is the unique subpath between them, and the length of this subpath equals the number of edges it contains. Thus, since the vertices are indexed in order, the length of the subpath (PL) between vertices \(v_{j}\) and \(v_{k}\) on \(P_{n}\) is:

$$PL_{(v_{j},v_{k})} = |k - j|$$

Then Eq. 1 gives the sum of path lengths over all (unordered) pairs:

$$\begin{aligned} \sum PL_{(All Pairs)} = \sum _{i=1}^{n-1} \sum _{k=1}^{n-i} k \end{aligned}$$
(1)

After simplifying Eq. 1 and dividing by the number of pairs, which is \(\frac{n(n-1)}{2}\), we find the APL of the path graph \(P_{n}\) as given in Eq. 2.

$$\begin{aligned} APL_{P_{n}} = \frac{\sum PL_{(All Pairs)}}{\frac{n(n-1)}{2}} = \frac{n+1}{3} \end{aligned}$$
(2)

According to Eq. 2, the APL of a chain-topology network grows linearly with the length of the chain, i.e. with the number of nodes, so it is O(n), and it is almost equal to one third of the network diameter, which is \(n-1\) hops.
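The simplification step between Eqs. 1 and 2 can be written out as follows. This is a sketch of the intermediate algebra, using the substitution \(m = n - i\) (the symbol m is introduced here only for illustration):

$$\begin{aligned} \sum _{i=1}^{n-1} \sum _{k=1}^{n-i} k&= \sum _{i=1}^{n-1} \frac{(n-i)(n-i+1)}{2} = \sum _{m=1}^{n-1} \frac{m(m+1)}{2} = \frac{(n-1)\,n\,(n+1)}{6} \\ APL_{P_{n}}&= \frac{(n-1)\,n\,(n+1)/6}{n(n-1)/2} = \frac{n+1}{3} \end{aligned}$$

For instance, when \(n = 4\) the six pairwise distances are 1, 2, 3, 1, 2, 1, which sum to 10, so the APL is \(10/6 = 5/3 = (4+1)/3\).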

4.2 Path with an Additional Edge

Let \(P_{n}^{'}\) be the graph obtained by adding a new edge \((v_{x},v_{y})\) to the path graph \(P_{n}\), as depicted in Fig. 2. Formally,

$$P_{n}^{'} = P_{n} \cup (v_{x},v_{y})$$

To build a general mathematical expression for APL on \(P_{n}^{'}\), we first studied small networks (e.g. around 10 nodes), calculated APL manually, and produced a preliminary formula. We then extended the work to larger networks, repeatedly checking the accuracy of the formula and revising it when needed, until it consistently gave correct values for all networks investigated. This process yielded Eq. 3. We also verified its correctness via experiments, as described in the following sections.

$$\begin{aligned} APL_{P_{n}^{'}} = \frac{{\sum _{i=1}^{t-1} \sum _{k=1}^{t-i} k + (h-1)(\sum _{i=1}^{x} i + \sum _{i=1}^{n-y+1} i) - 1 + R}}{\frac{n(n-1)}{2}} \end{aligned}$$
(3)
where \(h = y - x\), \(t = n - h + 1\), and

$$\begin{aligned} R = {\left\{ \begin{array}{ll} (2n-h-1)\sum _{i=0}^{h/2} i - h(n-h+1) +2, &{} \text {if } h \text { is even} \\ (2n-h-1)\sum _{i=0}^{(h-1)/2} i + \frac{(h+1)^2}{4} - \frac{(h-1)(n-h+3)}{2}, &{} \text {if } h \text { is odd} \end{array}\right. } \end{aligned}$$

Fig. 3. Experimental setup for determining APL

Thus, we have obtained analytical expressions for APL before and after an additional link is attached to a path graph, as in Eqs. 2 and 3, which allow us to formulate our problem as an Integer Programming (IP) problem as follows:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} APL_{P_{n}^{'}} \\ \text {subject to} &{} \text {n, x, y are integer} \\ &{} x<y\\ &{} 1 \le x \le n\\ &{} 1 \le y \le n \end{array} \end{aligned}$$

where \(APL_{P_{n}^{'}}\) is given in Eq. 3.
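As a rough illustration, Eq. 3 can be evaluated with exact integer arithmetic and the program above can be solved for small n by enumerating every feasible pair \((x, y)\). The following Python sketch does this; the function names (`tri`, `total_path_length_eq3`, `apl_eq3`, `minimize_apl_eq3`) are hypothetical and not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def tri(m: int) -> int:
    """Triangular number 1 + 2 + ... + m (0 for m <= 0)."""
    return m * (m + 1) // 2 if m > 0 else 0


def total_path_length_eq3(n: int, x: int, y: int) -> int:
    """Numerator of Eq. 3: sum of shortest-path lengths over all pairs."""
    if not (1 <= x < y <= n):
        raise ValueError("need 1 <= x < y <= n")
    h = y - x
    t = n - h + 1
    # Double sum in Eq. 3: the inner sum over k is the triangular number tri(t - i).
    base = sum(tri(t - i) for i in range(1, t))
    tails = (h - 1) * (tri(x) + tri(n - y + 1))
    if h % 2 == 0:
        r = (2 * n - h - 1) * tri(h // 2) - h * (n - h + 1) + 2
    else:
        # (h + 1)^2 and (h - 1) are even for odd h, so the divisions are exact.
        r = ((2 * n - h - 1) * tri((h - 1) // 2)
             + (h + 1) ** 2 // 4
             - (h - 1) * (n - h + 3) // 2)
    return base + tails - 1 + r


def apl_eq3(n: int, x: int, y: int) -> Fraction:
    """APL of the augmented path according to Eq. 3, as an exact fraction."""
    return Fraction(total_path_length_eq3(n, x, y), n * (n - 1) // 2)


def minimize_apl_eq3(n: int):
    """Enumerate all integer pairs x < y and return (min APL, list of minimizers)."""
    best_total, best_pairs = None, []
    for x, y in combinations(range(1, n + 1), 2):
        total = total_path_length_eq3(n, x, y)
        if best_total is None or total < best_total:
            best_total, best_pairs = total, [(x, y)]
        elif total == best_total:
            best_pairs.append((x, y))
    return Fraction(best_total, n * (n - 1) // 2), best_pairs
```

For example, `minimize_apl_eq3(5)` returns an APL of 3/2 with the single minimizer (1, 5). Enumeration examines on the order of \(n^{2}\) pairs, so it is only meant for checking small and moderate instances.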

It is known that IP is NP-hard [23], which implies that no polynomial-time algorithm is known for IP problems in general. In the following sections, we first solve certain instances of the problem above experimentally, and then construct a general analytical solution for any network size (i.e. n) by means of linear regression.

5 Finding Optimal Solutions

5.1 Numerical Solutions by Experiment

To find optimal solutions for certain cases of the problem introduced, we prepared an experimental set-up shown as pseudo-code in Fig. 3.

In the experiment, we increased the network size from 3 nodes to 1000 nodes and, as a brute-force approach, tried every possible pair of attachment points (i.e. vertices) for the additional link. For each network size, we first defined the network topology by building the adjacency list for every possible deployment of the additional link, varying x and y, which denote the positions of vertices \(v_{x}\) and \(v_{y}\). Then, for each topology, we found the shortest paths for all pairs using Dijkstra’s well-known shortest path algorithm [23, 24]. Since each run, as we used it, returns the shortest path for a single pair of nodes, we invoked it iteratively for all pairs in the graph (i.e. topology). After calculating the lengths (i.e. hop counts) of the shortest paths for all pairs, we averaged them to obtain the APL. Finally, we identified the minimum APL among all APLs obtained over the varying locations of \(v_{x}\) and \(v_{y}\). This experimental process was repeated for network sizes of 3, 5, 10, 20, 50, 100, 200, 500 and 1000 nodes.
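The brute-force procedure described above might be sketched in Python as follows. This is not the authors' setup from Fig. 3: it assumes 1-based node indices, and it swaps in breadth-first search for Dijkstra's algorithm, which gives the same hop counts because every link has unit weight. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_chain_with_link(n: int, x: int, y: int) -> dict:
    """Adjacency sets of the path v_1..v_n plus the extra edge (v_x, v_y)."""
    adj = {v: set() for v in range(1, n + 1)}
    for v in range(1, n):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    adj[x].add(y)
    adj[y].add(x)
    return adj


def hop_counts_from(adj: dict, source: int) -> dict:
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def total_hops(adj: dict) -> int:
    """Sum of shortest-path lengths over all unordered pairs of nodes."""
    total = 0
    for s in adj:
        dist = hop_counts_from(adj, s)
        total += sum(d for v, d in dist.items() if v > s)  # count each pair once
    return total


def average_path_length(adj: dict) -> float:
    n = len(adj)
    return total_hops(adj) / (n * (n - 1) / 2)


def brute_force_optimum(n: int):
    """Try every pair x < y and return (minimum APL, list of optimal pairs)."""
    best_total, best_pairs = None, []
    for x, y in combinations(range(1, n + 1), 2):
        total = total_hops(build_chain_with_link(n, x, y))
        if best_total is None or total < best_total:
            best_total, best_pairs = total, [(x, y)]
        elif total == best_total:
            best_pairs.append((x, y))
    return best_total / (n * (n - 1) / 2), best_pairs
```

Comparing integer hop totals rather than floating-point averages keeps ties between alternative optimal pairs exact. For n = 5 the search returns an APL of 1.5 with the single optimal pair (1, 5).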

Table 1. Experimental results obtained by brute force computation

Table 1 contains some of the numerical results acquired in the experiments, including the optimal solutions that minimize APL as well as results for ring topologies (i.e. the cases in which the first and the last nodes of the path are connected to each other by the additional link). The first column gives the network size in number of nodes, while the second and third columns give the APL for the pure path (\(P_{n}\)) and the ring topology (i.e. \(P_{n} \cup (v_{1},v_{n})\)), respectively. The fourth column gives the minimum APL, which is attained when the additional link is placed optimally (i.e. \(P_{n} \cup (v_{x\_opt},v_{y\_opt})\)). The last column shows the optimal values of (\(v_{x}\), \(v_{y}\)) that minimize APL.

Figure 4 shows the experimental results as a 3-dimensional color map for a network of 100 nodes. As can be seen in the figure, APL reaches its minimum value (i.e. dark blue color) at around \(x = 21\) and \(y = 80\), or equivalently with x and y swapped. Notice that the red area running from the bottom-left corner to the top-right corner represents the Path topology, whereas the points at the top-left and bottom-right corners produce the Ring topology.

Fig. 4. APL for varying values of x and y, when the network contains 100 nodes (Color figure online)

Verification of the Mathematical Model: One might doubt the accuracy of the mathematical model presented in Sect. 4, i.e. Eq. 3. To verify the correctness of this expression, we first computed APL values using Eq. 3 for all possible values of the variables, for network sizes up to 1000 nodes, and then searched for the instances giving the minimum APL for each network size. Afterwards, we compared the minimum APL values computed with Eq. 3 against the APL values obtained from the experimental calculations for the network sizes listed in Table 1. We observed that the mathematical model and the experimental calculations give precisely the same APL values, which shows the consistency of the two approaches.

5.2 Analytical Solution by Linear Regression

Table 1 contains numerical results of optimal solutions for certain network sizes. However, to make a comprehensive analysis that includes the asymptotic behaviour of the optimal solutions and other variables, we need to establish analytical relations between these variables. For this purpose, we applied linear regression, based on the least-squares technique, to the numerical results at hand, and found the following relations.

$$\begin{aligned}&APL_{P_{n}^{'}}(n) = 0.195331*n + 0.559447 \end{aligned}$$
(4)
$$\begin{aligned}&x_{opt} = Round(0.207174*n - 0.0251311) \end{aligned}$$
(5)
$$\begin{aligned}&y_{opt} = Round(0.793222*n + 0.0497688) \end{aligned}$$
(6)

where Round(z) is a function which returns the nearest integer to z.
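A short Python sketch of Eqs. 4 to 6 is given below. The paper does not state how Round(z) breaks ties, so rounding half up is an assumption here (Python's built-in `round` rounds halves to the nearest even integer, which is why it is not used). The function names are illustrative.

```python
import math


def round_half_up(z: float) -> int:
    """Nearest integer to z; ties go upward (an assumed tie rule)."""
    return math.floor(z + 0.5)


def apl_fit(n: int) -> float:
    """Eq. 4: fitted APL of the optimally augmented chain."""
    return 0.195331 * n + 0.559447


def x_opt_fit(n: int) -> int:
    """Eq. 5: fitted index of the first attachment point."""
    return round_half_up(0.207174 * n - 0.0251311)


def y_opt_fit(n: int) -> int:
    """Eq. 6: fitted index of the second attachment point."""
    return round_half_up(0.793222 * n + 0.0497688)
```

For n = 1000 these functions give attachment points 207 and 793 and a fitted APL of about 195.89.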

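In effect, Eqs. 5 and 6 answer the main problem posed at the beginning of this paper, and Eq. 4 gives the corresponding APL when an optimal solution is applied. Because these relations are regression fits, they should be read as close approximations of the numerical results rather than as exact closed forms; for very small networks such as \(n = 3\) and \(n = 5\), the experimentally found end-to-end optimum differs from the fitted values.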

6 Discussions

6.1 Average Path Length (APL)

Figure 5 depicts APL for both \(P_{n}\) and \(P_{n}^{'}\), with the optimal values of \((v_{x},v_{y})\) applied, as network size varies from 3 nodes to 1000 nodes. As seen in the figure, APL increases linearly in both cases as network size grows. However, \(P_{n}\) has a higher slope than \(P_{n}^{'}\), which means that adding the extra edge reduces APL on the network.

Fig. 5. Average Path Length (APL) for different topologies as network size grows

Figure 5 also includes the model fit (i.e. regression line) obtained by linear regression. The goodness of fit can be assessed visually in Fig. 5, as the fitted line and the numerical data visually coincide.

6.2 Improvement

Figure 6 shows the Improvement, i.e. the relative reduction in APL, when an additional edge is placed in the network at the optimal position. As can be seen in the figure, the improvement starts at around 24.81 \(\%\) when \(n=3\) with slow growth, continues through a period of moderate growth, and then returns to slow growth, asymptotically approaching 41.4 \(\%\), which is consistent with the analysis below.

$$\begin{aligned} \small Improvement&= 100 * (APL_{P_{n}} - APL_{P_{n}^{'}}) / APL_{P_{n}} \\&= 100 * \frac{(\frac{n+1}{3} - 0.195331n - 0.559447)}{(n+1)/3}\\&= 300 * \frac{0.138n - 0.226114}{n+1}\\&= \frac{41.4n - 67.8342}{n+1}\\&\quad \lim _{n \rightarrow +\infty } Improvement = 41.4\,\% \end{aligned}$$
Fig. 6. Improvement on APL and normalized length of optimal solutions (NLOS) as network size grows (smoothed)
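The limit derived in this subsection can be checked numerically with a few lines of Python. The sketch below combines Eq. 2 for the pure path with the fitted Eq. 4 for the augmented path; the function names are hypothetical, and the values are only as accurate as the regression coefficients.

```python
def apl_path(n: int) -> float:
    """Eq. 2: APL of the pure path P_n."""
    return (n + 1) / 3


def apl_augmented_fit(n: int) -> float:
    """Eq. 4: fitted APL after optimal placement of the additional link."""
    return 0.195331 * n + 0.559447


def improvement_percent(n: int) -> float:
    """Relative reduction in APL, in percent, based on the fitted Eq. 4."""
    return 100 * (apl_path(n) - apl_augmented_fit(n)) / apl_path(n)


if __name__ == "__main__":
    # Print the improvement for growing network sizes.
    for n in (10, 100, 1000, 10**6):
        print(n, round(improvement_percent(n), 4))
```

The printed values increase with n and level off just above 41.4, in line with the limit stated in the derivation.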

6.3 Optimal Solutions

Equations 5 and 6 are the optimal solutions in analytic form, while the values in the fifth column of Table 1 are numeric solutions for certain network sizes. With Eqs. 5 and 6, one can readily determine the optimal values of (\(v_{x}, v_{y}\)) for any network size. It is interesting to observe that the optimal solutions for \(n = 3\) and \(n = 5\) are the two end points of the path (i.e. \((v_{x},v_{y})\) equals \((v_{1},v_{3})\) and \((v_{1},v_{5})\), respectively). As network size grows, the optimal values of \(v_{x}\) and \(v_{y}\) slide gradually towards the center of the network. This observation motivated us to investigate the normalized length between the two end points (i.e. \(v_{x}\) and \(v_{y}\)) of optimal solutions in the next part.

Another observation is that there is only one (i.e. unique) optimal solution when the network size (n) is odd, whereas there may be several optimal solutions when n is even, as can be observed in the fifth column of Table 1. We found that alternative optimal solutions for the same network yield isomorphic graphs when they are applied.

6.4 Normalized Length of Optimal Solutions (NLOS)

NLOS represents the normalized distance between the two end points (i.e. \(v_{x\_opt}\) and \(v_{y\_opt}\)) of the optimal link deployment that minimizes APL. The normalization is performed with respect to network size. Figure 6 plots NLOS as network size grows on a logarithmic scale. It can be deduced from the figure that NLOS decreases logarithmically from 100 \(\%\) to around 58.6 \(\%\), which is consistent with the analysis below. This means that the optimal attachment points lie at or near the two ends of the chain when the network is small, and move away from these end points as the network size grows.

$$\begin{aligned} NL&= 100*\frac{|y_{opt}-x_{opt}|}{n-1} \\&= \frac{Round(58.6048*n + 7.48999)}{n-1}\\&\quad \lim _{n \rightarrow +\infty } NL = 58.6048\,\% \end{aligned}$$
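The coefficients in the expression for NL follow from subtracting Eq. 5 from Eq. 6. The step below is a sketch that ignores the effect of the Round function, which changes each term by at most one half:

$$\begin{aligned} y_{opt} - x_{opt}&\approx (0.793222 - 0.207174)\,n + (0.0497688 + 0.0251311) \\&= 0.586048\,n + 0.0748999 \\ 100 * \frac{y_{opt} - x_{opt}}{n-1}&\approx \frac{58.6048\,n + 7.48999}{n-1} \end{aligned}$$

Since the numerator and the denominator are both linear in n, the ratio tends to the leading coefficient, 58.6048, as n grows.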

7 Conclusion

Chain-topology networks perform poorly on certain performance metrics such as throughput, robustness, and energy efficiency in data transmissions [17]. This is mostly due to the fact that the average path length (APL) in a chain topology is extremely high, almost one third of the network size, as we showed.

In this study, we aimed to compensate for this deficiency by presenting an optimization model in which incremental link deployment is considered, with the objective of minimizing APL on a chain network. For this purpose, we first derived a mathematical expression for the objective and formulated the problem as an Integer Programming (IP) problem. Then, we prepared an experimental setup to determine the APLs of all possible topologies generated by placing an additional link at varying locations on a chain-topology network. In this way, we found optimal solutions that minimize APL for specific network sizes up to 1000 nodes, and also verified the accuracy of our mathematical model. In the experiments, for each network size, we ran Dijkstra’s shortest path algorithm for all pairs and averaged the path lengths, in terms of hop count, to calculate the corresponding APL values. Afterwards, we derived an analytical solution by applying linear regression to the experimental data, which allowed us to see the asymptotic behaviour of the solutions.

Our analyses showed that the proposed optimization model reduces APL on chain-topology networks by between 24.81\(\%\) and 41.4\(\%\), with the reduction gradually increasing as network size grows. Moreover, we found that the normalized length of the additional link in the optimal solution asymptotically approaches 58.6\(\%\) of network size.

Beyond the contribution of an optimally placed additional link to minimizing APL, further research is required to improve other performance characteristics of chain-topology networks, such as load balancing.