1 Introduction

In the present paper we investigate how to hide a location quickly and with minimal effort. Our primary motivation is networks of mobile objects, but we believe that our results can also be applied to efficiently “lose” information about the current state of various systems, even non-physical ones.

Let us consider a group of mobile devices or sensors, called agents, performing a task in a given area. The task could be, for example, collecting or detecting some valuable resource, mounting detectors or installing mines. In all of these examples the system’s owner may want to hide the location of the agents from an adversary observing the terrain from a satellite or a drone, since the location of the devices may leak sensitive information. If we assume that the adversary’s surveillance of the terrain is permanent and precise, then clearly no information can be concealed. Hence in our scenario there are periods of time when the adversary cannot observe the system, during which the actual tasks are performed. Upon the adversary’s approach, the devices launch an algorithm to hide their location, i.e. they change their positions to mislead the observer. Clearly, in many real-life scenarios the additional movement dedicated to hiding the previous positions should be as short as possible, for the sake of saving energy and time. It is also clear that the devices may want to return to their original positions to resume their activities once the adversary stops its surveillance. On the other hand, it is intuitively clear that a very short move may not be sufficient for “losing” the information about the starting positions.

The outlined description is an intuitive motivation for the research presented in our paper. Exactly the same problem can, however, be considered in many other settings in which we demand a “quick” reconfiguration of a system such that the observed configuration reveals as little as possible about the initial state. For that reason we decided to use a fairly general mathematical model in which the agents are placed in vertices of a graph and can move only along the edges (a single edge in a single round). Our aim is to design an algorithm governing the agents’ movement that changes their initial locations in such a way that the adversary, given the final assignment of agents, cannot learn their initial positions.

One can immediately point out the following strategy: every agent independently chooses a vertex of the graph at random and moves to this location. Clearly (albeit informally), the new location does not reveal any information about the initial one, and the initial locations of the agents are perfectly hidden from the adversary. The same effect can also be obtained if all agents go to a single vertex fixed in advance; in this case the final and initial configurations are again stochastically independent. These strategies require, however, that the agents know the topology of the graph. Intuitively, a similar effect can be achieved if each agent starts a random walk and stops at a vertex after some number of steps. In this approach knowledge of the graph is not necessary, but the state after any number of steps reveals some knowledge about the initial positions (at least in some graphs). Moreover, this strategy requires randomization. To summarize, there are many different methods for hiding the initial locations. It turns out that the possible solutions and their efficiency depend greatly on the assumed model: whether the graph is known to the agents, what memory is available, whether the agents can communicate, and whether they have access to a source of random bits. Our paper formalizes this problem and discusses its variants in chosen settings.

Organization of the Paper. In Sect. 2 we describe the problem and the formal model. Section 3 summarizes the obtained results. The most important related work is mentioned in Sect. 4. In Sect. 5 we present results for the model wherein the agents know the topology of the graph representing the terrain; we show both optimal algorithms and the respective lower bounds. The case of unknown topology is discussed in Sect. 6. We conclude in Sect. 7. Some basic facts and definitions from information theory and the theory of Markov chains are recalled in Appendix 1 and Appendix 2, respectively.

2 Model

The Agents in the Network. We model the network as a simple, undirected, connected graph with n vertices, m edges and diameter D. The nodes of the graph are uniquely labeled with numbers \(\{1,2,\dots ,n\}\). We also have \(k\ge 1\) agents representing mobile devices. Time is divided into synchronous rounds. At the beginning of each round each agent is located at a single vertex. In each round an agent can change its position to any of the neighboring vertices. We allow many agents to occupy a single vertex in the same round. The agents need to locally distinguish the edges in order to navigate in the graph, hence we assume that the edges outgoing from a node of degree d are uniquely labeled with numbers \(\{1,2,\dots ,d\}\). We assume no correlation between the labels at the two endpoints of any edge. A graph with such a labeling is sometimes called port-labeled.
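For concreteness, the port-labeled model can be represented by a small data structure. The following sketch (our own illustration; the class and method names are not part of the model) stores, for every vertex, its list of ports: port p of vertex v points to a neighbor u together with the port number under which the same edge is seen from u.

```python
# A minimal sketch of a port-labeled graph (illustration only).
# ports[v][p-1] = (u, q) means: following port p from vertex v leads to vertex u,
# and the same edge is seen from u as port q. The ports at the two endpoints of
# an edge are assigned independently, as assumed in the model.

class PortLabeledGraph:
    def __init__(self, n):
        self.n = n
        self.ports = {v: [] for v in range(1, n + 1)}  # vertices labeled 1..n

    def add_edge(self, u, v):
        pu, pv = len(self.ports[u]) + 1, len(self.ports[v]) + 1  # next free ports
        self.ports[u].append((v, pv))
        self.ports[v].append((u, pu))

    def degree(self, v):
        return len(self.ports[v])

    def move(self, v, port):
        """Traverse port `port` (1-based) from v; return (next vertex, entry port)."""
        return self.ports[v][port - 1]
```

An agent located at vertex v can query degree(v) and leave through a chosen port; upon arrival it learns the port through which it entered the next vertex.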

When an agent is located at a vertex, we assume that it has access to the degree of the node, possibly to the value or an estimate of n, and to internal memory sufficient for the local computations it performs. In our paper we consider various models of mobile agents depending on the resources at their disposal. This involves settings where the devices do or do not have access to a source of random bits, and where they are either given the topology of the network a priori or have no such knowledge. In the latter case we consider two different scenarios, depending on whether the agent has access to operational memory that remains intact when it traverses an edge, or its memory is very limited and does not allow it to store any information about the network gathered while it moves from one vertex to another.

Our primary motivation is the problem of physically hiding mobile devices performing tasks in some terrain. Nevertheless, our work aims at formalizing the problem of losing information about the agents’ initial placement in a given network. Thus, we focus on proposing a theoretical model related to the logical topology.

Model of the Adversary. From the adversary’s point of view, the agents are indistinguishable and the nodes of the underlying graph are labeled. The assumption of indistinguishability is adequate for systems of very similar devices. Thus the state of the system in a given round t can be seen as a graph G together with a function \(n_t(v)\) denoting the number of agents located at node v. Let \(X_t\), \(t \in \{ 0,1, \ldots \}\), represent the state of the network at the beginning of the \(t^{\mathrm {th}}\) round.

We assume that in round 0 the agents complete (or interrupt, due to the approaching adversary) their actual tasks and run a hiding algorithm \(\mathcal {A}\) that takes T rounds. Just after round T the adversary is given the final state \(X_T\) and, roughly speaking, its aim is to learn as much as possible about the initial state \(X_0\). That is, the adversary gets access to a single configuration (representing a single view of the system). Note that the adversary may have some a priori knowledge, which is modeled as a distribution of \(X_0\). In randomized settings the adversary has no information about the agents’ local random number generators. The aim of the agents, on the other hand, is to make it impossible for the adversary to learn \(X_0\) from \(X_T\), for any initial state (or distribution of states). Moreover, the number of rounds T should be as small as possible (we need to hide the location quickly). We also consider energy complexity, understood as the maximal number of moves (i.e. moves to a neighboring vertex) over all agents in the execution of \(\mathcal {A}\). This definition follows from the fact that we need all agents to keep working, and consequently we need to protect the most loaded agent from running out of battery. As we shall see, in all cases considered in this paper the energy complexity is very closely related to the time needed by all devices to reach the “safe” configuration, namely it is asymptotically equal to T.

Security Measures. Let \(X_0\) be a random variable representing the knowledge of the adversary about the initial state and let \(X_T\) denote the final configuration of the devices after executing algorithm \(\mathcal {A}\). We aim to define a measure of the efficiency of algorithm \(\mathcal {A}\) in terms of location hiding. For problems based on “losing” knowledge there is no single, commonly accepted definition; this is reflected in dozens of papers, including [6] and [18]. Nevertheless, a good security measure needs to estimate “how much information” about \(X_0\) is contained in \(X_T\).

Let \(X \sim \mathcal {L}\) be a random variable with probability distribution \(\mathcal {L}\). We denote by \(E\left[ X\right] \) the expected value of X. By \(\mathrm {Unif}(A)\) we denote the uniform distribution over the set A and by \(\mathrm {Geo}(p)\) the geometric distribution with parameter p. An event E occurs with high probability (w.h.p.) if for an arbitrary constant \(\alpha > 0\) we have \(\Pr [E] \ge 1 - {{\mathrm{O}}}\left( n^{-\alpha }\right) \). Let \(H\left( X\right) \) denote the entropy of the random variable X, \(H\left( X|Y\right) \) the conditional entropy and \(I\left( X,Y\right) \) the mutual information. All these notations and definitions are recalled in Appendix 1.

Our definition is based on the following notion of normalized mutual information, also known as the uncertainty coefficient (see e.g. [20], Chap. 14.7.4):

$$ U\left( X|Y\right) = \frac{I\left( X,Y\right) }{H\left( X\right) }. $$

From the definition of mutual information it follows that \(U\left( X|Y\right) = 1 - \frac{H\left( X|Y\right) }{H\left( X\right) }\) and \(0 \le U\left( X|Y\right) \le 1\). The uncertainty coefficient \(U\left( X|Y\right) \) takes the value 0 if there is no association between X and Y, and the value 1 if Y contains all information about X. Intuitively, it tells us what portion of the information about X is revealed by the knowledge of Y. \(H\left( X\right) = 0\) implies \(H\left( X|Y\right) = 0\), and we use the convention that \(U\left( X|Y\right) = 0\) in that case. Indeed, in such a case we have stochastic independence between X and Y, and the interpretation in terms of information hiding can be based on the simple observation that \(H\left( X\right) = 0\) means that there is nothing to reveal about X (as we have full knowledge of X), so Y does not give any extra information.
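For concreteness, the uncertainty coefficient can be computed directly from a joint distribution of (X, Y). The following sketch (our own illustration, using the convention \(U = 0\) when \(H(X) = 0\)) mirrors the definition above.

```python
import math

def entropy(p):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def uncertainty_coefficient(joint):
    """U(X|Y) = I(X;Y)/H(X) for a joint distribution given as a dict {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    hx = entropy(px.values())
    if hx == 0:                                # nothing to reveal about X
        return 0.0
    hxy = entropy(joint.values())              # joint entropy H(X, Y)
    mutual = hx + entropy(py.values()) - hxy   # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return mutual / hx

# If X and Y are independent, U = 0; if Y determines X, U = 1.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
determined  = {(0, 0): 0.5, (1, 1): 0.5}
print(uncertainty_coefficient(independent), uncertainty_coefficient(determined))  # 0.0 and 1.0
```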

Definition 1

The algorithm \(\mathcal {A}\) is \(\varepsilon \)-hiding if for any distribution of the initial configuration \(X_0\) with non-zero entropy (i.e. \(H\left( X_0\right) > 0\)) and for any graph G representing the underlying network

$$\begin{aligned} U\left( X_0|X_T\right) = \frac{I\left( X_0,X_T\right) }{H\left( X_0\right) } \le \varepsilon , \end{aligned}$$
(1)

where \(X_T\) is the state just after the execution of the algorithm \(\mathcal {A}\).

Definition 2

We say that the algorithm \(\mathcal {A}\) is

  • well hiding if it is \(\varepsilon \)-hiding for some \(\varepsilon (n) = \varepsilon \in {{\mathrm{o}}}\left( 1\right) \);

  • perfectly hiding if it is 0-hiding.

Fig. 1. Positions of \(k = 4\) agents (represented by black dots) in consecutive steps of a sample execution of a location hiding algorithm in a network with \(n = 8\) nodes.

Intuitively, this definition says that the algorithm works well if the knowledge of the final state reveals only a very small fraction of the total information about the initial configuration, regardless of the distribution of the initial placement of devices. Let us stress that these definitions require any hiding algorithm to work well regardless of the network topology: if an algorithm \(\mathcal {A}\) is \(\varepsilon \)-hiding, then for any simple connected graph G and for any probability distribution of the agents’ initial positions \(X_0\) with non-zero entropy, the final configuration \(X_T\) after \(\mathcal {A}\) terminates must fulfill (1). Notice also that there are cases in which it is not feasible to hide the initial location in a given graph. Assuming the adversary knows the agents’ initial distribution \(X_0\), \(H\left( X_0\right) = 0\) means that with probability 1 the agents are initially placed in some fixed locations known to the adversary. In particular, this is the case when the graph has only one vertex (it can be a model of a system with exactly one state): all devices must then be located at that vertex and no hiding algorithm exists for this setting.

The main idea of location hiding algorithms is depicted in Fig. 1. The agents are initially placed at vertices of a graph G (Fig. 1a) according to some known distribution of the initial state \(X_0\). In each step every agent located at some vertex \(v_i\) can move along an edge incident to \(v_i\) or stay at \(v_i\). After T steps the algorithm terminates, resulting in a final configuration \(X_T\) (Fig. 1d). Our goal is to ensure that any adversary observing the positions \(X_T\) of the devices after the execution of a location hiding algorithm can infer as little information as possible about their actual initial placement \(X_0\), regardless of G and the distribution of \(X_0\).

3 Our Results

Most of our results apply to the single-agent case. We first show that if the topology is known, then any well-hiding algorithm in a graph with n nodes, m edges and diameter D requires \(\varOmega (D)\) steps, and that there exists a perfectly hiding algorithm that needs O(D) steps. Then we generalize this result to the multi-agent scenario. Secondly, we consider the case of unknown network topology. We show that in the model with no memory there exists no deterministic well hiding algorithm, and for the randomized setting we present a well-hiding algorithm whose expected running time is \(\widetilde{O}(n^3)\). Finally, if the agents have unlimited memory, then \(\varTheta (m)\) steps are both necessary and sufficient for well-hiding algorithms. Table 1 summarizes our results.

Table 1. Overview of our results

Let us mention that in the considered models it is feasible to “lose” information about the initial state not only in a randomized manner, but also fully deterministically. As we shall show, the corresponding algorithms are completely different; we find this property somewhat surprising. Moreover, let us note that the possible rate of losing information, as well as the adequate algorithms, strongly depends on the assumed model and the agents’ capabilities (knowledge of the topology, memory).

4 Previous and Related Work

The problems of security and privacy protection in distributed systems have received a lot of attention. Various security aspects of such systems have been extensively discussed and many novel solutions for practical settings have been proposed over the last years. One major example is the problem of designing routing protocols for wireless ad hoc networks that hide the network topology from potential external and internal adversaries (see e.g. [16, 24]). The goal of such protocols is to find reliable routes between the source and destination nodes that are as short as possible, while reducing the exposure of the network topology to malicious nodes through the data transmitted in the packets. This prevents adversaries (at least to some extent) from launching attacks that require knowledge of the network topology, which may be particularly harmful to the whole network and the tasks performed.

Another important line of research is assuring the privacy of users of mobile applications that rely on location-based services and hence gather information about their location. Examples are applications providing various information related to the user’s current location (e.g. real-time traffic information, places to visit) and activity-based social networks where users share information about the location-based activities they perform (cf. [19]). Various protocols for protecting location data, together with some formal models and privacy metrics, have been proposed (see e.g. [12, 14]). However, in some cases the performance of the designed protocols is evaluated only experimentally and the discussion of their security properties is informal, without referring to any theoretical model (cf. [16, 19]).

To the best of our knowledge, there is no rigorous and formal analysis of the problem of location hiding in graphs, and it has never been studied before in the context considered in this paper. The problems of ensuring security and privacy in distributed systems mentioned above are similar to ours only to a certain extent. The aim of our approach is to propose a general formal model of hiding the positions of a set of mobile agents from an external observer and to consider its basic properties and limitations. The problem we consider is, however, closely related to some of the most fundamental agent-based graph problems.

First of all, observe the relation to graph exploration, which comes from the global nature of our problem. If the agent has at least logarithmic memory, then we can use algorithms for graph exploration: since the graph is labeled, it suffices to explore the graph and move to the vertex with minimum ID. Hence the vast body of literature on exploration in various models applies to our problem. In particular, there exist polynomial deterministic algorithms (using universal sequences) that need only logarithmic memory [1, 21].

In the randomized setting, location hiding becomes related to the problem of reaching the stationary distribution of a Markov chain (mixing time) as well as visiting all the states (cover time), i.e. the expected number of steps needed by a random walk to visit all the vertices. It is known that for an (unbiased) random walk the cover time is between \(\varOmega (n\log n)\) [10] and \(O(n^3)\) [11], depending on the underlying graph structure. There exist biased random walks that achieve a worst-case cover time of \(\widetilde{O}(n^2)\) [17]; however, in order to implement them the agent requires access to some memory to acquire the information necessary to compute the transition probabilities. It has recently been shown that in some graphs multiple random walks are faster than a single one [2, 8, 9]. Another interesting line of work is deriving biased random walks [3, 23].

5 Location Hiding for Known Topology

Let us first focus on the setting where the topology of the underlying network is known to the agents and consider one of the simplest possible protocols: every mobile agent goes from its initial position to some fixed vertex \(v^{*} \in V\) (this is possible because in the considered scenario the vertices of the graph have unique labels known to all the agents). One can easily see that this simple protocol is perfectly hiding. Indeed, regardless of the distribution of the agents’ initial placement, after executing the protocol all devices are always located at the same vertex known in advance. Hence \(X_{T}\) and \(X_0\) are independent and \(I\left( X_0,X_T\right) = 0\) (and therefore \(U\left( X_0|X_T\right) = 0\), as required). However, this approach leads to a worst-case time and energy complexity for a single device of order \(\varTheta (D)\), where D is the graph diameter. An appropriate selection of the vertex \(v^{*}\) as an element of the graph center can reduce the worst-case complexity only by a constant factor; it does not change its order. The natural question that arises in this context is whether there exists a perfectly hiding (or at least well hiding) protocol that requires an asymptotically smaller number of rounds than the simple deterministic protocol discussed above. In general, we are interested in determining the minimal number of steps required by any location hiding protocol in the considered scenarios to ensure a given level of security (in terms of the amount of information being revealed), for an arbitrary distribution of the initial configuration of the agents and for an arbitrary underlying network.
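A minimal sketch of this fixed-vertex protocol, assuming the agent is given the full adjacency structure (the adjacency-dict interface and the choice \(v^{*} = 1\) below are our own illustration): each agent computes a shortest path to the agreed vertex \(v^{*}\) and follows it, which takes at most D rounds.

```python
from collections import deque

def shortest_path(adj, source, target):
    """BFS shortest path in a graph given as an adjacency dict {v: [neighbors]}."""
    parent, frontier = {source: None}, deque([source])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                frontier.append(u)
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def hide_at_fixed_vertex(adj, start, v_star=1):
    """Perfectly hiding protocol for known topology: walk to the agreed vertex v*.
    The walk has length at most D, the diameter of the graph."""
    return shortest_path(adj, start, v_star)

# Example: a path graph 1-2-3-4; an agent starting at 4 needs D = 3 rounds.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(hide_at_fixed_vertex(adj, 4))  # [4, 3, 2, 1]
```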

5.1 Single Agent Scenario

Let us consider the simple scenario where there is only one mobile device in the network, located at some vertex \(v \in V\) according to some known probability distribution \(\mathcal {L}\) over the set of vertices. Assume that the network topology is known to the agent. Our goal is to find a lower bound on the number of steps that every well hiding protocol requires to hide the original location of the device in this scenario, for an arbitrary graph G and initial distribution \(\mathcal {L}\).

We start with a general lemma showing that if, within t steps, the sets of vertices reachable by the algorithm from two different starting vertices are disjoint with significant probability, then the algorithm is not well hiding within time t.

Lemma 1

Let \(\mathcal {A}\) be any hiding algorithm and \(G = (V,E)\) be an arbitrary graph. Suppose that for some \(t > 0\) and some positive constant \(\gamma \) there exist two distinct vertices \(u, v \in V\) s.t. with probability at least \(1{/}2 + \gamma \) the following property holds: sets \(V_1\) and \(V_2\) of vertices reachable after execution of t steps of \(\mathcal {A}\) when starting from u and v, respectively, are disjoint. Then \(\mathcal {A}\) is not well hiding in time t.

Proof

Fix an arbitrary graph \(G = (V,E)\) with \(|V| = n\) and a hiding algorithm \(\mathcal {A}\). Let \(u, v \in V\) be two vertices such that the sets \(V_1\) and \(V_2\) of possible locations of the agent after performing t steps of \(\mathcal {A}\), when starting at u and v respectively, are disjoint with probability at least \(1{/}2 + \gamma \) for some constant \(\gamma > 0\), i.e. \(\Pr [\xi _V = 1] \ge 1{/}2 + \gamma \), where \(\xi _V\) is the indicator random variable of the event \(V_1 \cap V_2 = \emptyset \). Consider the following two-point distribution \(\mathcal {L}\) of the agent’s initial location \(X_0\): \(\Pr [X_0 = u] = \Pr [X_0 = v] = 1{/}2\), \(\Pr [X_0 = w] = 0\) for \(w \in V \setminus \{u, v\}\). We will prove that such an \(\mathcal {A}\) does not ensure that the initial position \(X_0\) of the device is well hidden at time t when \(X_0 \sim \mathcal {L}\).

Because \(H\left( X_0\right) = 1\), we have \(U\left( X_0|X_t\right) = I\left( X_0,X_t\right) \), so it suffices to show that the mutual information satisfies \(I\left( X_0,X_t\right) \ge \eta > 0\) for some positive constant \(\eta \). This is equivalent to \(H\left( X_0|X_t\right) \le 1 - \eta \), as follows from Fact 4. Clearly, for \(y \in V_1\)

$$\begin{aligned} \Pr [X_0 = u | X_t = y] \ge \Pr [X_0 = u, \xi _V = 1 | X_t = y] \ge 1{/}2 + \gamma \end{aligned}$$
(2)

and the same holds after replacing u with v. Moreover \(\Pr [X_0 = v | X_t = y] = 1 - \Pr [X_0 = u | X_t = y]\). Denoting \(\Pr [X_0 = u | X_t = y]\) by \(p_{u|y}\) we have

$$\begin{aligned} H\left( X_0|X_t\right)&= -\sum _{y \in V} \Pr [X_t=y] \sum _{x \in V} \Pr [X_0=x|X_t=y] \log (\Pr [X_0=x|X_t=y])\\ \nonumber&= -\sum _{y \in V} \Pr [X_t=y] \left( p_{u|y} \log (p_{u|y}) + (1-p_{u|y}) \log (1 -p_{u|y})\right) . \end{aligned}$$
(3)

Let us consider the function \(f(p) = -(p\log (p) + (1-p)\log (1-p))\) for \(p \in (0,1)\). Clearly, \(\lim _{p \rightarrow 0} f(p) = \lim _{p \rightarrow 1} f(p) = 0\) and f(p) attains its unique maximum on the interval (0, 1), equal to 1, at \(p = 1{/}2\). From (2) we have \((\forall {y \in V_1})(p_{u|y} \ge 1{/}2 + \gamma )\) and \((\forall {y \in V_2})(p_{u|y} \le 1{/}2 - \gamma )\). Therefore, there exists a positive constant \(\eta \) such that \(f(p_{u|y}) \le 1-\eta \) for all \(y \in V_1 \cup V_2\). From the definition of the sets \(V_1\) and \(V_2\) we also have \(\Pr [X_t = y \notin V_1 \cup V_2] = 0\). Using these facts, (3) can be rewritten as

$$\begin{aligned} H\left( X_0|X_t\right)= & {} \sum _{y \in V_1} \Pr [X_t=y] f(p_{u|y}) + \sum _{y \in V_2} \Pr [X_t=y] f(p_{u|y}) \\\le & {} (1-\eta ) \Pr [X_t \in V_1 \cup V_2] = 1 - \eta \end{aligned}$$

for some constant \(\eta > 0\), as required. Hence, the lemma is proved.    \(\square \)
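The constant \(\eta \) in the proof comes from the binary entropy function f(p): once \(p_{u|y}\) is bounded away from 1/2 by \(\gamma \), f drops below 1 by a fixed amount. A quick numerical sketch (illustration only; the value of \(\gamma \) is arbitrary) makes this concrete.

```python
import math

def binary_entropy(p):
    """f(p) = -(p log2 p + (1-p) log2(1-p)), the binary entropy function."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

gamma = 0.1
# f is symmetric around 1/2, so any p with |p - 1/2| >= gamma satisfies f(p) <= 1 - eta.
eta = 1 - binary_entropy(0.5 + gamma)
print(eta)  # ~0.029: a strictly positive constant, as used in the proof
```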

From Lemma 1 we obtain a lower bound of \(\varOmega (D)\) on the expected number of steps needed by any well hiding algorithm in the model with known topology. Note that this lower bound matches the simple O(D) upper bound.

Theorem 1

For a single agent and known network topology, and for an arbitrary graph G, there exists a distribution \(\mathcal {L}\) of the agent’s initial position such that any well hiding algorithm \(\mathcal {A}\) needs to perform at least \(\left\lfloor D/2 \right\rfloor \) steps with probability \(c \ge 1{/}2 - {{\mathrm{o}}}\left( 1\right) \), where D is the diameter of G.

Proof

We will show that for each graph G there exists a distribution of the initial state of the mobile agent such that every well hiding algorithm \(\mathcal {A}\) needs at least \(\left\lfloor D/2 \right\rfloor \) rounds with probability \(c \ge 1{/}2 - {{\mathrm{o}}}\left( 1\right) \).

Fix an arbitrary graph \(G = (V,E)\) with \(|V| = n\). Let \(u, v \in V\) be two vertices such that \(d(u,v) = D\). Let \(\delta = \left\lfloor D/2 \right\rfloor \) and consider the following two-point distribution \(\mathcal {L}\) of the agent’s initial location \(X_0\): \(\Pr [X_0 = u] = \Pr [X_0 = v] = 1{/}2\), \(\Pr [X_0 = w] = 0\) for \(w \in V \setminus \{u, v\}\). Suppose that some hiding algorithm \(\mathcal {A}\) terminates after \(T < \delta \) steps with probability at least \(1{/}2 + \gamma \) for some constant \(\gamma > 0\), regardless of the starting point, i.e. \(\Pr [T < \delta ] \ge 1{/}2 + \gamma \).

Obviously there is no \(z \in V\) such that \(d(u,z) < \delta \) and \(d(v,z) < \delta \) (otherwise \(D = d(u, v) < 2 \left\lfloor D/2\right\rfloor \le D\), a contradiction). Let us define \(B(u,\delta ) = \{y \in V :d(u,y) < \delta \}\) and \(B(v,\delta ) = \{y \in V :d(v,y) < \delta \}\). It is clear that \(B(u,\delta ) \cap B(v,\delta ) = \emptyset \). By the assumption on the running time of \(\mathcal {A}\), with probability at least \(1{/}2 + \gamma \) the sets \(V_1\) and \(V_2\) of vertices reachable from u and v, respectively, fulfill \(V_1 \subseteq B(u,\delta )\) and \(V_2 \subseteq B(v,\delta )\), and are therefore disjoint. Hence it suffices to apply Lemma 1 to complete the proof.    \(\square \)

5.2 Location Hiding for k Agents and Known Network Topology

Let us recall that the energy complexity of an algorithm \(\mathcal {A}\) in the multi-agent setting is defined as the maximal distance covered (i.e. the number of moves) in the execution of \(\mathcal {A}\) over all agents. This allows for a direct translation of the results from the single-device setting, as presented below.

In the general scenario considered in this section a result similar to the single-agent case holds: every algorithm ensuring the well hiding property regardless of the distribution of the agents’ initial placement requires \(\varOmega (D)\) rounds in the worst case.

Lemma 2

For known network topology and \(k > 1\) indistinguishable agents initially placed according to some arbitrary distribution \(\mathcal {L}\), any well hiding algorithm for an arbitrary underlying graph G has energy complexity at least \(\left\lfloor D/2 \right\rfloor \) with probability \(c \ge 1{/}2 - {{\mathrm{o}}}\left( 1\right) \), where D is the diameter of G.

The proof of Lemma 2 proceeds along the same lines as that of Theorem 1. We choose two vertices u, v at distance D and place all agents, with probability 1/2 each, at one of these two vertices. Denoting by \(T_i\), \(1 \le i \le k\), the number of steps performed by agent i and by \(T = \max _{1 \le i \le k} T_i\) the energy complexity of the algorithm, we consider a hiding algorithm \(\mathcal {A}\) such that \(\Pr [T < \delta ] \ge 1{/}2 + \gamma \) for \(\delta = \left\lfloor D/2 \right\rfloor \) and some positive constant \(\gamma > 0\). The only difference is that instead of the sets \(B(u,\delta ) = \{y \in V :d(u,y) < \delta \}\) and \(B(v,\delta ) = \{y \in V :d(v,y) <\delta \}\) themselves, we consider the subsets \(\mathcal {S}_1\) and \(\mathcal {S}_2\) of the state space consisting of those states in which all agents are located only at vertices from the set \(B(u,\delta )\) or \(B(v,\delta )\), respectively. Calculations similar to the previous ones lead to the conclusion that no such algorithm can ensure the well hiding property.

6 Location Hiding for Unknown Topology

6.1 No Memory

If no memory and no information about the topology is available, but the agent has access to a source of randomness, it can perform a random walk in order to conceal the information about its starting position. However, the agent would not know when to stop the walk. If in each step it chose to terminate with a probability depending only on the degree of the current node, one could easily construct an example in which the agent would not move far from its original position (with respect to the network size). Hence in this section we assume that the size of the network is known; then the problem becomes feasible. Consider the following Algorithm 1: in each step we terminate with probability q (roughly \(n^{-3}\)) and with probability \(1-q\) we make one step of a lazy random walk. We will choose q later. Let us note that letting the walk stay at the current vertex with some fixed constant probability is important for ensuring aperiodicity of the Markov chain (see e.g. [13, 15]). Otherwise we can easily provide an example where such an algorithm does not guarantee the initial position to be hidden. Namely, consider any bipartite graph and any initial distribution such that the agent starts, with some constant probability, either at a fixed black or a fixed white vertex. If the adversary is aware only of the running time (i.e. the number of steps the agent performed), then when observing the network after T steps it can identify the agent’s initial position with probability 1, based on whether T is even or odd. Nevertheless, the probability of remaining at a given vertex can be set to an arbitrary constant \(0< c < 1\); for the purposes of the analysis we let \(c = 1{/}2\), which leads to the classical definition of a lazy random walk (see Definition 10 and Fact 8 in Appendix 2).

Algorithm 1. \(\mathcal {A}(q)\): in each round, terminate with probability q; otherwise perform one step of a lazy random walk.
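A sketch of Algorithm 1 in code (our own rendering of the description above, not the original pseudocode listing): with probability q the agent stops; otherwise it performs one lazy random walk step, staying put with probability 1/2 and otherwise moving to a uniformly random neighbor.

```python
import random

def lazy_random_walk_hiding(adj, start, q):
    """Sketch of Algorithm 1: in each round terminate with probability q; otherwise
    perform one lazy random walk step (stay put with probability 1/2, else move to a
    uniformly random neighbor). The adjacency dict `adj` only simulates local access:
    the agent needs nothing beyond the ports of its current vertex.
    Returns (final vertex, number of rounds T)."""
    v, rounds = start, 0
    while random.random() >= q:          # T ~ Geo(q), so E[T] = 1/q
        rounds += 1
        if random.random() < 0.5:        # laziness ensures aperiodicity
            v = random.choice(adj[v])    # move through a uniformly random port
    return v, rounds
```

With q of order \(1/(n^{3}\log n)\) the walk typically runs for at least the worst-case mixing time of the lazy walk before stopping, which is the property exploited by Theorem 2 below.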

Theorem 2

The algorithm \(\mathcal {A}(q)\) based on a random walk with termination probability \(q = \frac{f(n)}{n^{3} \log {h(n)}}\), for any fixed \(f(n) = {{\mathrm{o}}}\left( 1\right) \) and \(h(n) = {{\mathrm{\omega }}}\left( \max \{n^{2}, \frac{1}{H\left( X_0\right) }\}\right) \), is well hiding for any graph G and any distribution of the agent’s initial location \(X_0\).

Proof

Fix \(\varepsilon > 0\). Let \(t_{\mathrm {mix}}\left( \varepsilon \right) \) denote the mixing time and \(\pi \) the stationary distribution of the random walk performed by the algorithm, according to Definition 9. We will choose the exact value of \(\varepsilon \) later. Let \(X_0\) and \(X_T\) be the initial and final configuration, respectively. To prove the theorem it suffices to show that \(\frac{H\left( X_0|X_T\right) }{H\left( X_0\right) } = 1 - {{\mathrm{o}}}\left( 1\right) \), which is equivalent to \(\lim _{n \rightarrow \infty } \frac{H\left( X_0|X_T\right) }{H\left( X_0\right) } = 1\). This implies that \(U\left( X_0|X_T\right) = {{\mathrm{o}}}\left( 1\right) \), as required by Definition 2. Let \(\xi _{\varepsilon } = \mathbf {1}[T > t_{\mathrm {mix}}\left( \varepsilon \right) ]\) be the indicator random variable taking value 1 if \(T > t_{\mathrm {mix}}\left( \varepsilon \right) \) and 0 otherwise. We need to ensure that, with probability \(1 - {{\mathrm{o}}}\left( 1\right) \), the algorithm \(\mathcal {A}\) stops only after \(t_{\mathrm {mix}}\left( \varepsilon \right) \) steps. The time T at which \(\mathcal {A}\) terminates follows the \(\mathrm {Geo}(q)\) distribution, hence \(\Pr [\xi _{\varepsilon } = 1] = (1-q)^{t_{\mathrm {mix}}\left( \varepsilon \right) }\). Letting \(q = f(n)/t_{\mathrm {mix}}\left( \varepsilon \right) \) for some \(f(n) = {{\mathrm{o}}}\left( 1\right) \) implies \(\Pr [\xi _{\varepsilon } = 1] = 1 - {{\mathrm{o}}}\left( 1\right) \), as required.

Let us consider \(H\left( X_0|X_T\right) \). By Fact 1 (see Appendix 1) we have

$$\begin{aligned} H\left( X_0|X_T\right)&\ge H\left( X_0|X_T, \xi _{\varepsilon }\right) \ge H\left( X_0|X_T, \xi _{\varepsilon } = 1\right) \Pr [\xi _{\varepsilon } = 1]\\&= (1 - {{\mathrm{o}}}\left( 1\right) )\ H\left( X_0|X_T, \xi _{\varepsilon } = 1\right) \ge (1 - {{\mathrm{o}}}\left( 1\right) )\ H\left( X_0|X_{t_{\mathrm {mix}}\left( \varepsilon \right) }\right) , \end{aligned}$$

where the last inequality follows directly from Fact 9 in Appendix 1.

Let \(p_{0}(x) = \Pr [X_0 = x]\), \(p_{t}(y) = \Pr [X_{t_{\mathrm {mix}}\left( \varepsilon \right) } = y]\), \(p_{0}(x|y) = \Pr [X_0 = x | X_{t_{\mathrm {mix}}\left( \varepsilon \right) } = y]\) and \(p_{t}(y|x) = \Pr [X_{t_{\mathrm {mix}}\left( \varepsilon \right) } = y | X_0 = x]\). As \(p_{0}(x|y) = \frac{p_{t}(y|x)}{p_{t}(y)} p_{0}(x)\),

$$\begin{aligned} H\left( X_0|X_{t_{\mathrm {mix}}\left( \varepsilon \right) }\right)&= - \sum _{y \in V} p_{t}(y) \sum _{x \in V} p_{0}(x|y) \log p_{0}(x|y) \nonumber \\&= - \sum _{y \in V} \sum _{x \in V} p_{t}(y|x) p_{0}(x) \log p_{0}(x) \\&\quad - \sum _{y \in V} \sum _{x \in V} p_{t}(y|x) p_{0}(x) \log \frac{p_{t}(y|x)}{p_{t}(y)}. \nonumber \end{aligned}$$
(4)

The properties of the mixing time imply that there exist \(\{\varepsilon _{y}^{(1)}\}_{y \in V}\) and \(\{\varepsilon _{y}^{(2)}\}_{y \in V}\) such that \(\sum _{y \in V} \varepsilon _{y}^{(i)} \le 2\varepsilon \) for \(i \in \{1,2\}\), \(\pi (y) - \varepsilon _{y}^{(1)} \le p_{t}(y|x) \le \pi (y) + \varepsilon _{y}^{(1)}\) and \(\pi (y) - \varepsilon _{y}^{(2)} \le p_{t}(y) \le \pi (y) + \varepsilon _{y}^{(2)}\). Let \(\varepsilon _{y} = \max \{\varepsilon _{y}^{(1)}, \varepsilon _{y}^{(2)}\}\). Since \(\pi (y) \ge 1/n^{2}\) for any \(y \in V\), letting \(\varepsilon \) be an arbitrary \(\varepsilon (n) = {{\mathrm{o}}}\left( \min \{\frac{1}{n^2}, H\left( X_0\right) \}\right) \) we get

$$ \frac{p_{t}(y|x)}{p_{t}(y)} \le \frac{\pi (y) + \varepsilon _{y}}{\pi (y) - \varepsilon _{y}} = 1 + {{\mathrm{o}}}\left( \min \{1, H\left( X_0\right) \}\right) . $$

Thus, the above relations allow us to find the lower bound on the conditional entropy \(H\left( X_0|X_{t_{\mathrm {mix}}\left( \varepsilon \right) }\right) \). The first sum in (4) gives us

$$\begin{aligned} - \sum _{y \in V} \sum _{x \in V} p_{t}(y|x) p_{0}(x) \log p_{0}(x)&\ge \sum _{y \in V} (\pi (y) - \varepsilon _{y}) H\left( X_0\right) \ge H\left( X_0\right) (1 - 4\varepsilon ) \nonumber \\&= H\left( X_0\right) (1 - {{\mathrm{o}}}\left( 1\right) ), \end{aligned}$$
(5)

whereas the second sum can be expressed as

$$\begin{aligned} - \sum _{y \in V} \sum _{x \in V} p_{t}(y|x) p_{0}(x) \log \frac{p_{t}(y|x)}{p_{t}(y)}&= - \sum _{x \in V} p_{0}(x) \sum _{y \in V} p_{t}(y|x) \log \frac{p_{t}(y|x)}{p_{t}(y)} \\&= - \sum _{x \in V} p_{0}(x) D\left( p_{t}(y|x)||p_{t}(y)\right) , \end{aligned}$$

where \(D\left( \cdot ||\cdot \right) \) is relative entropy recalled in Definition 5 in Appendix 1.

Applying the upper bound on the relative entropy from Fact 3 we get

$$\begin{aligned} \sum _{x \in V} p_{0}(x) D\left( p_{t}(y|x)||p_{t}(y)\right)&\le \sum _{x \in V} p_{0}(x) \frac{1}{\ln 2} \left( \sum _{y \in V} \frac{(p_{t}(y|x))^{2}}{p_{t}(y)} - 1 \right) \nonumber \\&\le \frac{1}{\ln 2} \sum _{x \in V} p_{0}(x) \sum _{y \in V} \left( \frac{(\pi (y) + \varepsilon _{y})^{2}}{\pi (y) - \varepsilon _{y}} - \pi (y) \right) \\&= {{\mathrm{o}}}\left( H\left( X_0\right) \right) . \nonumber \end{aligned}$$
(6)

Combining the estimates (5) and (6) we obtain \(H\left( X_0|X_{t_{\mathrm {mix}}\left( \varepsilon \right) }\right) \ge H\left( X_0\right) (1 - {{\mathrm{o}}}\left( 1\right) ) - {{\mathrm{o}}}\left( H\left( X_0\right) \right) \), which results in

$$ \frac{H\left( X_0|X_T\right) }{H\left( X_0\right) } \ge \frac{H\left( X_0\right) (1 - {{\mathrm{o}}}\left( 1\right) ) - {{\mathrm{o}}}\left( H\left( X_0\right) \right) }{H\left( X_0\right) } = 1 - {{\mathrm{o}}}\left( 1\right) , $$

as required.

In the above we set \(q = f(n)/t_{\mathrm {mix}}\left( \varepsilon \right) \) for an arbitrary fixed \(f(n) = {{\mathrm{o}}}\left( 1\right) \) and \(\varepsilon = {{\mathrm{o}}}\left( \min \{n^{-2}, H\left( X_0\right) \}\right) \). From Fact 7 and Fact 8 we have \(t_{\mathrm {mix}}\left( \varepsilon \right) \le n^{3} \log \varepsilon ^{-1}\). Hence, there exists some \(g(n) = {{\mathrm{\omega }}}\left( \max \{n^2,{H\left( X_0\right) }^{-1}\}\right) \), dependent on \(\varepsilon \), such that \(t_{\mathrm {mix}}\left( \varepsilon \right) \le n^{3} \log g(n)\) and \(q = \left( h(n) \cdot n^{3} \log g(n) \right) ^{-1}\), where \(h(n) = 1/f(n) = {{\mathrm{\omega }}}\left( 1\right) \).    \(\square \)

As previously mentioned, the running time T of the considered hiding algorithm follows a geometric distribution with parameter q, hence the expected running time is \(E\left[ T\right] = 1/q = h(n) \cdot n^{3} \log g(n)\), where h(n) and g(n) are as in the proof of Theorem 2. If \(H(X_0) = {{\mathrm{\varOmega }}}\left( \frac{1}{n^2}\right) \), as is the case for most distributions encountered in practice, we can simply select f(n) to be a function decreasing to 0 arbitrarily slowly and \(\varepsilon \) such that \(g(n) = c n^{3} \log {n}\) for some constant \(c > 0\). In such cases the entropy of the distribution of the agent’s initial position has no impact on the upper bound on the algorithm’s running time.

Algorithm 1 also works in the scenario with many agents (each agent can run an independent walk). An interesting question is whether it is possible to hide the initial state in the multi-agent case faster by taking advantage of performing many random walks simultaneously. As the speedup of multiple random walks in arbitrary graphs remains a conjecture [2], we leave this as an open question.

We conclude this section with a simple observation that the agent must have access to either memory, a source of randomness, or the topology of the network in order to hide its location.

Theorem 3

In the model with unknown topology and with no memory there exists no well-hiding deterministic algorithm.

Proof

Take any hiding algorithm. If this algorithm never makes any move, it obviously is not well-hiding. Otherwise observe that in the model without memory the decision to move is based only on local observations (the degree of the node) and some global information (the value of n), hence every time the agent visits a node of the same degree it makes the same decision. Assume that the agent decides to move from a node of degree d via port p. We construct a graph from two stars whose centers have degree d, joined by an edge e with port p at both endpoints. Since the agent has no memory and no randomness, it will end up in an infinite loop traversing edge e. Hence this algorithm cannot be regarded as well-hiding, since it never terminates.    \(\square \)
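The two-star construction in the proof can be made explicit. The sketch below (an illustration under our own vertex naming) builds a port-labeled graph from two star centers of degree d joined by a bridge that carries port p at both endpoints; a memoryless deterministic agent that always leaves a degree-d vertex via port p then bounces between the two centers forever.

```python
def two_star_counterexample(d, p):
    """Sketch of the graph from the proof of Theorem 3 (labels are ours). Two star
    centers c1, c2, each of degree d, are joined by a bridge reached via port p from
    both centers; the remaining d-1 ports of each center lead to degree-1 leaves.
    ports[v][q] is the vertex reached from v through port q (1-based)."""
    assert 1 <= p <= d
    c1, c2 = "c1", "c2"
    ports = {c1: {}, c2: {}}
    for center, other in ((c1, c2), (c2, c1)):
        leaf_id = 0
        for q in range(1, d + 1):
            if q == p:
                ports[center][q] = other              # the bridge edge e
            else:
                leaf_id += 1
                leaf = f"{center}_leaf{leaf_id}"
                ports[center][q] = leaf
                ports[leaf] = {1: center}             # leaves have degree 1
    return ports

# The rule "leave a degree-d vertex via port p" (here d = 3, p = 2) loops forever:
ports = two_star_counterexample(d=3, p=2)
v = "c1"
for _ in range(6):
    v = ports[v][2]          # always takes the bridge: c1 -> c2 -> c1 -> ...
    print(v)
```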

6.2 Unlimited Memory

In this section we assume that the agent is endowed with unlimited memory that remains intact when the agent traverses an edge. We first observe that a standard search algorithm (e.g. DFS) can be carried out in such a model.

Theorem 4

There exists a perfectly hiding algorithm in the model with unlimited memory that needs O(m) steps in any graph.

Proof

The algorithm works as follows: it runs a DFS search of the graph (this is possible since the agent has memory) and then moves to the node with the minimum ID.    \(\square \)
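A sketch of this idea (our own rendering, assuming for simplicity that the agent can read a neighbor’s label before committing to the move; in the port-labeled model it would instead traverse the edge and backtrack, which keeps the total number of moves at O(m)): perform a DFS of the unknown graph while remembering the walk, then retrace the walk back to the vertex with the minimum label.

```python
def dfs_hide_at_min_id(adj, start):
    """Sketch of the perfectly hiding algorithm with unlimited memory: explore the
    graph by DFS, remember the walk and the visited labels, then walk back along the
    recorded route to the vertex with the minimum label. Returns the agent's walk."""
    walk, visited = [start], {start}

    def dfs(v):
        for u in adj[v]:
            if u not in visited:
                visited.add(u)
                walk.append(u)       # move forward along a tree edge
                dfs(u)
                walk.append(v)       # backtrack to v

    dfs(start)
    target = min(visited)
    # Retrace the recorded walk backwards until the minimum-label vertex is reached.
    back = walk[::-1]
    walk.extend(back[1:back.index(target) + 1])
    return walk

# Example on a small graph with vertex labels 1..5:
adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(dfs_hide_at_min_id(adj, 5))   # ends at vertex 1 regardless of the start
```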

Now we show that \(\varOmega (m)\) steps are necessary for any well-hiding algorithm in this model. We construct a family of graphs such that for any well-hiding algorithm and any n and m we can find a graph with n nodes and m edges in this family on which the algorithm needs \(\varOmega (m)\) steps on average.

Theorem 5

For a single agent and unknown network topology, for any n and m and any well hiding algorithm \(\mathcal {A}\), there exists a port-labeled graph G with n vertices and m edges representing the network, and a distribution \(\mathcal {L}\) of the agent’s initial position, on which the agent needs to perform \(\varOmega (m)\) steps in expectation.

Proof

If \(m = O(n)\) we can construct a graph in which \(D = \varOmega (m)\) and use Theorem 1. Now assume that \(m = \omega (n)\) and consider a graph constructed by connecting a chain of y cliques of size x. If \(m=\omega (n)\), we can find x, y such that \(x = \varTheta (m/n)\) and \(y = \varTheta (n^2/m)\). Adjacent cliques are connected by adding a vertex on two edges (one from each clique) and connecting these new vertices by an additional edge. We call this edge a bridge and the vertices adjacent to a bridge bridgeheads. Let \(\mathcal {G}_{x,y}\) be the family of all such chains of cliques on n nodes and m edges (note that we take only those chains in which each edge has at most one bridgehead). We want to bound the expected time that \(\mathcal {A}\) needs to reach the middle of the chain (the middle bridge if y is even, the middle clique otherwise) on a graph chosen uniformly at random from \(\mathcal {G}_{x,y}\).

When the agent traverses edges of a clique, each edge contains a bridgehead with probability \(\left( \left( {\begin{array}{c}x\\ 2\end{array}}\right) -1\right) ^{-1}\), hence with probability at least \(\frac{1}{2}\) the agent needs to traverse \(\left( \left( {\begin{array}{c}x\\ 2\end{array}}\right) -1\right) /2\) different edges before finding it. As the bridgeheads in each clique are chosen independently, we can choose the constants so that, by the Chernoff bound, the time to reach the middle of the chain is \({{\mathrm{\varOmega }}}\left( y \cdot x^2\right) = {{\mathrm{\varOmega }}}\left( m\right) \) with probability at least \(\frac{3}{4}\) when G is chosen uniformly at random from \(\mathcal {G}_{x,y}\). By symmetry, this holds for both endpoints (reaching the middle from the first or the y-th clique). Hence, there exists \(G^* \in \mathcal {G}_{x,y}\) such that with probability at least \(\frac{3}{4}\) the time needed by \(\mathcal {A}\) to reach the middle from either endpoint is at least \(c \cdot m\) for some constant \(c>0\). By Lemma 1, \(\mathcal {A}\) is not well-hiding on \(G^*\) if the number of steps is at most \(c\cdot m\).    \(\square \)
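The chain-of-cliques family \(\mathcal {G}_{x,y}\) can be sketched as follows (our own rendering; the vertex names and the random choice of which clique edges are subdivided are illustrative): y cliques of size x are laid out in a row, and between consecutive cliques one edge of each is subdivided by a bridgehead, with the two bridgeheads joined by a bridge.

```python
import random
from itertools import combinations

def chain_of_cliques(x, y, seed=None):
    """Sketch of a random member of the family G_{x,y} used in the proof of Theorem 5:
    y cliques of size x in a row; between consecutive cliques, one not-yet-subdivided
    edge of each clique is subdivided by a bridgehead, and the two bridgeheads are
    joined by a bridge (so each edge carries at most one bridgehead).
    Returns an adjacency dict {vertex: set of neighbors}."""
    assert x >= 3, "inner cliques need at least two distinct edges to subdivide"
    rng = random.Random(seed)
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    clique_nodes, free_edges = [], []
    for i in range(y):
        nodes = [f"c{i}_{j}" for j in range(x)]
        for u, v in combinations(nodes, 2):
            add_edge(u, v)
        clique_nodes.append(nodes)
        free_edges.append(list(combinations(nodes, 2)))  # edges not yet subdivided

    def subdivide_random_edge(i, name):
        u, v = free_edges[i].pop(rng.randrange(len(free_edges[i])))
        adj[u].discard(v); adj[v].discard(u)
        add_edge(u, name); add_edge(name, v)
        return name

    for i in range(y - 1):
        b_right = subdivide_random_edge(i, f"b{i}_right")
        b_left = subdivide_random_edge(i + 1, f"b{i + 1}_left")
        add_edge(b_right, b_left)   # the bridge between clique i and clique i+1
    return adj
```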

7 Conclusions and Further Research

We introduced the problem of location hiding, discussed efficient algorithms and lower bounds for some settings. Nevertheless, some questions are left unanswered.

The model considered by us encompasses a wide range of scenarios with a large variety of possible agent arrangements. Moreover, some natural classes of graphs may provide a reasonable approximation in cases where the terrain should be modeled as a connected region in the plane. Examples of such graphs are grid-like graphs: they contain edges joining only those pairs of vertices which are close to each other (in the sense that they have small Euclidean distance after embedding the graph in the plane). Increasing the number of vertices then leads to a closer resemblance to connected continuous regions. It would, however, be an interesting line of research to fully extend our approach to continuous connected terrains and derive analogous results for that model.

Another line of research is the model with a dynamic topology that may change during the execution of the protocol. Similarly, we believe that it would be interesting to investigate the model with a weaker adversary that is given only partial knowledge of the graph topology and of the actual assignment of agents. On the other hand, one may study more powerful adversaries able to observe some chosen part of the network for a given period of time. It would also be worth considering how high a level of security can be achieved if each agent is able to perform only \({{\mathrm{O}}}\left( 1\right) \) steps. Motivated by the fact that mobile devices are in practice very similar objects, we considered the setting where they are indistinguishable; it would be useful to also study the case when the adversary can distinguish between different agents. We also plan to gain a deeper understanding of the relation of the location hiding problem to classic, fundamental problems like rendezvous or patrolling.

We defined the energy complexity as the maximal energy expenditure over all agents. In some cases, however, it would be more adequate to consider the total energy used by all agents. Finally, it would also be interesting to construct more efficient protocols for classes of graphs with some common characteristic (e.g., lines, trees), as well as algorithms designed for restricted distributions of \(X_0\).