Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Air Indexes for Spatial Databases

  • Baihua Zheng
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_15-2

Definition

Air indexes are indexes employed in wireless broadcast environments to address scalability issues and to save power on mobile devices [4]. To retrieve a data object in a wireless broadcast system, a mobile client has to monitor the broadcast channel continuously until the data arrives. This consumes a lot of energy, since the client has to remain active for the whole waiting time. The basic idea of air indexes is to include index information about the arrival times of data items on the broadcast channel, so that mobile clients can predict when their desired data will arrive. They can then stay in power-saving mode while waiting and switch to active mode only when the data of interest arrives.

Historical Background

In spatial databases, clients are assumed to be interested in data objects having spatial features (e.g., hotels, ATMs, gas stations). “Find me the nearest restaurant” and “locate all the ATMs that are within 100 miles of my current location” are two example queries. A central server keeps all the data and answers the queries issued by the clients. There are basically two approaches to disseminating spatial data to clients: (i) on-demand access: a mobile client submits a request, which consists of a query and the query’s issuing location, to the server, and the server returns the result to the mobile client via a dedicated point-to-point channel; (ii) periodic broadcast: data are periodically broadcast on a wireless channel open to the public, and after a mobile client receives a query from its user, it tunes into the broadcast channel to receive the data of interest based on the query and its current location.

On-demand access is particularly suitable for lightly loaded systems, where contention for wireless channels and server processing is not severe. However, as the number of users increases, the system performance deteriorates rapidly. Compared with on-demand access, broadcast is a more scalable approach since it allows simultaneous access by an arbitrary number of mobile clients. Moreover, clients can access spatial data without reporting their current location to the server, so private location information is not disclosed.

In the literature, two performance metrics, namely access latency and tuning time, are used to measure access efficiency and energy conservation, respectively [4]. The former means the time elapsed between the moment when a query is issued and the moment when it is satisfied, and the latter represents the time a mobile client stays active to receive the requested data. As energy conservation is very critical due to the limited battery capacity on mobile clients, a mobile device typically supports two operation modes: active mode and doze mode. The device normally operates in active mode; it can switch to doze mode to save energy when the system becomes idle.

With data broadcast, clients listen to a broadcast channel to retrieve data based on their queries and hence are responsible for query processing. Without any index information, a client has to download all data objects to process a spatial search, which consumes a lot of energy since the client needs to remain active during a whole broadcast cycle. A broadcast cycle is the minimal duration within which all the data objects are broadcast at least once. A solution to this problem is air indexes [4]. The basic idea is to broadcast an index before the data objects (see Fig. 1 for an example), so that query processing can be performed over the index instead of the actual data objects. As the index is much smaller than the data objects and is accessed selectively to answer a query, the client is expected to download less data (hence incurring less tuning time and energy consumption) to find the answers. The disadvantage of air indexing, however, is that the broadcast cycle is lengthened by the additional index information. As a result, the access latency worsens: the larger the index, the higher the overhead in access latency.
Fig. 1 Air indexes in wireless broadcast environments

An important issue in air indexes is how to multiplex data and index on the sequential-access broadcast channel. Figure 1 shows the well-known (1, m) scheme [4], where the index is broadcast in front of every 1/m fraction of the dataset. To facilitate access to the index, each data page includes an offset to the beginning of the next index. The general access protocol for processing a spatial search involves the following three steps: (i) initial probe: the client tunes into the broadcast channel and determines when the next index is broadcast; (ii) index search: the client tunes into the broadcast channel again when the index is broadcast, and selectively accesses a number of index pages to find out which spatial data object qualifies and when to download it; and (iii) data retrieval: when the packet containing the qualified object arrives, the client downloads it and retrieves the object.
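The three-step access protocol can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions: time is counted in equal-sized buckets, the helper names `build_cycle` and `access` are hypothetical, the client always waits for the first index that starts after its initial probe, and `index_reads` (the number of index pages actually downloaded) is supplied by the caller rather than derived from a real index.

```python
def build_cycle(num_data, index_pages, m):
    """Bucket sequence of one (1, m) broadcast cycle: the full index is
    placed in front of every 1/m fraction of the data (toy layout)."""
    per_segment = -(-num_data // m)  # ceiling division
    cycle = []
    for s in range(m):
        cycle += [("index", i) for i in range(index_pages)]
        first, last = s * per_segment, min((s + 1) * per_segment, num_data)
        cycle += [("data", d) for d in range(first, last)]
    return cycle


def access(cycle, start, target, index_reads):
    """Return (access latency, tuning time) in buckets for a client that
    tunes in at bucket `start` and wants data object `target`."""
    assert ("index", 0) in cycle and ("data", target) in cycle
    n = len(cycle)
    # (i) initial probe: read one bucket, learn the offset to the next index
    pos = start + 1
    while cycle[pos % n] != ("index", 0):
        pos += 1                      # dozing, not listening
    # (ii) index search: stay tuned for `index_reads` pages of this index
    while cycle[pos % n][0] == "index":
        pos += 1
    # (iii) data retrieval: doze until the target bucket is on air
    while cycle[pos % n] != ("data", target):
        pos += 1
    latency = pos - start + 1         # buckets elapsed, including the download
    tuning = 1 + index_reads + 1      # probe + index pages + data bucket
    return latency, tuning
```

With the hypothetical layout `cycle = build_cycle(8, 2, 2)`, the call `access(cycle, start=3, target=6, index_reads=2)` yields a latency of 8 buckets but a tuning time of only 4 buckets, which is the trade-off the (1, m) scheme aims for.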

To disseminate spatial data on wireless channels, well-known spatial indexes (e.g., R-trees) are candidates for air indexes. However, unique characteristics of wireless data broadcast make the adoption of existing spatial indexes inefficient (if not impossible). Specifically, traditional spatial indexes are designed to cluster data objects with spatial locality. They usually assume resident storage (such as disk or memory) and adopt search strategies that minimize I/O cost, which is achieved by backtracking index nodes during the search. However, the broadcast order (and thus the access order) of index nodes is extremely important in wireless broadcast systems because data and index are only available to the client when they are broadcast on air. Clients cannot randomly access a specific data object or index node but have to wait until the next time it is broadcast. As a result, each backtracking operation extends the access latency by one more cycle and hence becomes a constraint in wireless broadcast scenarios.

Figure 2 depicts an example of a spatial query. Assume that an R-tree-based algorithm first visits the root node, then node R2, and finally R1, while the server broadcasts the nodes in the order root, R1, R2. If a client wants to backtrack to node R1 after it retrieves R2, it will have to wait until the next cycle because R1 has already been broadcast. This significantly extends the access latency, and it occurs every time the navigation order differs from the broadcast order. As a result, new air indexes that consider both the constraints of broadcast systems and the features of spatial queries are desired.
Fig. 2 Linear access on wireless broadcast channel
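The cost of backtracking in the root, R1, R2 example can be made concrete with a small Python sketch. The function name `buckets_elapsed` is hypothetical, and the model assumes that each node occupies exactly one bucket and that the client tunes in just as the root is broadcast.

```python
def buckets_elapsed(broadcast_order, visit_order, start=0):
    """Buckets that pass on the channel until every node in `visit_order`
    has been downloaded, given a cyclic `broadcast_order`."""
    n = len(broadcast_order)
    pos = start
    for node in visit_order:
        # Wait until the wanted node is on air; the channel is sequential,
        # so a node that has already passed only returns in the next cycle.
        while broadcast_order[pos % n] != node:
            pos += 1
        pos += 1  # download the node
    return pos - start


on_air = ["root", "R1", "R2"]
in_order = buckets_elapsed(on_air, ["root", "R1", "R2"])     # follows the broadcast
backtracking = buckets_elapsed(on_air, ["root", "R2", "R1"])  # R1 after R2
```

Under these assumptions the in-order traversal finishes within one cycle (3 buckets), while the traversal that backtracks to R1 spills into the second cycle (5 buckets).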

Foundations

Several air indexes have recently been proposed to support the broadcast of spatial data. These studies can be classified into two categories, according to the nature of the queries supported. The first category focuses on retrieving data associated with a specified geographical range, such as “Starbucks Coffee in New York City’s Times Square” and “Gas stations along Highway 515.” A representative is the index structure designed for the DAYS project [1]. It proposes a location hierarchy and associates data with locations. The index structure is designed to support queries on various types of data with different location granularity. The authors exploit an important property of the locations, i.e., the containment relationship among the objects, to determine the relative location of an object with respect to the parent that contains it. The containment relationship limits the search range of available data and thus facilitates efficient processing of the supported queries. In brief, a broadcast cycle consists of several sub-cycles, each containing data of the same type. A major index (one type of index bucket) is placed at the beginning of each sub-cycle. It provides information about the types of data broadcast and enables clients to jump quickly to the sub-cycle that contains the data of interest. Inside a sub-cycle, minor indexes (another type of index bucket) are interleaved with data buckets. Each minor index contains multiple pointers to data buckets with different locations. Consequently, a search for a data object involves accessing a major index and several minor indexes.

The second category focuses on retrieving data according to a specified distance metric, based on the client’s current location. An example is nearest neighbor (NN) search based on Euclidean distance. According to the index structure, indexes of this category can be further divided into two groups, i.e., central tree-based structures and distributed structures. In the following, we review some of the representative indexes of both groups.

D-tree is a paged binary search tree that indexes a given solution space in support of planar point queries [6]. It assumes that a data type has multiple data instances and that each instance has a certain valid scope within which it is the only correct answer. For example, restaurant is a data type, and each individual restaurant represents an instance. Taking NN search as an example, Fig. 3a illustrates four restaurants, namely o1, o2, o3, and o4, and their corresponding valid scopes p1, p2, p3, and p4. Given any query location q in, say, p3, o3 is the restaurant nearest to q. D-tree assumes that the valid scopes of the data instances are known, and it focuses only on planar point queries, which locate the query point in a valid scope and return the corresponding data instance to the client.
Fig. 3 Index construction using the D-tree

The D-tree is a binary tree built based on the divisions between data regions (e.g., valid scopes). A space consisting of a set of data regions is recursively partitioned into two complementary subspaces containing about the same number of regions until each subspace has one region only. The partition between two subspaces is represented by one or more polylines. The overall orientation of the partition can be either x-dimensional or y-dimensional, which is obtained, respectively, by sorting the data regions based on their lowest/uppermost y-coordinates, or leftmost/rightmost x-coordinates. Figure 3b shows the partitions for the running example. The polyline pl(v2, v3, v4, v6) partitions the original space into p5 and p6, and polylines pl(v1,v3) and pl(v4,v5) further partition p5 into p1 and p2, and p6 into p3 and p4, respectively. The first polyline is y-dimensional and the remaining two are x-dimensional. Given a query point q, the search algorithm works as follows. It starts from the root and recursively follows either the left subtree or the right subtree that bounds the query point until a leaf node is reached. The associated data instance is then returned as the final answer.
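The root-to-leaf descent can be sketched in Python as follows. This is a simplified illustration rather than the structure defined in [6]: it assumes that every partition polyline is monotone (it runs either bottom-to-top or left-to-right across its subspace), it ignores paging, and the class names, the `splits_left_right` flag, and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Point = Tuple[float, float]


@dataclass
class Leaf:
    instance: str                # data instance owning this valid scope


@dataclass
class Node:
    polyline: List[Point]        # partition vertices as (x, y), in order
    splits_left_right: bool      # True: runs bottom-to-top; False: left-to-right
    first: Union["Node", Leaf]   # subspace left of / below the polyline
    second: Union["Node", Leaf]  # subspace right of / above the polyline


def _interpolate(pairs, t):
    """Value at t of a piecewise-linear function given as sorted (t, v) pairs."""
    for (t0, v0), (t1, v1) in zip(pairs, pairs[1:]):
        if t0 <= t <= t1:
            return v0 if t1 == t0 else v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("query point lies outside the partitioned space")


def search(node, q):
    """Follow the subtree that bounds q until a leaf is reached."""
    x, y = q
    while isinstance(node, Node):
        if node.splits_left_right:
            boundary_x = _interpolate([(vy, vx) for vx, vy in node.polyline], y)
            node = node.first if x < boundary_x else node.second
        else:
            boundary_y = _interpolate(node.polyline, x)
            node = node.first if y < boundary_y else node.second
    return node.instance


# Hypothetical 4 x 4 space: the root polyline separates {o1, o2} from {o3, o4}.
root = Node([(2, 0), (1, 2), (2, 4)], True,
            Node([(0, 2), (2, 2)], False, Leaf("o1"), Leaf("o2")),
            Node([(1, 2), (4, 3)], False, Leaf("o3"), Leaf("o4")))
```

Each query touches one node per tree level, so a client only needs the index pages on a single root-to-leaf path.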

Grid-partition index is specialized for the NN problem [8]. It is motivated by the observation that an object is the NN only to the query points located inside its Voronoi cell. Let $O = \{o_1, o_2, \ldots, o_n\}$ be a set of points. $V(o_i)$, the Voronoi cell (VC) for $o_i$, is defined as the set of points $q$ in the space such that $\mathrm{dist}(q, o_i) < \mathrm{dist}(q, o_j)$, $\forall j \neq i$. That is, $V(o_i)$ consists of the set of points for which $o_i$ is the NN. As illustrated in Fig. 3a, p1, p2, p3, and p4 denote the VCs for four objects, o1, o2, o3, and o4, respectively. Grid-partition index tries to reduce the search space for a query at the very beginning by partitioning the space into disjoint grid cells. For each grid cell, all the objects that could be NNs of at least one query point inside the grid cell are indexed, i.e., those objects whose VCs overlap with the grid cell are associated with that grid cell.

Figure 4a shows a possible grid partition for the running example, and the index structure is depicted in Fig. 4b. The whole space is divided into four grid cells, i.e., G1, G2, G3, and G4. Grid cell G1 is associated with objects o1 and o2, since their VCs, p1 and p2, overlap with G1; likewise, grid cell G2 is associated with objects o1, o2, and o3, and so on. If a given query point is in grid cell G1, the NN can be found among the objects associated with G1 (i.e., o1 and o2), instead of among the whole set of objects. Efficient search algorithms and partition approaches have been proposed to improve performance.
Fig. 4 Index construction using the grid-partition
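The idea of associating each grid cell with its candidate NNs can be sketched in Python. Note the swapped-in technique: instead of computing exact Voronoi-cell overlap as in [8], this sketch uses a conservative distance bound that yields a superset of the overlapping objects. For a cell with center c and half-diagonal h, any object that is the NN of some point in the cell lies within min_o dist(c, o) + 2h of c. The function names, the uniform square grid, and the coordinates are assumptions made for illustration.

```python
import math


def build_grid_index(objects, extent, cells_per_side):
    """Map every grid cell to the objects that may be the NN of a point in it.
    `objects` is a dict name -> (x, y) inside the square [0, extent)^2."""
    size = extent / cells_per_side
    half_diag = size * math.sqrt(2) / 2
    index = {}
    for gx in range(cells_per_side):
        for gy in range(cells_per_side):
            center = ((gx + 0.5) * size, (gy + 0.5) * size)
            dists = {name: math.dist(center, p) for name, p in objects.items()}
            # Conservative bound: a superset of the objects whose Voronoi
            # cells overlap this grid cell (not the exact overlap test).
            bound = min(dists.values()) + 2 * half_diag
            index[(gx, gy)] = [name for name, d in dists.items() if d <= bound]
    return index


def nearest(index, objects, extent, cells_per_side, q):
    """Answer an NN query by scanning only the candidates of q's grid cell."""
    size = extent / cells_per_side
    cell = (min(int(q[0] // size), cells_per_side - 1),
            min(int(q[1] // size), cells_per_side - 1))
    return min(index[cell], key=lambda name: math.dist(q, objects[name]))


# Hypothetical coordinates for four objects in an 8 x 8 space.
objects = {"o1": (1.0, 6.0), "o2": (2.0, 2.0), "o3": (6.0, 1.5), "o4": (6.5, 6.5)}
index = build_grid_index(objects, extent=8, cells_per_side=4)
```

A finer grid shortens the candidate lists but enlarges the index, which is the size versus tuning-time trade-off that a partition approach has to balance.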

The conventional R-tree spatial index has also been adapted to support kNN search in broadcast environments [2]. With an R-tree index, the kNN search algorithm has to visit index nodes and objects sequentially, as backtracking is not feasible on the broadcast channel. This results in a considerably long tuning time, especially when the result objects are located in a later part of the broadcast. However, if clients know that there are at least k objects in a later part of the broadcast that are closer to the query point than the ones found so far, they can safely skip downloading the intermediate objects. This observation motivates the design of an enhanced kNN search algorithm that caters for the constraints of wireless broadcast. It requires each index node to carry a count of the underlying objects (object count) referenced by that node, so that clients do not blindly download intermediate objects.

Hilbert Curve Index (HCI) is designed to support general spatial queries, including window queries, kNN queries, and continuous nearest-neighbor (CNN) queries, in wireless broadcast environments. Motivated by the linear streaming property of the wireless data broadcast channel and the good spatial locality of the Hilbert Curve (HC), HCI organizes data according to Hilbert Curve order [7, 9] and adopts a B+-tree as the index structure. Figure 5 depicts an 8 × 8 grid, with solid dots representing data objects. The number next to each data point, namely its index value, represents the order in which the Hilbert Curve visits that point. For instance, the data point with coordinates (1,1) has the index value 2, and it will be visited before the data point with coordinates (2,2) because of its smaller index value.
Fig. 5 Hilbert curve index

The filtering and refining strategy is adopted to answer all the queries. For a window query, the basic idea is to determine a candidate set of points along the Hilbert curve that includes all the points within the query window and then to filter out those outside the window. Suppose the rectangle shown in Fig. 5 is a query window. Among all the points within the search range, the first point is point a and the last is point b, sorted according to their order of occurrence on the Hilbert curve, and both of them lie on the boundary of the search range. Therefore, all the points inside this query window lie on the segment of the Hilbert curve between points a and b. In other words, only data points with index values between 18 and 29 are candidates. During the access, the client can derive the coordinates of data points from their index values and then retrieve those within the query window.
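The index values and the window-query procedure can be sketched in Python. The mapping below is the widely used iterative Hilbert-curve conversion; it is an assumption that Fig. 5 uses the same curve orientation, although this mapping does give the value 2 for cell (1,1) as stated in the text. The function `window_query` and its arguments are hypothetical names, and the B+-tree lookup is replaced by a scan over a sorted list of index values.

```python
def _rot(n, x, y, rx, ry):
    """Rotate or flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Hilbert index value of cell (x, y) on an n x n grid (n a power of two)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n, d):
    """Inverse mapping: cell coordinates of a Hilbert index value."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y


def window_query(n, object_codes, x0, y0, x1, y1):
    """Filter by the index-value range of the window, then refine by position."""
    # The first and last curve points inside the window lie on its border,
    # so scanning the border cells is enough to obtain the candidate range.
    border = [(x, y) for x in range(x0, x1 + 1) for y in (y0, y1)]
    border += [(x, y) for y in range(y0, y1 + 1) for x in (x0, x1)]
    codes = [xy2d(n, x, y) for x, y in border]
    lo, hi = min(codes), max(codes)
    candidates = [c for c in sorted(object_codes) if lo <= c <= hi]   # filter
    result = []
    for c in candidates:                                              # refine
        x, y = d2xy(n, c)
        if x0 <= x <= x1 and y0 <= y <= y1:
            result.append(c)
    return (lo, hi), candidates, result
```

For example, with objects at index values 6, 11, 17, 32, and 51 and a hypothetical window covering cells (2,2) to (4,4), the refined result is [11, 32], since only those two values decode to cells inside the window.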

For a kNN query, the client first retrieves the k objects nearest to the query point along the Hilbert curve and then derives a range that is guaranteed to bound at least k objects. In the filtering phase, a window query that bounds the search range is issued to filter out the unqualified objects. Later, in the refinement phase, the k nearest objects are identified according to their distance to the query point. Suppose an NN query is issued at point q (i.e., index value 53). First, the client finds its nearest neighbor along the curve (i.e., the point with index value 51) and derives a circle centered at q whose radius r is the distance from q to that point (i.e., the green circle depicted in Fig. 5). Since the circle bounds point 51, it is certain to contain the nearest neighbor of q. Second, a window query is issued to retrieve all the data points inside the circle, i.e., the points with index values 11, 32, and 51. Finally, point 32 is identified as the nearest neighbor. The search algorithm for CNN adopts a similar approach. It approximates a search range that is guaranteed to bound all the answer objects, issues a window query to retrieve all the objects inside the search range, and finally filters out the unqualified ones.
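The NN example at q can be replayed with a short Python sketch. The coordinates below are an assumption: they are obtained by decoding the index values mentioned in the text with the standard Hilbert mapping on an 8 × 8 grid, and only the objects whose index values the text names are included. The function name `knn_on_curve` is hypothetical, "nearest along the curve" is interpreted as the smallest difference in index value, and the window query is reduced to a bounding-square test.

```python
import math

# Index value -> cell coordinates (assumed standard Hilbert mapping, 8 x 8 grid).
OBJECTS = {6: (3, 1), 11: (2, 3), 17: (1, 4), 32: (4, 4), 51: (6, 3)}


def knn_on_curve(objects, q_code, q_xy, k):
    """Return (k nearest objects, objects found inside the search circle)."""
    # Step 1: the k objects closest to q along the curve give a safe radius.
    along_curve = sorted(objects, key=lambda c: abs(c - q_code))[:k]
    r = max(math.dist(q_xy, objects[c]) for c in along_curve)
    # Step 2 (filtering): window query over the square that bounds the circle.
    qx, qy = q_xy
    in_window = [c for c, (x, y) in objects.items()
                 if abs(x - qx) <= r and abs(y - qy) <= r]
    # Step 3 (refinement): keep points inside the circle, rank by distance.
    in_circle = [c for c in in_window if math.dist(q_xy, objects[c]) <= r]
    ranked = sorted(in_circle, key=lambda c: math.dist(q_xy, objects[c]))
    return ranked[:k], in_circle


answer, in_circle = knn_on_curve(OBJECTS, q_code=53, q_xy=(4, 3), k=1)
```

Under these assumptions the circle contains the points 11, 32, and 51, and the refinement step returns 32 as the nearest neighbor, in line with the worked example.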

All the indexes mentioned above are based on a central tree-based structure, such as the R-tree and the B+-tree. However, employing a tree-based index on a linear broadcast channel to support spatial queries results in several deficiencies. First, clients can only start the search once they retrieve the root node from the channel. Replicating the index tree in multiple places in the broadcast channel provides multiple search starting points and shortens the initial root-probing time, but the prolonged broadcast cycle leads to a long access latency for the clients. Second, wireless broadcast media are not error-free. If intermediate nodes are lost during the search process, clients are forced either to restart the search at an upcoming root node or to scan the subsequent broadcast for other possible nodes in order to resume the search, which extends the tuning time. Distributed spatial index (DSI), a fully distributed spatial index structure, is motivated by these observations [5]. A similar distributed structure was also proposed in [3] to support access to spatial data on air.

DSI is very different from tree-based indexes, and is not a hierarchical structure. Index information of spatial objects is fully distributed in DSI, instead of simply replicated in the broadcast. With DSI, the clients do not need to wait for a root node to start the search. The search process launches immediately after a client tunes into the broadcast channel and hence the initial probe time for index information is minimized. Furthermore, in the event of data loss, clients resume the search quickly.

Like HCI, DSI also adopts the Hilbert curve to determine the broadcast order of data objects. Data objects, mapped to point locations in a 2-D space, are broadcast in ascending order of their HC index values. Suppose there are $N$ objects in total. DSI chunks them into $n_F$ frames, each holding $n_o$ objects ($n_F = \lceil N / n_o \rceil$). The space covered by the Hilbert Curve shown in Fig. 5 is used as a running example, with solid dots representing the locations of data objects (i.e., $N = 8$). Figure 6 demonstrates a DSI structure with $n_o$ set to 1, i.e., each frame contains only one object.
Fig. 6 Distributed spatial index

In addition to objects, each frame also has an index table as its header, which maintains information about the HC values of data objects to be broadcast at specific waiting intervals from the current frame. This waiting interval can be expressed as a delivery time difference or as a number of data frames, relative to the current frame. Every index table keeps $n_i$ entries, each of which, $\tau_j$, is expressed in the form $\langle HC_j, P_j \rangle$, $j \in [0, n_i)$. $P_j$ is a pointer to the $r^j$-th frame after the current frame, where $r$ ($> 1$) is an exponential base (i.e., a system-wide parameter), and $HC_j$ is the HC value of the first object inside the frame pointed to by $P_j$. In addition to the entries $\tau_j$, an index table also keeps the HC values $HC_k$ ($k \in [1, n_o]$) of all the objects $obj_k$ that are contained in the current frame. This extra information, although it occupies a little extra bandwidth, provides a more precise image of all the objects inside the current frame. During retrieval, a client can compare the values $HC_k$ of the objects against the one it is interested in, so that it avoids retrieving unnecessary objects, whose size is much larger than an HC value.

Consider the example shown in Fig. 5, with the corresponding DSI depicted in Fig. 6. Suppose $r = 2$, $n_o = 1$, $n_F = 8$, and $n_i = 3$. The index tables corresponding to the frames of data objects O6 and O32 are shown in the figure. Take the index table for frame O6 as an example: $\tau_0$ contains a pointer to the next upcoming ($2^0$-th) frame, whose first object’s HC value is 11; $\tau_1$ contains a pointer to the second ($2^1$-th) frame, whose first (and only) object has HC value 17; and the last entry $\tau_2$ points to the fourth ($2^2$-th) frame. The table also keeps the HC value 6 of the object O6 in the current frame. Search algorithms for window queries and kNN searches have been proposed as well.
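The construction of the index tables can be sketched in Python. The function name `build_dsi` and the dictionary layout are hypothetical, and the sketch assumes that pointers wrap around into the next broadcast cycle. Of the eight HC values used below, only 6, 11, 17, 32, and 51 appear in the text; 27, 40, and 61 are placeholder values chosen for illustration.

```python
import math


def build_dsi(hc_values, n_o, n_i, r):
    """Build one index table per frame.
    Each table stores the HC values of its own objects plus n_i entries
    (HC of first object in target frame, distance r**j in frames)."""
    hc_values = sorted(hc_values)               # broadcast in ascending HC order
    n_f = math.ceil(len(hc_values) / n_o)       # n_F = ceil(N / n_o)
    frames = [hc_values[i * n_o:(i + 1) * n_o] for i in range(n_f)]
    tables = []
    for f in range(n_f):
        entries = []
        for j in range(n_i):
            target = (f + r ** j) % n_f         # assumed cyclic wrap-around
            entries.append((frames[target][0], r ** j))
        tables.append({"own": frames[f], "entries": entries})
    return tables


# 6, 11, 17, 32, 51 come from the text; 27, 40, 61 are placeholders.
tables = build_dsi([6, 11, 17, 27, 32, 40, 51, 61], n_o=1, n_i=3, r=2)
```

With these inputs, the table of the first frame (object O6) holds its own HC value 6 and entries pointing 1, 2, and 4 frames ahead, the first two of which carry the HC values 11 and 17 given in the text. Because every frame carries such a table, a client can start searching from whichever frame it first receives.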

Key Applications

Location-Based Service

Wireless broadcast systems, thanks to their scalability, offer an alternative way to disseminate location-based information to a large number of users. Efficient air indexes enable clients to tune into the channel selectively, which reduces power consumption.

Moving Objects Monitoring

Many moving objects monitoring applications are interested in finding out all the objects that currently satisfy certain conditions specified by the users. In many cases, the number of moving objects is much larger than the number of submitted queries. As a result, wireless broadcast provides an ideal way to deliver subscribed queries to the objects, and those objects that might affect the queries can then report their current locations.

Recommended Reading

  1. Acharya D, Kumar V. Location based indexing scheme for DAYS. In: Proc. 4th ACM Int. Workshop on Data Eng. for Wireless and Mobile Access; 2005. p. 17–24.
  2. Gedik B, Singh A, Liu L. Energy efficient exact kNN search in wireless broadcast environments. In: Proc. 12th ACM Int. Symp. on Geographic Inf. Syst.; 2004. p. 137–46.
  3. Im S, Song M, Hwang C. An error-resilient cell-based distributed index for location-based wireless broadcast services. In: Proc. 5th ACM Int. Workshop on Data Eng. for Wireless and Mobile Access; 2006. p. 59–66.
  4. Imielinski T, Viswanathan S, Badrinath BR. Data on air – organization and access. IEEE Trans Knowl Data Eng. 1997;9(3):353.
  5. Lee WC, Zheng B. DSI: a fully distributed spatial index for wireless data broadcast. In: Proc. 23rd Int. Conf. on Distributed Computing Systems; 2005. p. 349–58.
  6. Xu J, Zheng B, Lee W-C, Lee DL. The D-tree: an index structure for location-dependent data in wireless services. IEEE Trans Knowl Data Eng. 2002;16(12):1526–42.
  7. Zheng B, Lee W-C, Lee DL. Spatial queries in wireless broadcast systems. ACM/Kluwer J Wirel Netw. 2004;10(6):723–36.
  8. Zheng B, Xu J, Lee W-C, Lee L. Grid-partition index: a hybrid method for nearest-neighbor queries in wireless location-based services. VLDB J. 2006;15(1):21–39.
  9. Zheng B, Lee W-C, Lee DL. On searching continuous k nearest neighbors in wireless data broadcast systems. IEEE Trans Mobile Comput. 2007;6(7):748–61.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Singapore Management University, Singapore, Singapore

Section editors and affiliations

  • Dimitris Papadias (1)
  1. Dept. of Computer Science and Eng., Hong Kong Univ. of Science and Technology, Kowloon, Hong Kong SAR