Indoor localization based on cellular telephony RSSI fingerprints containing very large numbers of carriers
Abstract
A new approach to indoor localization is presented, based upon the use of Received Signal Strength (RSS) fingerprints containing data from very large numbers of cellular base stations, up to the entire GSM band of over 500 channels. Machine learning techniques are employed to extract good quality location information from these high-dimensionality input vectors. Experimental results in a domestic and an office setting are presented, in which data were accumulated over a 1-month period in order to assure time robustness. Room-level classification efficiencies approaching 100% were obtained, using Support Vector Machines in one-versus-one and one-versus-all configurations. Promising results using semi-supervised learning techniques, in which only a fraction of the training data is required to have a room label, are also presented. While indoor RSS localization using WiFi, as well as some rather mediocre results with low-carrier-count GSM fingerprints, has been discussed elsewhere, this is to our knowledge the first study to demonstrate that good quality indoor localization information can be obtained, in diverse settings, by applying a machine learning strategy to RSS vectors that contain the entire GSM band.
Keywords
Support Vector Machine; Received Signal Strength; Indoor Localization; Standard Support Vector Machine; Base Transceiver Station
List of abbreviations
BTS: base transceiver station
CV: cross-validation
KNN: K-nearest neighbor
OEM: original equipment manufacturer
PCA: principal components analysis
RSS: received signal strength
SVM: support vector machine
1. Introduction
The accurate localization of persons or objects, both indoors and out of doors, is an interesting scientific challenge with numerous practical applications [1]. With the advent of inexpensive, implantable GPS receivers, it is tempting to suppose that the localization problem is today solved. Such receivers, however, require a minimum number of satellites in visibility in order to function properly, and as a result become virtually unusable in 'urban-canyon' and indoor scenarios.
The use of received signal strength measurements, or RSS, from local beacons, such as those found in WiFi, Bluetooth, Infrared, or other types of wireless networks, has been widely studied as an alternative solution when GPS is not available [2, 3, 4, 5, 6, 7, 8, 9]. A major drawback of this approach, of course, is the necessity of installing and maintaining the wireless networking equipment upon which the system is based.
Solutions exploiting RSS measurements from radiotelephone networks such as GSM and CDMA, both for indoor and outdoor localization, have also been discussed in the literature [10, 11, 12, 13, 14]. The near-ubiquity of cellular telephone networks makes it possible in this case to imagine systems for which the required network infrastructure and maintenance are assured from the start, and recent experimental results [15, 16, 17] have furthermore suggested that efficient indoor localization may be achievable in a home environment using RSS measurements in the GSM band. The main contribution of the present article is to demonstrate conclusively that GSM can indeed provide an attractive alternative to WiFi-based and other techniques for indoor localization, as long as the GSM RSS vectors used are allowed to include the entire GSM band. The article outlines a new technique for accurate indoor localization based on RSS vectors containing up to the full complement of more than 500 GSM channels, derived from month-long data runs taken in two different geographical locations.
Input RSS vectors of such high dimensionality are known to be problematical for simple classification and regression methods. In the present article, we analyze the RSS vectors with machine learning tools [18, 19] in order to extract localization information of good quality. The use of statistical learning techniques to analyze real or simulated WLAN and GSM RSS vectors has been discussed in [5, 6, 12], with promising results, though never using very high RSS dimensionalities such as those treated here. A second major contribution of our article is thus to demonstrate that good indoor localization can be obtained by extending machine learning-based localization techniques to RSS vectors of very high dimensionality, in this case the full GSM band. This use of the entire available set of GSM carriers, which may include base stations far away from the mobile to be located, allows the algorithms to extract a maximum of information from the radio environment, and thereby provide better localization than what is possible using the more standard approach of RSS vectors containing, at most, a few tens of the most powerful carriers.
It is worth stating from the outset that a classification approach to localization has been chosen in this work. In the literature, examples may be found of localization treated as a problem of regression, i.e., estimating an actual physical position and quoting a mean positioning error ([3, 12], etc.), or of classification, in which localization space is partitioned and the performance evaluated as a percentage of correct localizations ([4, 11, 15], etc.). One of the objectives of our research is to determine whether measurements taken in different rooms can be grouped together reliably, which would make it possible to envisage, for example, a person-tracking system for use in a multi-room interior environment. It is for this reason that a classification approach was chosen here. This choice constitutes a third particularity of the approach presented in our article.
Section 2 of the article describes the experimental conditions and geographical sites at which the data were taken; the different RSS vectors used, which, following standard nomenclature, we call fingerprints, are also defined here. The machine learning techniques used are presented in Section 3, where we adopt a classification approach which labels each fingerprint with the index number of the room in which it was recorded. In Section 4, we introduce the idea of applying semi-supervised learning techniques to our datasets, in order to make our method applicable in the case where only a fraction of the training data are position-labeled. The semi-supervised approach is interesting, as has been pointed out, for example, in [4], because obtaining position labels for all points in a large dataset is expensive and time-consuming. Finally, in Section 5, we present some conclusions and ideas for further study. An appendix provides basic information on the machine learning techniques used in the present investigation.
2. Measurement sites and datasets
2.1. Data-taking environment
The data used in our study were obtained by scanning the entire GSM band, which is one of the original aspects of our work. Two distinct datasets were created.
For each dataset (home and lab), the identical measuring device (TEMS for home, M2M for lab) was used for all scans. Indeed, tests showed that training with one M2M device and testing on another often gave poor results. This effect was later found to be due to variations in the device antennas used, and could be eliminated in future work. Nevertheless, the use of two different types of devices for our data recording (TEMS and M2M), as well as the choice of acquisition sites which are well separated both geographically and in time, gives an indication of the general applicability of our method.
The TEMS trace mobile is in appearance identical to a standard GSM telephone, the trace characteristics being implemented via hardware modification to the handset. The M2M modems are essentially bare GSM modem chipsets meant to be incorporated into various OEM (original equipment manufacturer) products such as vending machines, vehicles, etc.
Most commercial implementations of fingerprint-based outdoor GSM localization exploit the standard Network Measurement Reports, NMR, which, according to the GSM norm, the mobile station transmits to its serving Base Transceiver Station (BTS) roughly twice per second during a communication. Each 7-element NMR contains the RSS measurements of fixed-power beacon signals emanating from the serving BTS and its six strongest neighbors. In contrast, the frequency scans recorded by our TEMS and M2M modules are performed in idle mode, that is, when no call is in progress. Although NMRs are thus not available in our data, the scans nonetheless contain data on all channels, and include, at least in principle, the BSIC of each channel. This makes it possible, for example, to 'construct' an NMR artificially, as was done in the definition of the Current Top 7 fingerprint in Section 2.2.
During a scan, in addition to obtaining the RSS value at each frequency, the trace mobile attempts to synchronize with the beacon signal in order to read the BSIC value. Failure to obtain a BSIC can occur for two reasons: (1) the signal to noise + interference ratio is poor, perhaps because the BTS in question is located far from the mobile; or (2) the channel being measured is a traffic channel which therefore does not contain a BSIC. As traffic channels are not emitted at constant power and may employ frequency hopping, one might initially conclude that they will not be useful for localization (as the hopping sequence is unknown, an RSS value in this case just represents the observed power at a given frequency, averaged over a few GSM frames). Rather than introduce this bias into our data a priori, we chose to ignore BSICs and allow the variable selection procedure to decide which inputs were useful. This choice is not without cost, as it does not guarantee that from one scan to the next the data at a particular frequency is always from the same BTS. As we shall discover later, however, traffic channels do in fact turn out to be amongst those selected by the learning algorithm as being important.
As described earlier, to create a database entry, a human operator manually positions the trace mobile, initiates the scan, and labels the resulting RSS vector with its class index (i.e., room number). The training set thus accumulated over a period of time can then be used to build a classifier capable of labeling new RSS vectors obtained in the same geographical area. In such a supervised training scenario, the necessity of an extensive hand-labeled training set for each measurement site is clearly a drawback. For this reason we also examine, in Section 4, semi-supervised training techniques, which require only a fraction of the database entries to be labeled.
2.2. Preprocessing and variable selection
In the home (TEMS) scans, 10 empty carrier slots which always contained a small, fixed value were removed, leaving 488 values. This procedure was not found necessary for the lab (M2M) scans, and all 534 carriers were retained. For both scan sets, the total number of dataset entries is quite limited compared to the dimensionality of the RSS vectors. To address this problem, three types of fingerprints, containing subsets of carriers, were defined as described below.
In the following, we denote by N_{max} the total number of carriers in the carrier set under study: N_{max} = 488 in the home scans, and N_{max} = 534 for the lab scans. We define the matrix RSS as the full observation matrix, whose element RSS_{ij} is the strength value of carrier j in dataset entry i. In other words, each row of RSS contains the received signal strength values measured at a given location, and each column contains the received signal strength values of a given carrier in the carrier set under investigation. Thus, RSS has M rows and N_{max} columns, where M is the number of dataset entries (i.e., the number of GSM band scans in the dataset).
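As a toy illustration of this layout (the function name and the RSS values below are ours, purely hypothetical):

```python
import numpy as np

def build_observation_matrix(scans):
    """Stack M full-band scans into the M x N_max matrix RSS.

    Each scan is one received-signal-strength value (in dBm) per
    carrier; row i is the fingerprint measured at location i, and
    column j collects the values of carrier j across all scans.
    """
    return np.asarray(scans, dtype=float)

# toy usage: 3 scans over a 5-carrier "band"
rss = build_observation_matrix([
    [-80, -95, -70, -101, -88],
    [-82, -97, -71, -99, -90],
    [-79, -94, -69, -102, -87],
])
print(rss.shape)  # (3, 5): M = 3 dataset entries, N_max = 5 carriers
```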
All N_{max} Carriers
This fingerprint includes the entire set of carriers, i.e., each row of RSS is a fingerprint, of dimension N_{max}. Its consequent high dimensionality limits the complexity of the classifiers which can be used in its evaluation, as we shall see in the presentation of the results.
N Strongest
The N Strongest fingerprint contains the RSS values of the N carriers which are strongest when averaged over the entire training set. It therefore involves a reduced observation matrix RSS_{1}, derived from the full observation matrix by deleting the columns corresponding to carriers that are not among the N strongest on average; RSS_{1} thus has M rows and N columns. The value of N is determined as follows: the strongest (on average) carrier is selected, a classifier is trained with this one-dimensional fingerprint, and the number of correctly classified examples on the validation set (see Section 3.1 on model training and selection) is computed. Another classifier is trained with the (two-dimensional) fingerprint comprised of the measured RSS values of the strongest and second strongest carriers. The procedure is iterated, increasing the fingerprint dimension by successively appending new carriers, in order of decreasing average strength, to the fingerprint. The procedure is stopped when the number of correctly classified examples of the validation set no longer increases significantly. N is thus the number of carriers which maximizes classifier performance. It may differ between classifier types, as shown in the results section; it is typically in the 200-400 range.
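This greedy selection of N can be sketched as follows. The names (`select_n_strongest`, `one_nn`, `fit_predict`) are our own illustrative stand-ins for the SVM training and validation steps of Section 3, and for simplicity the sketch keeps the N with maximal validation accuracy rather than applying a significance-based stopping rule:

```python
import numpy as np

def one_nn(x_train, y_train, x_val):
    # tiny 1-nearest-neighbor stand-in for the classifiers of Section 3
    d = ((x_val[:, None, :] - x_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(axis=1)]

def select_n_strongest(x_train, y_train, x_val, y_val, fit_predict):
    """Add carriers in decreasing order of mean training-set strength;
    keep the prefix length N that maximizes validation accuracy."""
    order = np.argsort(x_train.mean(axis=0))[::-1]   # strongest first
    best_acc, best_n = -1.0, 1
    for n in range(1, x_train.shape[1] + 1):
        cols = order[:n]
        pred = fit_predict(x_train[:, cols], y_train, x_val[:, cols])
        acc = np.mean(pred == y_val)
        if acc > best_acc:
            best_acc, best_n = acc, n
    return order[:best_n]

# toy data: carrier 0 is strong and informative, carrier 1 is weak noise
x_tr = np.array([[-50., -100.], [-51., -100.], [-70., -100.], [-71., -100.]])
y_tr = np.array([0, 0, 1, 1])
x_va = np.array([[-50.5, -100.], [-70.5, -100.]])
y_va = np.array([0, 1])
print(select_n_strongest(x_tr, y_tr, x_va, y_va, one_nn))  # [0]
```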
Current Top 7
As mentioned earlier, since our scans were obtained in idle mode, we do not have access to standard NMRs. It is nevertheless interesting to have a 'benchmark' fingerprint of low dimensionality to which we may compare results obtained with our 'wider' fingerprints. This is the role of Current Top 7. While it would be desirable to use as fingerprint of location i the vector of measured strengths of the seven strongest carriers at location i, this is problematical since most classifiers require an input vector of fixed format. Therefore, the Current Top 7 fingerprint is defined as follows: it contains the measured strengths of the carriers which were among the seven strongest on at least one training set entry. This fingerprint has a fixed format, for a given training set, and a typical length of about 40 carriers for our data. Therefore, in this context, the reduced observation matrix RSS_{2} has M rows and about 40 columns. In each row, i.e., for a given GSM band scan, only seven elements are defined; the remaining elements of the row are simply set to zero.
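A sketch of how such a fixed-format fingerprint can be computed (our own illustrative code, not the authors' implementation; the toy example uses k = 2 strongest carriers in place of 7 to keep it small):

```python
import numpy as np

def top_k_columns(rss_train, k=7):
    """Carriers that are among the k strongest in at least one
    training scan; this fixed set defines the fingerprint format."""
    per_scan_top = np.argsort(rss_train, axis=1)[:, -k:]
    return np.unique(per_scan_top)

def top_k_fingerprints(rss, cols, k=7):
    """Per scan, keep only its own k strongest carriers among `cols`;
    all other entries are set to zero, as described in the text."""
    out = np.zeros((rss.shape[0], cols.size))
    sub = rss[:, cols]
    for i, row in enumerate(sub):
        keep = np.argsort(row)[-k:]
        out[i, keep] = row[keep]
    return out

# toy example with k = 2 instead of 7
rss = np.array([[-50., -60., -90., -95., -99.],
                [-90., -95., -50., -60., -99.]])
cols = top_k_columns(rss, k=2)
print(cols)                              # union of per-scan top-2 carriers
print(top_k_fingerprints(rss, cols, k=2))
```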
Once a fingerprint has been chosen, a subsequent principal component analysis (PCA, see appendix) can be applied in order to obtain a further reduction in dimensionality. This allows us to construct more parsimonious classifiers, which can then be compared to those which use the primary variables only.
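The PCA step can be sketched with a plain SVD (our own minimal code, not the toolbox used in the study):

```python
import numpy as np

def pca_reduce(rss, n_components):
    """Project fingerprints onto their first principal components.
    SVD-based sketch of the PCA preprocessing step (see appendix)."""
    x = rss - rss.mean(axis=0)                     # center the data
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T                 # scores on top PCs

# toy usage: 3 fingerprints whose variance lies along the first carrier
scores = pca_reduce(np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]), 1)
print(scores.shape)  # (3, 1)
```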
3. Supervised classification algorithms
An introduction to supervised classification by machine learning methods is provided in the Appendix, with emphasis on the classification method (support vector machines) and preprocessing technique (principal component analysis) adopted in the present article.
3.1. Model training and selection
We consider the indoor localization problem as a multiclass classification problem, where each room is a class. Therefore, given a fingerprint that is not present in the training dataset, the classifier should provide the label of the room where it was measured. We describe in Section 3.2 two strategies that turn multiclass classification problems into a combination of two-class (also termed 'binary' or 'pairwise') classification problems; therefore, the present section focuses on training and model selection for two-class classifiers.
Both linear and Gaussian kernels were used; the Gaussian kernel is K(x, x') = exp(-||x - x'||^2 / σ^2), where σ is a hyperparameter whose value is obtained by cross-validation (see below).
The classification function (Equation 2) is f(x) = sgn(Σ_i α_i y_i K(x_i, x) + b), where α_i and b are the parameters of the classifier, y_i = ±1 and x_i are the class label and the fingerprint of dataset entry i (i.e., row i of RSS, RSS_{1}, or RSS_{2}, depending on the fingerprint used by the classifier), respectively, and K(·) is the chosen kernel.
As the procedure outlined corresponds to supervised classification, all dataset entries are labeled. The numbers of examples of each class were balanced in each fold.
The SVMs used in our study, both with linear and Gaussian kernels, were implemented using the Spider toolbox [23].
In order to obtain baseline results, K-nearest-neighbor (KNN) classifiers using the Euclidean distance in RSS space were implemented. The hyperparameter K was determined by the same cross-validation procedure as for SVMs.
3.2. Decision rules for multiclass discrimination
When the discrimination problem involves more than two classes, it is necessary, for pairwise classifiers such as SVMs, to define a method for combining multiple pairwise classifiers into a single multiclass classifier. This can be done in two ways: one-vs-all and one-vs-one.
3.2.1. The one-vs-all approach
In this approach, one classifier is trained per class, each specializing in separating its own class from all of the others; a new example is assigned to the class whose classifier produces the largest output.
3.2.2. One-vs-one classification
The decision rule in this case is based on a vote. First, the outputs of all classifiers are calculated. Let C_{i,j} be the output of the classifier specializing in separating class i from class j. If C_{i,j} is +1, the tally for class i is incremented; if it is -1, the tally of class j is increased by 1. Finally, the class assigned to the example is that having the highest vote tally.
A disadvantage of the one-vs-one technique is of course the increase in the number of classifiers required as compared to one-vs-all. In our case of five classes, 10 classifiers are required, which still remains manageable.
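The voting rule above can be sketched as follows (illustrative code; the dictionary-of-pairwise-outputs representation is our own):

```python
import numpy as np

def one_vs_one_vote(pair_outputs, n_classes):
    """Combine pairwise classifier outputs by voting.

    pair_outputs[(i, j)] is +1 if the (i, j) classifier chooses
    class i, and -1 if it chooses class j.
    """
    tally = np.zeros(n_classes, dtype=int)
    for (i, j), out in pair_outputs.items():
        tally[i if out == +1 else j] += 1
    return int(tally.argmax())

# 3-class toy with the three classifiers (0, 1), (0, 2), (1, 2)
print(one_vs_one_vote({(0, 1): +1, (0, 2): +1, (1, 2): -1}, 3))  # 0
```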
3.3. Results
We recall that the two datasets used to evaluate the method were:

- recorded at two different locations;
- taken at moments widely separated in time (approx. 2 years);
- realized under substantially different experimental conditions.
The performance of each classifier is presented as the percentage of test set examples which are correctly classified. There is no rejection class.
On the home set, when PCA is not used, the number of input variables exceeds the number of training set examples for all but the Current Top 7 fingerprint. Using geometrical arguments, Cover's theorem [24] states that in this case the training set will always be linearly separable, which can of course also be verified using the Ho-Kashyap algorithm. From a practical standpoint, this means that, due to the small size of the training set, it is not meaningful to test nonlinear classifiers on these fingerprints (unless a dimensionality-reducing PCA is applied first).
This difficulty arises less often in the lab set, which is of somewhat larger size. Cover's theorem in fact comes into play here only in the cases of one-vs-one classifiers (with the exception of the Current Top 7 fingerprint), and of one-vs-all classifiers applied to the All N_{max} Carriers fingerprint.
3.3.1. Results on the home set
We recall that the home set is composed of 241 scans containing RSS vectors with 488 GSM carriers. Of the 241, 61 scans are chosen at random to make up the test set. The remaining 180 examples are used to tune and select classifiers using the cross-validation strategy.
Table 1. Percentage of correctly classified test set examples (home set)

Classifier  Current Top 7  N Strongest  All N_{max} (= 488) carriers
Linear SVM  
Onevsone  
w/PCA  57.4 (PC = 8)  96.7 (N = 360, PC = 8)  96.7 (PC = 8) 
w/o PCA  68.9  95.1 (N = 210)  96.7 
Onevsall  
w/PCA  62.3 (PC = 8)  85.2 (N = 420, PC = 4)  85.2 (PC = 4) 
w/o PCA  60.6  98.4 (N = 340)  95.1 
Gaussian SVM  
Onevsone  *  *  * 
Onevsall  
w/PCA  65.6 (PC = 8)  88.5 (N = 420, PC = 4)  88.5 (PC = 4) 
w/o PCA  68.8  98.4 (N = 140)  ** 
K NN  54.1 (K = 7)  95.1 (N = 240, K = 10)  91.8 (K = 12) 
From Table 1, we may immediately remark that the Current Top 7 fingerprint, which is meant to mimic a standard 7-carrier NMR, never provides better than 69% classification efficiency. In comparison, when the RSS vectors are extended to include the strongest 340 carriers, for example, a linear, one-vs-all SVM correctly classifies 98.4% of the test set examples. Indeed, when large numbers of carriers are retained, seven of the nine SVM classifiers presented in the table are able to correctly classify over 95% of the test set examples. The application of PCA to the high-carrier-count fingerprints leads to a performance degradation in the one-vs-all mode, which can be recovered, however, by preferring the more sensitive one-vs-one approach. The principal result, that including large numbers of GSM carriers in the RSS fingerprints leads to very good performance, is very clear.
3.3.2. Results on the lab set
The lab dataset is made up of 601 scans containing RSS vectors of 534 carriers. A test set was constructed from 101 randomly selected scans, leaving 500 for the cross-validation procedure.
Table 2. Percentage of correctly classified test set examples (lab set)

Classifier  Current Top 7  N Strongest  All N_{max} (= 534) carriers
Linear SVM  
Onevsone  
w/PCA  38.6 (PC = 8)  70.3 (N = 490, PC = 10)  70.3 (PC = 8) 
w/o PCA  35.6  98 (N = 280)  100 
Onevsall  
w/PCA  32.6 (PC = 8)  59.6 (N = 520, PC = 10)  59.6 (PC = 10) 
w/o PCA  45.5  95.1 (N = 390)  94.1 
Gaussian SVM  
Onevsone  *  *  * 
Onevsall  
w/PCA  49.5 (PC = 10)  76.6 (N = 530, PC = 10)  68.3 (PC = 10) 
w/o PCA  54.5  96.6 (N = 290)  ** 
K NN  52.5 (K = 6)  68.3 (N = 320, K = 13)  71.3 (K = 10) 
4. Semi-supervised classification
As was pointed out earlier, the RSS scans are manually labeled during data acquisition. In large-scale environments, this is a tedious and time-consuming task, which impinges negatively on the future development of real-world applications of the localization techniques proposed here. A more favorable scenario would be one in which the acquisitions take place automatically, and the user is required to intervene only occasionally to provide labels to help the learning algorithm discover the appropriate classes. Semi-supervised learning algorithms function in exactly this way.
Several methods of performing semi-supervised classification are described in the machine learning literature [25, 26]. Encouraged by the good performance obtained with supervised SVMs, we have chosen to test a kernel-based semi-supervised approach known as the Transductive SVM, or TSVM [27], which has been applied with success, for example, in text recognition [27] and image processing [28].
A TSVM functions similarly to a standard SVM, that is, by finding the hyperplane which is as far as possible from the nearest training examples, with the key difference that some of the examples have class labels and others do not. The TSVM learning algorithm consists of two stages:

1. In the first stage, a standard SVM classification is performed using only the labeled data. The classification function of Equation 2 is then used to assign classes to the unlabeled points in the training set.

2. The second stage of the algorithm solves an optimization problem whose goal is to move the unlabeled points away from the class boundary by minimizing a cost function. This function is composed of a regularization term and two error-penalization terms, one for the labeled examples, and the other for those which were initially unlabeled (and for which labels were predicted in the first stage). The optimization is carried out by successive permutation of the predicted labels: permutations of two labels which lead to a reduction in the cost function are carried out, while all others are forbidden. The optimization terminates when no further permutations are possible.
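The two-stage procedure can be sketched in strongly simplified form. The code below is a toy stand-in of our own construction, not the SVM-light implementation: a regularized least-squares classifier replaces the SVM, and the cost mirrors the regularization-plus-penalization structure described in the text.

```python
import numpy as np

def fit_linear(x, y):
    # regularized least-squares classifier, a stand-in for SVM training
    xb = np.hstack([x, np.ones((len(x), 1))])
    return np.linalg.solve(xb.T @ xb + 1e-3 * np.eye(xb.shape[1]), xb.T @ y)

def cost(w, x, y, c_labeled, lab_mask):
    # regularization term + error penalization for labeled (weight
    # c_labeled) and initially-unlabeled (weight 1) examples
    xb = np.hstack([x, np.ones((len(x), 1))])
    hinge = np.maximum(0.0, 1.0 - y * (xb @ w))
    return 0.5 * w[:-1] @ w[:-1] + np.sum(np.where(lab_mask, c_labeled, 1.0) * hinge)

def tsvm_sketch(x, y_lab, lab_mask, c_labeled=10.0):
    # stage 1: train on the labeled points only, then label the rest
    y = np.where(lab_mask, y_lab, 0.0)
    w = fit_linear(x[lab_mask], y[lab_mask])
    xb = np.hstack([x, np.ones((len(x), 1))])
    y[~lab_mask] = np.sign(xb[~lab_mask] @ w)
    # stage 2: accept only pairwise swaps of predicted labels that
    # lower the cost; stop when no further swap improves it
    w = fit_linear(x, y)
    best = cost(w, x, y, c_labeled, lab_mask)
    improved = True
    while improved:
        improved = False
        unl = np.flatnonzero(~lab_mask)
        for a in unl:
            for b in unl:
                if b <= a or y[a] == y[b]:
                    continue
                y[a], y[b] = y[b], y[a]          # tentative swap
                w2 = fit_linear(x, y)
                c2 = cost(w2, x, y, c_labeled, lab_mask)
                if c2 < best:
                    w, best, improved = w2, c2, True
                else:
                    y[a], y[b] = y[b], y[a]      # revert
    return y
```

On a 1-D toy set with two labeled points at x = -2 (class -1) and x = +2 (class +1) and four unlabeled points, the sketch labels the unlabeled points by the sign of x and rejects all cost-increasing swaps.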
As in the case of standard SVMs, regularization and the use of a nonlinear kernel introduce hyperparameters whose values are to be estimated during the cross-validation process. In our study, the TSVM was implemented using the SVM^{light} toolbox [29].
The presence of unlabeled data renders a data partition like that of Figure 3 impossible. In order to build a classifier with the best possible generalization performance, we have defined a new partition which differs from the one traditionally proposed [27, 30]. The procedure is described below.
The results are presented in the next section. A KNN classifier was also evaluated, for comparison. KNN cannot make use of the unlabeled data: the nearest neighbors that are relevant for classifying an entry are its labeled neighbors only. The hyperparameter K was determined in the validation procedure.
4.1. Results
We note first that since the class labels of many of the training examples are unknown, it is not possible to carry out a one-vs-one strategy. Thus, only the one-vs-all approach was implemented here.
4.1.1. Results on the home set
In order to make the performances of the TSVM classifiers directly comparable to those obtained using SVMs, the test set was chosen to be the same 61 example one that was used to make Table 1. The data partition was implemented as indicated in Figure 6, allocating 40 examples to the validation set, and 140 to the training set, 100 of which are unlabeled. This choice thus imitates a scenario in which some 80/180 = 44% of the data is labeled (where we consider that the test set is used here only for purposes of evaluating the viability of our method).
Table 3. Percentage of correctly classified test set examples for the TSVM (home set)

TSVM Classifier  Current Top 7  N Strongest  All N_{max} (= 488) carriers
Linear  
w/PCA  54.1 (PC = 4)  95.1 (N = 350, PC = 4)  93.4 (PC = 4) 
w/o PCA  55.7  98.4 (N = 370)  98.4 
Gaussian  
w/PCA  52.5 (PC = 10)  98.4 (N = 280, PC = 6)  96.7 (PC = 7) 
w/o PCA  62.3  98.4 (N = 330)   
KNN  50.8 (K = 4)  91.8 (N = 200, K = 4)  86.8 (K = 5) 
4.1.2. Results on the lab set
We recall that the lab dataset contains 601 scans. The test set of 101 examples that was used to create Table 2 is again employed for the TSVM. The training set here contains 400 examples, of which 100 are labeled, with the validation being performed on the 100 remaining examples. Thus, for the lab set, the operating scenario is one in which 200/500 = 40% of the data is labeled, the 101 examples of the test set being used only to evaluate the validity of our approach.
Table 4. Percentage of correctly classified test set examples for the TSVM (lab set)

TSVM Classifier  Current Top 7  N Strongest  All N_{max} (= 534) carriers
Linear  
w/PCA  40.6 (PC = 10)  60.4 (N = 260, PC = 10)  62.4 
w/o PCA  32.7  87.1 (N = 350)  81.2 
Gaussian  
w/PCA  38.6 (PC = 10)  47.5 (N = 250, PC = 10)  48.5 
w/o PCA  37.6  75.2 (N = 350)   
KNN  37.6 (K = 6)  55.5 (N = 450, K = 5)  55.4 (K = 5) 
5. Conclusion
We have presented a new approach to indoor localization, founded upon the inclusion of very large numbers of carriers in the GSM RSS fingerprints, followed by an analysis with appropriate machine learning techniques. The method has been tested on datasets taken at two different geographical locations and widely separated in time. In both cases, room-level classification performance approaching 100% was obtained. To the best of our knowledge, this is the first demonstration that indoor localization of very good quality can be obtained from full-band GSM fingerprints, by making proper use of relatively unsophisticated machine learning tools. We have also presented promising results from a new variant of the TSVM semi-supervised machine learning algorithm, which should go a long way towards alleviating the difficulty of obtaining large numbers of position-labeled RSS fingerprints.
The results obtained in our study make it possible to imagine new localization services and applications of very low cost and complexity, since they are based upon the cellular telephone networks which today are almost ubiquitous throughout the world. In the study presented here, the localization algorithms were always executed offline on standard processors. In the future, such a system could be implemented either on the handset or on a server. In the first case, the GSM band scan and location estimation calculations are performed in the handset itself; in the second, GSM band scans performed by the handset are sent to a server, where the position is estimated.
A more ambitious measurement campaign, including several more geographical locations, finer positioning grids, and multiple RSS measuring devices, is currently in the development stage. In addition to helping assess the viability of our approach over a wider range of environments, this study will also allow us to answer certain questions which were not addressed in the current work, for example:

- What is the ability of the method to identify on which floor of a building a mobile is localized?

- How will the performance behave in environments with relatively poor GSM coverage (rural areas, etc.)?

- What is the true nature of the time stability of the method? Will the database need to be updated regularly, and if so, on what time scale? Although our tests showed that coherence over a one-month period is possible, these temporal aspects need to be evaluated rigorously.
Studies of additional types of semi-supervised learning algorithms, as well as methods of predicting RSS values, are envisioned in order to continue to address the time-consuming labeling task in large-scale environments. An exploration of time-dependent modeling techniques, a more elaborate variable selection procedure, a more sophisticated multiclass discrimination approach, and the incorporation of other types of sensors in our measuring devices for added redundancy, are also envisioned.
Appendix
We provide here basic information that may be useful to readers who are not familiar with supervised classification by statistical machine learning.
Supervised classification by machine learning
Supervised classification consists of assigning one class, out of several known classes, to an object described by a vector of variables (also termed 'descriptors') x. In the present article, x is a GSM fingerprint. For simplicity, we consider here two-class problems: an object i belonging to class A has label y_i = +1, while an object belonging to class B has label y_i = -1; two extensions to multiclass problems are described in the text.
We take the traditional classifier design strategy that consists of (i) postulating a parameterized function f(x, θ), where θ is a vector of adjustable parameters, and (ii) estimating the vector θ such that the classification rule "the object described by x belongs to A if f(x, θ) > 0, and it belongs to B otherwise" classifies all possible objects of the two classes with a minimal rate of classification errors. The equation of the surface that separates the two classes in descriptor space is thus f(x, θ) = 0.
In order to estimate the parameters of the classifier, a database called the training set is necessary; it contains a collection of objects ('examples') that are known and that have been labeled by a 'supervisor', hence the term 'supervised learning'. In the present study, fingerprint measurements have been performed, and each fingerprint has been recorded together with the label of the room where the measurement was performed. The difficulty of the training task stems from the fact that a finite number of examples are available, while the resulting classifier should be optimal for all possible objects: there is a risk that the classifier classifies correctly all available examples but performs poorly on other objects of the classes. Such a classifier is said to be overfitted to the training data; it generalizes poorly. Clearly, if the postulated function is given a very large number of adjustable parameters that can vary on an arbitrarily large scale, i.e., if the postulated function is very flexible, it may define a very complicated separation surface between the two classes, which classifies correctly all examples of the training set but generalizes poorly to other objects of the classes. One way to alleviate this problem consists of preventing the parameters from becoming too large; this is known as regularization. Conversely, if the postulated function is not complex enough, i.e., is too 'stiff', it may define a boundary surface that lacks the flexibility to accommodate the training data, and hence generalize poorly. Therefore, the central problem in classifier design by machine learning methods is that of finding a boundary surface of appropriate complexity; the complexity of a function is accurately defined by its Vapnik-Chervonenkis (VC) dimension, whose description goes beyond the scope of the present appendix.
Support vector machines
Support Vector Machines (SVMs) are classifiers that feature a built-in regularization mechanism, designed to produce boundary surfaces of appropriate complexity.
First assume that the examples present in the training set are linearly separable, i.e., a postulated function of the form f(x, θ) = θ · x + b (where the bias b is estimated together with θ) provides a boundary surface that classifies all examples of the training set without errors. In other words, all examples of the training set can be perfectly separated by a straight line if x is of dimension 2, by a plane if x is of dimension 3, and by a hyperplane if descriptor space is of dimension larger than 3. The parameters of the optimal hyperplane are then obtained by solving numerically the following constrained optimization problem: minimize ‖θ‖ under the constraint that all examples are correctly classified; these constraints are linear inequalities. Minimizing ‖θ‖ maximizes the margin, i.e., the distance between the separating hyperplane and the examples closest to it, and provides SVMs with an automatic regularization mechanism.
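In practice this constrained problem is solved by quadratic programming; purely as an illustration (this is an assumption of the sketch, not the authors' implementation, which relied on the Spider toolbox), the equivalent regularized hinge-loss objective can be minimized by stochastic subgradient descent in a few lines of NumPy:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss:
    lam/2 * ||theta||^2 + mean(max(0, 1 - y_i*(theta.x_i + b))).
    Shrinking theta at every step is the regularization described in the text."""
    n, d = X.shape
    theta, b = np.zeros(d), 0.0
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            margin = y[i] * (X[i] @ theta + b)
            theta *= 1.0 - eta * lam           # shrink ||theta||
            if margin < 1:                     # example on or inside the margin
                theta += eta * y[i] * X[i]
                b += eta * y[i]
    return theta, b

# linearly separable toy data: two well-separated Gaussian clouds
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

theta, b = train_linear_svm(X, y)
acc = float(np.mean(np.sign(X @ theta + b) == y))
print(acc)
```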
The solution of this problem can be expressed in dual form as f(x) = Σ_{i} α_{i} y_{i} (x_{i} · x) + b, where y_{i} = ±1 denotes the class label of example x_{i}, and the sum runs over all examples of the training set.
It can be shown that the only nonzero parameters α_{i} pertain to the examples of the training set that lie exactly on the margin, i.e., are located closest to the separating surface; these examples are the support vectors. Therefore, the number of nonzero parameters is usually much smaller than the number of examples, a straightforward consequence of the regularization mechanism present in the definition of the SVM.
If the examples are not linearly separable, the dot product (x_{i} · x_{j}) can be replaced by an appropriate kernel function K(x_{i}, x_{j}), which is equivalent to defining a new feature space z = φ(x) such that φ(x_{i}) · φ(x_{j}) = K(x_{i}, x_{j}). If the training examples are linearly separable in the new feature space, the SVM machinery can be applied exactly as described above. The most popular kernel is the Gaussian kernel K(x, y) = exp(−‖x − y‖² / σ²); the width σ is a hyperparameter whose value is found by cross-validation as described below.
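The Gaussian kernel above can be evaluated for all pairs of fingerprints at once; a short NumPy sketch (the variable names and toy points are illustrative):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / sigma^2) for every pair of rows of X and Y."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)  # clip rounding noise below 0

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(X, X, sigma=1.0)
# K is symmetric with a unit diagonal, and entries decay as points move apart
print(np.round(K, 4))
```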
If perfect separation cannot (or should not) be achieved even in feature space, slack variables ξ_{i} ≥ 0 are introduced that allow examples to violate the margin, and the quantity ‖θ‖² + C Σ_{i} ξ_{i} is minimized, where C is a hyperparameter, termed the regularization constant, whose value is found by cross-validation. The larger the value of C, the more stringent the constraint of correct classification of all examples. The number of support vectors is then equal to the number of examples that lie on or within the margin of the classifier; in the present study, about 25% of the training examples were found to be support vectors.
Cross-validation consists of the following procedure: the available dataset is divided into two disjoint subsets, a training/validation set and a test set.
The training/validation set is in turn divided into D disjoint subsets, or folds. A classifier is trained on D − 1 folds, and the resulting classifier is applied to the examples present in the remaining fold. The number of classification errors on these examples is stored, and the procedure is iterated D times, so that each example of the training/validation set appears once and only once in a validation fold. The validation score is the overall classification error rate, computed by counting the errors over all examples of the validation folds, thereby providing an estimate of the performance of the classifier. The same procedure is repeated for various values of the hyperparameters, and the hyperparameter combination giving the classifier with the smallest validation score is retained. Finally, a classifier with the optimal hyperparameter combination is trained on all examples of the training/validation set; its performance is subsequently assessed on the test set, whose examples have never been used before, thereby providing a statistically valid estimate of the classifier performance.
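The D-fold procedure just described can be sketched in a few lines of NumPy; the toy data and the k-nearest-neighbor base classifier are illustrative assumptions standing in for the study's fingerprints and SVMs:

```python
import numpy as np

def d_fold_cv_error(X, y, train_and_count_errors, D=5, seed=0):
    """Overall validation error rate over D disjoint folds: each example
    is used for validation exactly once, as described in the text."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, D)
    errors = 0
    for d in range(D):
        val = folds[d]
        train = np.concatenate([folds[j] for j in range(D) if j != d])
        errors += train_and_count_errors(X[train], y[train], X[val], y[val])
    return errors / len(y)

def knn_errors(Xtr, ytr, Xval, yval, k):
    """Count misclassifications of a k-nearest-neighbor majority vote (+/-1 labels)."""
    wrong = 0
    for x, target in zip(Xval, yval):
        nearest = ytr[np.argsort(np.linalg.norm(Xtr - x, axis=1))[:k]]
        wrong += int((1 if nearest.sum() >= 0 else -1) != target)
    return wrong

# toy two-class data standing in for the labeled fingerprints
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, (30, 2)), rng.normal(1.0, 1.0, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

# hyperparameter selection: retain the k with the smallest validation score
scores = {k: d_fold_cv_error(X, y, lambda a, b, c, v: knn_errors(a, b, c, v, k))
          for k in (1, 3, 5, 7)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```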
Data preprocessing by principal component analysis
In this study, when PCA was used, classifiers with an increasing number of principal components were trained in succession; the error rate of each classifier was computed on a validation set, until the addition of a new principal component no longer improved the validation score significantly. As shown in Tables 1 and 2, 4 to 10 principal components were found useful in our study.
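A minimal NumPy sketch of the PCA preprocessing step (the synthetic data, its dimensions, and the function names are illustrative; the actual fingerprints contain several hundred carriers):

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and the leading principal directions, i.e., the
    eigenvectors of the covariance matrix with the largest eigenvalues."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return mu, eigvecs[:, order]

# 50-dimensional data whose variance lies mostly in 4 directions, mimicking
# high-dimensional RSS vectors with a small number of informative axes
rng = np.random.default_rng(3)
basis = rng.normal(size=(4, 50))
X = rng.normal(size=(200, 4)) @ basis + 0.01 * rng.normal(size=(200, 50))

mu, W = pca_fit(X, n_components=4)
Z = (X - mu) @ W                                    # reduced descriptors for the classifier
explained = float(np.var(Z, axis=0).sum() / np.var(X - mu, axis=0).sum())
print(Z.shape, round(explained, 4))
```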
PCA should not be confused with variable selection procedures that assess the relevance of the variables: PCA takes into account the descriptor variables only, and ignores the quantity to be predicted, i.e., the class of the item to be classified.
Acknowledgements
The authors wish to acknowledge the reviewers and the editor-in-chief for numerous comments and suggestions for improving our article. They also acknowledge contributions by Rémi Dubois of Sigma Laboratory and the numerous research interns who contributed to the project over the past few years.
References
 1. Küpper A: Location-Based Services: Fundamentals and Operation. John Wiley & Sons, Chichester, West Sussex, England; 2005.
 2. Ladd AM, Bekris KE, Rudys A, Kavraki LE, Wallach DS: On the feasibility of using wireless ethernet for indoor localization. IEEE Trans Robot Autom 2004, 20(3):555-559.
 3. Brunato M, Battiti R: Statistical learning theory for location fingerprinting in wireless LANs. Computer Networks 2005, 47:825-845.
 4. Yang Q, Jialin Pan S, Wenchen Zheng V: Estimating location using Wi-Fi. IEEE Intell Syst 2008, 23(1):8-13.
 5. Fang SH, Lin TN, Lin PC: Location fingerprinting in a decorrelated space. IEEE Trans Knowledge Data Eng 2008, 20(5):685-691.
 6. Fang SH, Lin TN: Indoor location system based on discriminant-adaptive neural network in IEEE 802.11 environments. IEEE Trans Neural Netw 2008, 19(11):1973-1978.
 7. Hong SH, Kim BK, Eom DS: Localization algorithm in wireless sensor networks with network mobility. IEEE Trans Consum Electron 2009, 55(4):1921-1928.
 8. Kuo SP, Tseng YC: A scrambling method for fingerprint positioning based on temporal diversity and spatial dependency. IEEE Trans Knowledge Data Eng 2008, 20(5):678-684.
 9. Lee H, Lee S, Kim Y, Chong H: Grouping multi-duolateration localization using partial space information for indoor wireless sensor networks. IEEE Trans Consum Electron 2009, 55(4):1950-1958.
 10. Zimmerman D, Baumann J, Layh M, Landstorfer F, Hoppe R, Wölfle G: Database correlation for positioning of mobile terminals in cellular networks using wave propagation models. Proceedings of the IEEE 60th Vehicular Technology Conference 2004, 7:4682-4686.
 11. Otsason V, Varshavsky A, LaMarca A, de Lara E: Accurate GSM indoor localization. In Proceedings of the 7th International Conference on Ubiquitous Computing (UbiComp 2005). Edited by: Beigl M. Springer-Verlag, Berlin, Heidelberg; 2005:141-158.
 12. Wu Z, Li C, Ng JKY, Leung KRPH: Location estimation via support vector regression. IEEE Trans Mob Comput 2007, 6(3):311-321.
 13. Denby B, Oussar Y, Ahriz I: Geolocalisation in cellular telephone networks. In Proceedings of the NATO 2007 Advanced Study Institute on Mining Massive Data Sets for Security. Edited by: Fogelman-Soulié F, Perrotta D, Piskorski J, Steinberger R. IOS Press, Amsterdam, The Netherlands; 2008.
 14. ur Rehman W, de Lara E, Saroiu S: CILoS: a CDMA indoor localization system. In Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp 2008). Seoul, Korea; 2008:21-24.
 15. Denby B, Oussar Y, Ahriz I, Dreyfus G: High-performance indoor localization with full-band GSM fingerprints. In Proceedings of the IEEE International Conference on Communications, Workshop on Synergies in Communication and Localization (SyCoLo). Dresden, Germany; 2009.
 16. Ahriz I, Oussar Y, Denby B, Dreyfus G: Carrier relevance study for indoor localization using GSM. In Proceedings of the 7th Workshop on Positioning, Navigation and Communication (WPNC 2010). Dresden, Germany; 2010:11-12.
 17. Ahriz I, Oussar Y, Denby B, Dreyfus G: Full-band GSM fingerprints for indoor localization using a machine learning approach. International Journal of Navigation and Observation 2010, Article ID 497829.
 18. Cristianini N, Shawe-Taylor J: Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge; 2000.
 19. Dreyfus G: Neural Networks: Methodology and Applications. Springer-Verlag, Berlin, Heidelberg; 2005.
 20. TEMS Mobile System. [Online] http://www.ericsson.com/solutions/tems/
 21. Telit GM862-GPS module. [Online] http://www.telit.com/en/products/gsmgprs.php
 22. Ho YC, Kashyap RL: An algorithm for linear inequalities and its applications. IEEE Trans Electron Comput 1965, 14(5):683-688.
 23. The Spider. [Online] http://www.kyb.tuebingen.mpg.de/bs/people/spider/
 24. Cover TM: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput 1965, 14:326-334.
 25. Chapelle O, Schölkopf B, Zien A: Semi-Supervised Learning. MIT Press, Cambridge, MA; 2006.
 26. Zhu X: Semi-Supervised Learning Literature Survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI; 2006. http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
 27. Joachims T: Transductive inference for text classification using support vector machines. International Conference on Machine Learning (ICML) 1999, 200-209.
 28. Jia J, Cai L: A TSVM-based minutiae matching approach for fingerprint verification. International Workshop on Biometric Recognition Systems (IWBRS) 2005, 85-94.
 29. SVM^{light}. [Online] http://svmlight.joachims.org/
 30. Wang J, Shen X, Pan W: On transductive support vector machines. Contemp Math 2007, 443:7-19.
Copyright information
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.