# Algorithms and Hardness Results for Nearest Neighbor Problems in Bicolored Point Sets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10807)

## Abstract

In the context of computational supervised learning, the primary objective is the classification of data. Specifically, the goal is to provide the system with “training” data and to design a method that uses the training data to assign the correct label to new objects. A standard scenario is that the examples are points in a metric space, and “nearby” points should have “similar” labels. In practice, it is desirable to reduce the size of the training set without compromising too much on the ability to correctly label new objects. Such subsets of the training data are called edited sets. Wilfong [SOCG ’91] defined two types of edited subsets: consistent subsets (those which correctly label all objects from the training data) and selective subsets (those which correctly label all new objects the same way as the original training data). This leads to the following two optimization problems:
• k-MCS-($$\mathcal X$$): Given k sets of points $$P_1, P_2, \ldots , P_k$$ in a metric space $$\mathcal X$$, the goal is to choose subsets $$P'_i \subseteq P_i$$ for $$i=1,2,\ldots ,k$$ such that, for each $$i\in [k]$$ and every $$p \in P_i$$, the nearest neighbor of p among $$\bigcup _{j=1}^{k} P'_j$$ lies in $$P'_i$$, while minimizing the quantity $$\sum _{i=1}^k |P'_i|$$. (We also enforce the condition $$|P'_i|\ge 1$$ for all $$i\in [k]$$.)

• k-MSS-($$\mathcal X$$): Given k sets of points $$P_1, P_2, \ldots , P_k$$ in a metric space $$\mathcal X$$, the goal is to choose subsets $$P'_i \subseteq P_i$$ for $$i=1,2,\ldots ,k$$ such that, for each $$i\in [k]$$ and every $$p \in P_i$$, the nearest neighbor of p among $$\Big (\bigcup _{j=1, j\ne i}^{k} P_j\Big ) \cup P'_i$$ lies in $$P'_i$$, while minimizing the quantity $$\sum _{i=1}^k |P'_i|$$. (We again enforce the condition $$|P'_i|\ge 1$$ for all $$i\in [k]$$.)
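Both feasibility conditions can be checked directly from the definitions. The Python sketch below uses hypothetical helper names (`is_consistent`, `is_selective`) and is not code from the paper. It assumes points are coordinate tuples under the Euclidean metric, that all points are distinct, and that a tie between the closest selected point of the same class and the closest competing point counts as a failure; the abstract does not say how ties are resolved.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]  # assumption: Euclidean coordinate tuples


def _min_dist(p: Point, pts: Sequence[Point]) -> float:
    """Distance from p to the closest point of pts (infinity if pts is empty)."""
    return min((math.dist(p, q) for q in pts), default=math.inf)


def _valid_choice(classes, subsets) -> bool:
    """Each P'_i must be a non-empty subset of P_i."""
    return len(classes) == len(subsets) and all(
        len(S) >= 1 and set(S) <= set(P) for P, S in zip(classes, subsets)
    )


def is_consistent(classes: Sequence[Sequence[Point]],
                  subsets: Sequence[Sequence[Point]]) -> bool:
    """k-MCS condition: for every p in P_i, its nearest neighbour among
    the union of all P'_j lies in P'_i."""
    if not _valid_choice(classes, subsets):
        return False
    for i, P in enumerate(classes):
        # Competitors: only the *selected* points of every other class.
        rivals = [q for j, S in enumerate(subsets) if j != i for q in S]
        for p in P:
            if _min_dist(p, subsets[i]) >= _min_dist(p, rivals):
                return False  # ties are treated as misclassification
    return True


def is_selective(classes: Sequence[Sequence[Point]],
                 subsets: Sequence[Sequence[Point]]) -> bool:
    """k-MSS condition: for every p in P_i, its nearest neighbour among
    (all points of the other classes) together with P'_i lies in P'_i."""
    if not _valid_choice(classes, subsets):
        return False
    for i, P in enumerate(classes):
        # Competitors: *every* point of every other class.
        rivals = [q for j, Q in enumerate(classes) if j != i for q in Q]
        for p in P:
            if _min_dist(p, subsets[i]) >= _min_dist(p, rivals):
                return False
    return True


if __name__ == "__main__":
    red = [(0.0,), (4.0,)]
    blue = [(7.0,), (10.0,)]
    choice = [[(0.0,)], [(10.0,)]]
    print(is_consistent([red, blue], choice))  # True
    print(is_selective([red, blue], choice))   # False
```

The example shows why a consistent subset need not be selective: the red point at 4 is closer to the selected red point at 0 than to the selected blue point at 10, but it is closer still to the unselected blue point at 7, which counts as a competitor only in the selective condition.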

While there have been several heuristics proposed for these two problems in the computer vision and machine learning community, the only theoretical results for these problems (to the best of our knowledge) are due to Wilfong [SOCG ’91] who showed that both 3-MCS-($$\mathbb {R}^2$$) and 2-MSS-($$\mathbb {R}^2$$) are NP-complete. We initiate the study of these two problems from a theoretical perspective, and obtain several algorithmic and hardness results.

On the algorithmic side, we first design an $$O(n^2)$$-time exact algorithm and an $$O(n\log n)$$-time 2-approximation algorithm for the 2-MCS-($$\mathbb {R}$$) problem, i.e., when the points are located on the real line. Moreover, we show that the exact algorithm also extends to the case when the points are located on the circumference of a circle. Next, we design an $$O(r^2)$$-time online algorithm for the 2-MCS-($$\mathbb {R}$$) problem, where n is the number of points and $$r<n$$ is an integer. Finally, we give a PTAS for the k-MSS-($$\mathbb {R}^2$$) problem.

On the hardness side, we show that both the 2-MCS and 2-MSS problems are NP-complete on graphs. Additionally, these problems are W[2]-hard when parameterized by the size k of the solution. For points in the Euclidean plane, we show that the 2-MSS problem is contained in W[1]. Finally, we show a lower bound of $$\varOmega (\sqrt{n})$$ bits on the storage of any (randomized) algorithm that solves both 2-MCS-($$\mathbb {R}$$) and 2-MSS-($$\mathbb {R}$$).

## References

1. Lokshtanov, D., Marx, D., Saurabh, S.: Lower bounds based on the exponential time hypothesis. Bull. EATCS 105, 41–72 (2011)
2. Kushilevitz, E., Nisan, N.: Communication Complexity. Cambridge University Press, Cambridge (1997)
3. Wilfong, G.T.: Nearest neighbor problems. Int. J. Comput. Geom. Appl. 2(4), 383–416 (1992)
4. Levinson, S.E.: Structural methods in automated speech recognition. Proc. IEEE 73(11), 1625–1650 (1985)
5. Hart, P.E.: The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 14(3), 515–516 (1968)
6. Tappert, C.C., Suen, C.Y., Wakahara, T.: The state of the art in online handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 12(8), 787–808 (1990)
7. Gates, G.W.: The reduced nearest neighbour rule. IEEE Trans. Inf. Theory 18(3), 431–433 (1972)
8. Masuyama, S., Ibaraki, T., Hasegawa, T.: The computational complexity of the m-center problems in the plane. IEEE Trans. IECE Jpn. 64(2), 57–64 (1981)
9. Agarwal, P., Pach, J., Sharir, M.: State of the union (of geometric objects). In: Goodman, J., Pach, J., Pollack, R. (eds.) Surveys in Discrete and Computational Geometry: Twenty Years Later. Contemporary Mathematics, vol. 453, pp. 9–48 (2008)
10. Mustafa, N.H., Ray, S.: PTAS for geometric hitting set problem. In: Proceedings of the 27th ACM Symposium on Computational Geometry, pp. 17–22 (2009)
11. Flum, J., Grohe, M.: Parameterized Complexity Theory. Texts in Theoretical Computer Science. An EATCS Series. Springer, Heidelberg (2006)
12. Ritter, G.L., Woodruff, H.B., Lowry, S.R., Isenhour, T.L.: An algorithm for a selective nearest neighbor rule. IEEE Trans. Inf. Theory 21, 665–669 (1975)
13. Agarwal, P.K., Sharir, M.: Red-blue intersection detection algorithms, with applications to motion planning and collision detection. SIAM J. Comput. 19(2), 297–321 (1990)
14. Arkin, E.M., Díaz-Báñez, J.M., Hurtado, F., Kumar, P., Mitchell, J.S.B., Palop, B., Pérez-Lantero, P., Saumell, M., Silveira, R.I.: Bichromatic 2-center of pairs of points. Comput. Geom. 48(2), 94–107 (2015)