Abstract
The following problem is considered. Given a finite sequence of points in Euclidean space, find a longest subsequence such that the sum of squared distances from its elements to its unknown centroid (geometrical center) is at most a given percentage of the sum of squared distances from the elements of the input sequence to the centroid of that sequence. In particular, this problem models a data analysis problem: the search for the largest subset of mutually close elements in time-ordered data, where closeness means that the total quadratic scatter of the subset is bounded from above. It can be treated as a data editing problem aimed at the removal of extraneous (dissimilar) elements. It is shown that the problem is strongly NP-hard. A polynomial-time approximation algorithm is proposed. It either establishes that the problem has no solution, or outputs a 1/2-approximate solution if the length \(M^*\) of an optimal subsequence is even and a \((M^* - 1)/(2M^*)\)-approximate solution if \(M^*\) is odd. Some numerical experiments illustrating the suitability of the algorithm are presented.
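To make the constraint in the abstract concrete, the following is a minimal Python sketch of the scatter criterion together with an exhaustive search over index subsets. It is not the approximation algorithm of the paper: it runs in exponential time and is meant only for tiny inputs. The names `scatter`, `longest_subsequence_bruteforce` and the parameter `alpha` (the "given percentage", expressed as a fraction) are hypothetical, and the sketch assumes no conditions on the subsequence beyond those stated in the abstract. Under that reading any single point is feasible, because its scatter is zero.

```python
from itertools import combinations


def scatter(points):
    """Sum of squared Euclidean distances from the points to their centroid."""
    m = len(points)
    dim = len(points[0])
    centroid = [sum(p[k] for p in points) / m for k in range(dim)]
    return sum((p[k] - centroid[k]) ** 2 for p in points for k in range(dim))


def longest_subsequence_bruteforce(points, alpha):
    """Return indices of a longest subsequence whose scatter is at most
    alpha * scatter(points). Exhaustive search, for illustration only."""
    bound = alpha * scatter(points)
    n = len(points)
    # Try sizes from largest to smallest; combinations() yields increasing
    # index tuples, so the original order of the sequence is preserved.
    for size in range(n, 0, -1):
        for idx in combinations(range(n), size):
            if scatter([points[i] for i in idx]) <= bound:
                return list(idx)
    return []


# Illustrative data (assumed): three nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0)]
print(longest_subsequence_bruteforce(points, alpha=0.01))  # -> [0, 1, 2]
```

With these illustrative points, the distant element is dropped and the three mutually close elements are kept, which is the data editing interpretation described in the abstract.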
Acknowledgments
The study presented in Sects. 2, 3 and 5 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 4 and 6 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of Basic Research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Kel’manov, A., Pyatkin, A., Khamidullin, S., Khandeev, V., Shamardin, Y.V., Shenmaier, V. (2018). An Approximation Polynomial Algorithm for a Problem of Searching for the Longest Subsequence in a Finite Sequence of Points in Euclidean Space. In: Eremeev, A., Khachay, M., Kochetov, Y., Pardalos, P. (eds) Optimization Problems and Their Applications. OPTA 2018. Communications in Computer and Information Science, vol 871. Springer, Cham. https://doi.org/10.1007/978-3-319-93800-4_10
DOI: https://doi.org/10.1007/978-3-319-93800-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93799-1
Online ISBN: 978-3-319-93800-4
eBook Packages: Computer Science (R0)