
Efficient List-Based Computation of the String Subsequence Kernel

  • Slimane Bellaouar
  • Hadda Cherroun
  • Djelloul Ziadi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8370)

Abstract

Kernel methods are powerful tools in machine learning, but they have to be computationally efficient. In this paper, we present a novel list-based approach to efficiently compute the string subsequence kernel (SSK). Our main idea is that the list-based SSK computation reduces to a range query problem. We start by constructing a match list L(s,t) = {(i,j) : s_i = t_j}, where s and t are the strings to be compared; this match list contains only the data that contribute to the result. To perform the intermediate processing efficiently, we construct a layered range tree and apply the corresponding computational geometry algorithms. Moreover, we extend the match list to a list of lists in order to improve the efficiency of the SSK computation. The whole process takes O(|L| log|L| + pK) time and O(|L| log|L| + K) space, where |L| is the size of the match list, p is the length of the SSK and K is the total number of points reported by range queries over all the entries of the list.
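To make the match-list idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual gap-weighted SSK definition with a decay factor (called `lam` here), which the abstract does not spell out, and it uses 0-based indices. The function names `match_list` and `ssk_from_match_list` are hypothetical. The inner loop is a plain linear scan over the matches that precede (i, j) in both coordinates; this is the two-dimensional range query that the paper answers with a layered range tree, so the sketch runs in O(p|L|^2) time rather than the bound stated above.

```python
def match_list(s, t):
    """Return all index pairs (i, j) with s[i] == t[j] (0-based)."""
    positions = {}
    for j, ch in enumerate(t):
        positions.setdefault(ch, []).append(j)
    return [(i, j) for i, ch in enumerate(s) for j in positions.get(ch, [])]


def ssk_from_match_list(s, t, p, lam):
    """Gap-weighted subsequence kernel of length p >= 1 (assumed definition).

    weight[k] holds the decayed weight of all common subsequences of the
    current length whose last characters are aligned at matches[k].
    """
    matches = match_list(s, t)
    # Length 1: each match spans one position in s and one in t.
    weight = [lam ** 2] * len(matches)
    for _ in range(p - 1):
        extended = []
        for (i, j) in matches:
            total = 0.0
            # Naive dominance query: every match strictly before (i, j)
            # in both strings can be extended to end at (i, j).
            for (i2, j2), w in zip(matches, weight):
                if i2 < i and j2 < j:
                    total += w * lam ** ((i - i2) + (j - j2))
            extended.append(total)
        weight = extended
    return sum(weight)


if __name__ == "__main__":
    print(match_list("cat", "car"))
    print(ssk_from_match_list("cat", "car", p=2, lam=0.5))
```

For s = "cat" and t = "car" the match list has only two entries, so the computation touches two cells instead of the nine of a full dynamic-programming table; this sparsity is what the list-based formulation exploits.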

Keywords

string kernel; computational geometry; layered range tree; range query



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Slimane Bellaouar (1)
  • Hadda Cherroun (1)
  • Djelloul Ziadi (2)
  1. Laboratoire LIM, Université Amar Telidji, Laghouat, Algérie
  2. Laboratoire LITIS - EA 4108, Université de Rouen, Rouen, France
