A Second-Order Perceptron Algorithm

  • Nicolò Cesa-Bianchi
  • Alex Conconi
  • Claudio Gentile
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2375)


We introduce a variant of the Perceptron algorithm, called the second-order Perceptron algorithm, which is able to exploit certain spectral properties of the data. We analyze the second-order Perceptron algorithm in the mistake bound model of on-line learning and prove bounds in terms of the eigenvalues of the Gram matrix created from the data. The performance of the second-order Perceptron algorithm is affected by the setting of a parameter controlling the sensitivity to the distribution of the eigenvalues of the Gram matrix. Since this information is not available in advance to on-line algorithms, we also design a refined version of the second-order Perceptron algorithm which adaptively sets the value of this parameter. For this second algorithm we are able to prove mistake bounds corresponding to a nearly optimal constant setting of the parameter.
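The abstract does not spell out the update rule, so the following is only a minimal sketch of how a second-order Perceptron is commonly presented, not necessarily the exact formulation analyzed in the paper. It assumes labels in {-1, +1}, a fixed parameter `a > 0`, and a prediction of the form sign(v^T (aI + S S^T + x x^T)^{-1} x), where S collects the instances on which mistakes were made and v is the sum of those instances weighted by their labels. The class and function names are hypothetical, the tie-breaking convention sign(0) = +1 is a choice made here, and the inverse is maintained with the Sherman-Morrison rank-one formula rather than recomputed from scratch.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(m: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product."""
    return [dot(row, x) for row in m]


def rank_one_inverse_update(inv: Matrix, x: Sequence[float]) -> Matrix:
    """Return (A + x x^T)^{-1} given inv = A^{-1}, for symmetric A.

    Uses the Sherman-Morrison formula:
    (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x).
    """
    ax = mat_vec(inv, x)
    denom = 1.0 + dot(x, ax)
    n = len(x)
    return [[inv[i][j] - ax[i] * ax[j] / denom for j in range(n)]
            for i in range(n)]


class SecondOrderPerceptron:
    """Hypothetical sketch of a second-order Perceptron (assumed form)."""

    def __init__(self, dim: int, a: float = 1.0) -> None:
        # Inverse of a*I: no mistaken instances have been stored yet.
        self.inv: Matrix = [[(1.0 / a if i == j else 0.0) for j in range(dim)]
                            for i in range(dim)]
        self.v: Vector = [0.0] * dim  # sum of y * x over mistaken trials
        self.mistakes = 0

    def predict(self, x: Sequence[float]) -> int:
        # The current instance is included in the matrix before predicting.
        inv_with_x = rank_one_inverse_update(self.inv, x)
        margin = dot(self.v, mat_vec(inv_with_x, x))
        return 1 if margin >= 0 else -1  # assumed convention: sign(0) = +1

    def update(self, x: Sequence[float], y: int) -> int:
        """Predict on x, then update the state only if the prediction is wrong."""
        y_hat = self.predict(x)
        if y_hat != y:
            self.v = [vi + y * xi for vi, xi in zip(self.v, x)]
            self.inv = rank_one_inverse_update(self.inv, x)
            self.mistakes += 1
        return y_hat
```

In this sketch the parameter `a` plays the role of the sensitivity parameter mentioned in the abstract: a large `a` makes the matrix close to a multiple of the identity, so the predictions approach those of the standard Perceptron, while a small `a` gives more weight to the spectral structure of the stored instances. The adaptive variant described in the abstract is not reproduced here.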


Keywords: Unit Norm Vector, Nonzero Eigenvalue, Adaptive Parameter, Forward Algorithm, Hinge Loss





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Nicolò Cesa-Bianchi (1)
  • Alex Conconi (1)
  • Claudio Gentile (2)

  1. Dept. of Information Technologies, Università di Milano, Italy
  2. CRII, Università dell’Insubria, Italy
