Pattern Classification Using a Penalized Likelihood Method

  • Ahmed Al-Ani
  • Amir F. Atiya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5998)


Penalized likelihood is a well-known, theoretically justified approach that has recently attracted attention from the machine learning community. Its objective function consists of the log-likelihood of the data minus a term penalizing non-smooth solutions; maximizing this objective therefore trades off the faithfulness of the fit against its smoothness. Penalized likelihood has been studied extensively for regression, but it has yet to be thoroughly investigated in the pattern classification domain. We propose a penalty term based on the K-nearest neighbors and an iterative approach to estimating the posterior probabilities. In addition, instead of fixing the value of K for all patterns, we develop a variable-K approach, where the number of neighbors can vary from one sample to another. The value of K chosen for a given test sample is influenced by the K values of its surrounding training samples as well as by the most successful K value across all training samples. Comparisons with a number of well-known classification methods demonstrate the potential of the proposed method.
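The abstract does not give the paper's exact update rule, but the general idea of iteratively estimating posteriors under a K-nearest-neighbor smoothness penalty can be sketched as follows. This is a hypothetical illustration, not the authors' algorithm: it assumes the penalty pulls each sample's posterior toward the average posterior of its K nearest neighbors, while the likelihood term pulls it toward the sample's one-hot label, mixed by a weight `alpha` (the function name, `alpha`, and the averaging scheme are all assumptions for illustration).

```python
import numpy as np

def knn_smoothed_posteriors(X, y, n_classes, k=3, alpha=0.7, n_iter=20):
    """Hypothetical sketch: iterative posterior estimation with a
    KNN-based smoothness penalty. Each sample's posterior is mixed
    between its one-hot label (data fit) and the mean posterior of
    its k nearest neighbors (smoothness)."""
    # Pairwise Euclidean distances between training samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors

    onehot = np.eye(n_classes)[y]          # initial posteriors = hard labels
    p = onehot.copy()
    for _ in range(n_iter):
        neigh_mean = p[nbrs].mean(axis=1)          # smoothness term
        p = alpha * onehot + (1 - alpha) * neigh_mean
        p /= p.sum(axis=1, keepdims=True)          # keep rows on the simplex
    return p
```

With a fixed `k` this resembles label smoothing over a neighborhood graph; the variable-K idea described above would additionally select `k` per sample rather than globally.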


Keywords: Support Vector Machine · Posterior Probability · Class Membership · Training Pattern · Reproducing Kernel Hilbert Space



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ahmed Al-Ani: Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia
  • Amir F. Atiya: Department of Computer Engineering, Cairo University, Giza, Egypt
