Machine Learning, Volume 60, Issue 1–3, pp 73–96

Ranking and Reranking with Perceptron

  • Libin Shen
  • Aravind K. Joshi


This work is inspired by the so-called reranking tasks in natural language processing. In this paper, we first study recently proposed ranking, reranking, and ordinal regression algorithms in the context of ranks and margins. We then propose a general framework for ranking and reranking, and introduce a series of variants of the perceptron algorithm for ranking and reranking within this framework. Compared to the approach of using pairwise objects as training samples, the new algorithms reduce the data complexity and training time. We apply the new perceptron algorithms to the parse reranking and machine translation reranking tasks, and study the performance of reranking under various definitions of the margin.
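To make the reranking setting concrete, the following is a minimal, illustrative sketch of a perceptron-style reranking update, not the specific variants proposed in the paper. It assumes each training example is a list of candidates (e.g. parses or translations), each represented as a feature vector paired with a quality score such as F1 or BLEU; the highest-scoring candidate is treated as the gold target, and the weights are updated whenever the model prefers a different candidate.

```python
# Illustrative perceptron-style reranker (a sketch, not the paper's algorithm).
# Each example is a list of (feature_vector, quality_score) candidates.

def dot(w, x):
    """Inner product of weight and feature vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_rerank_train(examples, n_features, epochs=10):
    w = [0.0] * n_features
    for _ in range(epochs):
        for candidates in examples:
            # Gold candidate: best by the external quality metric.
            gold = max(range(len(candidates)), key=lambda i: candidates[i][1])
            # Predicted candidate: best by the current model score.
            pred = max(range(len(candidates)),
                       key=lambda i: dot(w, candidates[i][0]))
            if pred != gold:
                # Promote the gold candidate, demote the model's choice.
                for j in range(n_features):
                    w[j] += candidates[gold][0][j] - candidates[pred][0][j]
    return w
```

The pairwise-sample approach discussed in the abstract would instead generate one training instance per candidate pair; the update above touches only two candidates per example, which is the kind of data-complexity reduction the paper targets.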


Keywords: natural language processing; perceptron; ranking; reranking; margin



Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. Department of Computer and Information Science, University of Pennsylvania, Philadelphia
