Path Kernels and Multiplicative Updates
We consider a natural convolution kernel defined by a directed graph. Each edge contributes an input; the inputs along a path are multiplied, and the resulting products are summed over all paths. The edges also carry probabilities, chosen so that the total outflow from each node is one. We then discuss multiplicative updates on these graphs, where the prediction is essentially a kernel computation and the update contributes a factor to each edge. After such an update, the total outflow from a node is in general no longer one. However, some clever algorithms re-normalize the weights on the paths so that the total outflow from each node is one again. Finally, we discuss the use of regular expressions for speeding up the kernel and re-normalization computations. In particular, we rewrite the multiplicative algorithms that predict as well as the best pruning of a series-parallel graph in terms of efficient kernel computations.
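The two computations described above can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper: the graph is acyclic with a single source and sink, there are no parallel edges (so an edge is identified by its endpoint pair), and every node can reach the sink with positive weight. All function names are hypothetical. The re-normalization step uses a standard technique often called weight pushing, which may differ from the algorithms the paper has in mind.

```python
from collections import defaultdict


def topological_order(edges):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph must be acyclic")
    return order


def path_sums(edges, value, sink):
    """z[u] = sum over all u-to-sink paths of the product of value[e]."""
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    z = {sink: 1.0}  # the empty path from the sink to itself
    for u in reversed(topological_order(edges)):
        if u != sink:
            z[u] = sum(value[(u, v)] * z[v] for v in out[u])
    return z


def path_kernel(edges, w, x, y, source, sink):
    """Sum over source-to-sink paths of prod_e w[e] * x[e] * y[e]."""
    value = {e: w[e] * x[e] * y[e] for e in edges}
    return path_sums(edges, value, sink)[source]


def renormalize(edges, w, sink):
    """Weight pushing: rescale edges so each node's outflow is one.

    Every path keeps its share of the total path weight.
    Assumes z[u] > 0 for every node u that has outgoing edges.
    """
    z = path_sums(edges, w, sink)
    return {(u, v): w[(u, v)] * z[v] / z[u] for (u, v) in edges}


# Diamond graph with two paths: s -> a -> t and s -> b -> t.
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
w = {("s", "a"): 0.5, ("s", "b"): 0.5, ("a", "t"): 1.0, ("b", "t"): 1.0}
x = {("s", "a"): 2.0, ("s", "b"): 3.0, ("a", "t"): 1.0, ("b", "t"): 1.0}
y = {("s", "a"): 1.0, ("s", "b"): 1.0, ("a", "t"): 4.0, ("b", "t"): 5.0}
print(path_kernel(edges, w, x, y, "s", "t"))  # 0.5*8 + 0.5*15 = 11.5
```

Both routines visit each edge once, so the cost is linear in the size of the graph even when the number of paths is exponential.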
Keywords: Edge Weight, Regular Expression, Kernel Computation, Syntax Tree, Path Weight