The Big Picture: Toward a Synthesis of RL and Adaptive Tensor Factorization

  • Alexander Paprotny
  • Michael Thess
Part of the Applied and Numerical Harmonic Analysis book series (ANHA)


We explore how to unite the control-theoretic and the factorization-based approaches to recommendation, arguing that tensor factorization can be employed to overcome the combinatorial complexity of more sophisticated MDP models that take into account a history of previous states rather than a single state. Specifically, we introduce a tensor representation of the transition probabilities of Markov-k processes and devise a Tucker-based approximation architecture that relies crucially on the notion of an aggregation basis described in Chap. 6. Since our method requires a partitioning of the set of state transition histories, we face the challenge of determining a suitable partitioning, for which we propose a genetic algorithm.
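To make the tensor viewpoint concrete, the following is a minimal sketch (not the chapter's implementation) of the basic idea: for a Markov-2 process, the transition probabilities form a 3-way tensor P[i, j, k] = Pr(next = k | previous = i, current = j), which can be compressed by a Tucker approximation. The sketch uses a truncated higher-order SVD, one standard way to compute a Tucker decomposition; the toy chain, the function names, and the chosen ranks are all illustrative assumptions.

```python
# Illustrative sketch (assumed names, toy data): transition probabilities of a
# Markov-2 chain as a 3-way tensor, compressed via truncated HOSVD (Tucker).
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: one factor matrix per mode from the SVD of each
    unfolding, then the core tensor as the projection of T onto the factors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

def tucker_reconstruct(core, factors):
    """Multiply the core tensor by each factor matrix along its mode."""
    T = core
    for mode, U in enumerate(factors):
        T = np.moveaxis(np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)
    return T

rng = np.random.default_rng(0)
n = 6                                  # toy number of states
P = rng.random((n, n, n))
P /= P.sum(axis=2, keepdims=True)      # each (previous, current) row is a distribution

core, factors = tucker_hosvd(P, (3, 3, 3))
P_hat = tucker_reconstruct(core, factors)
print(core.shape)  # → (3, 3, 3)
```

The point of the compression is that the core tensor and factor matrices store O(r^3 + 3nr) numbers instead of the n^3 entries of the full transition tensor, which is what makes longer histories (larger k) tractable in principle.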


Keywords: Factorization-based approach · State value function · Core tensor · Prolonged accumulation · Generalized Markov property



Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Alexander Paprotny, Research and Development, prudsys AG, Berlin, Germany
  • Michael Thess, Research and Development, prudsys AG, Chemnitz, Germany
