Abstract
This paper explores the possibility of combining an actor and a critic in one architecture and training them with a mixture of updates. It describes a model for robot navigation whose architecture resembles an actor-critic reinforcement learning architecture: the actor forms one layer, followed by a second layer that deduces the value function, so a critic-like outcome is combined with the actor in a single network. The model can therefore serve as the basis for a truly deep reinforcement learning architecture to be explored in future work. More importantly, this work examines the effect of mixing a conjugate gradient update with a plain gradient update in this architecture. The reward signal is back-propagated from the critic to the actor through a conjugate gradient eligibility trace for the second layer combined with a gradient eligibility trace for the first layer. We show that this mixture of updates works well for this model. The feature layer is trained by applying a simple PCA to the full set of image histograms acquired during the first running episode, and the model is also able to adapt autonomously to a reduced feature dimension. Initial experimental results on a real robot show that the agent achieves a good success rate in reaching a goal location.
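The mixture of updates described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the layer sizes, the tanh actor activations, the Polak-Ribière-style coefficient for the conjugate direction, and all constants are assumptions introduced for illustration. The TD error drives a conjugate-gradient-style eligibility trace on the second (value) layer and a conventional decaying gradient trace on the first (actor) layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and constants (not taken from the paper)
n_features, n_hidden = 16, 4          # e.g. PCA-reduced histogram features
gamma, lam, alpha = 0.95, 0.9, 0.01   # discount, trace decay, step size

W1 = rng.normal(scale=0.1, size=(n_hidden, n_features))  # actor layer
w2 = rng.normal(scale=0.1, size=n_hidden)                # value (critic) layer

e1 = np.zeros_like(W1)               # plain gradient eligibility trace
e2 = np.zeros_like(w2)               # conjugate-gradient eligibility trace
g2_prev = np.full_like(w2, 1e-6)     # previous layer-2 gradient

def mixed_update(phi, reward, v_next):
    """One TD step: CG-style trace on layer 2, gradient trace on layer 1."""
    global W1, w2, e2, g2_prev
    a = np.tanh(W1 @ phi)                 # actor-layer activations
    v = float(w2 @ a)                     # value deduced by the second layer
    delta = reward + gamma * v_next - v   # TD error

    # Layer 2: a Polak-Ribiere-style coefficient turns the trace into a
    # conjugate direction of the value gradient (an assumed formulation).
    g2 = a                                # dV/dw2
    beta = max(0.0, float(g2 @ (g2 - g2_prev)) / float(g2_prev @ g2_prev))
    e2[:] = g2 + beta * e2
    g2_prev = g2.copy()

    # Layer 1: conventional accumulating gradient trace
    g1 = np.outer(w2 * (1.0 - a**2), phi)  # dV/dW1 via the chain rule
    e1[:] = gamma * lam * e1 + g1

    # Back-propagate the TD error through both traces
    w2 += alpha * delta * e2
    W1 += alpha * delta * e1
    return delta
```

The split mirrors the abstract's description: only the second layer's trace uses the conjugate direction, while the first layer keeps the standard gradient trace, so the two update rules coexist in one network.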
© 2015 Springer International Publishing Switzerland
Cite this paper
Altahhan, A. (2015). Deep Feature-Action Processing with Mixture of Updates. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science(), vol 9492. Springer, Cham. https://doi.org/10.1007/978-3-319-26561-2_1
Print ISBN: 978-3-319-26560-5
Online ISBN: 978-3-319-26561-2