Abstract
Aligning the goals of superintelligent machines with human values is one way to pursue safety in AGI systems. To achieve this, it is first necessary to learn what human values are. However, human values are incredibly complex and cannot easily be formalized by hand. In this work, we propose a general framework for estimating the values of a human given their behavior.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sezener, C.E. (2015). Inferring Human Values for Safe AGI Design. In: Bieger, J., Goertzel, B., Potapov, A. (eds.) Artificial General Intelligence. AGI 2015. Lecture Notes in Computer Science, vol. 9205. Springer, Cham. https://doi.org/10.1007/978-3-319-21365-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21364-4
Online ISBN: 978-3-319-21365-1
eBook Packages: Computer Science (R0)