How can we ensure that AI benefits rather than harms humanity? This is the problem of "Friendly AI."
- What is Friendly AI?
- Can you explain Friendly AI in 200 words with no jargon?
- What is the Singularity?
- What is the history of the Friendly AI concept?
- Who is working on Friendly AI research today?
- What should I read to catch up with the leading Friendly AI researchers?
1. What is Friendly AI?
A Friendly AI (FAI) is an artificial intelligence that benefits humanity. It is contrasted with Unfriendly AI (uFAI), which includes both Malicious AI and Uncaring AI. More specifically, Friendly AI may refer to:
- a very powerful and general AI that acts autonomously in the world to benefit humanity.
- an AI that continues to benefit humanity during and after an intelligence explosion.
- a research program concerned with the production of such an AI.
- The Machine Intelligence Research Institute's approach (Yudkowsky 2001, 2004) to designing such an AI:
  - Goals should be defined by the Coherent Extrapolated Volition of humanity.
  - Goals should be reliably preserved during recursive self-improvement.
  - Design should be mathematically rigorous and proof-apt.
Friendly AI is a more difficult project than is often supposed. As explored in other sections, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence (Muehlhauser & Helm 2012):
Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.
Literalness: a superintelligent machine will make decisions using the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety (Kringelbach & Berridge 2009; Schroeder 2004; Glimcher 2010) of what humans value. A demand like "maximize human happiness" sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.
2. Can you explain Friendly AI in 200 words with no jargon?
Every year, computers surpass human abilities in new ways. Computers can beat us at doing calculations, playing chess and Jeopardy!, reading road signs, and more. Recently, a robot named Adam was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results.
Many experts predict that during this century we will design a machine that can improve its own intelligence better than we can, which will make the machine even more skilled at improving its own intelligence, and so on. By this method the machine could become vastly more intelligent than the smartest human being.
A relatively small difference in intelligence between humans and other apes gave us dominance over this planet. A machine with vastly more intelligence than humans will be able to rapidly outsmart our feeble attempts to constrain it.
A machine with that much power will reshape reality according to its goals, for good or bad. If we want a desirable future, we need to make sure a super-powerful machine has (and keeps) the same goals we do. That is the challenge of building a "Friendly AI".
3. What is the Singularity?
There are many types of mathematical and physical singularities, but in this FAQ we use the term 'Singularity' to refer to the technological singularity.
There are also many things someone might have in mind when they refer to a 'technological Singularity' (Sandberg 2010). Below, we’ll explain just three of them (Yudkowsky 2007):
- Intelligence explosion
- Event horizon
- Accelerating change
Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica (MacKenzie 1995). By the late 1990s, 'expert systems' had surpassed human skill for a wide range of tasks (Nilsson 2009). In 1997, IBM's Deep Blue computer beat the world chess champion (Campbell et al. 2002), and in 2011 IBM's Watson computer beat the best human players at a much more complicated game: Jeopardy! (Markoff 2011). Recently, a robot named Adam was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results (King et al. 2009; King 2011).
Computers remain far short of human intelligence, but the resources that aid AI design are accumulating (including hardware, large datasets, neuroscience knowledge, and AI theory). We may one day design a machine that surpasses human skill at designing artificial intelligences. After that, this machine could improve its own intelligence faster and better than humans can, which would make it even more skilled at improving its own intelligence. This could continue in a positive feedback loop such that the machine quickly becomes vastly more intelligent than the smartest human being on Earth: an 'intelligence explosion' resulting in a machine superintelligence (Good 1965).
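To make the feedback loop concrete, here is a deliberately crude toy model in Python. It is an illustration only, not a forecast and not a model used by the researchers cited in this FAQ. It assumes that capability can be summarized by a single number, that a self-improving machine's gain at each step is proportional to its current capability (the hypothetical `rate` parameter), and, for contrast, that improvement by outside designers adds a fixed amount per step (the hypothetical `gain` parameter).

```python
def self_improvement(initial: float, rate: float, steps: int) -> list[float]:
    """Hypothetical toy model: each step's gain is proportional to current capability.

    A more capable system makes larger improvements, so gains compound
    (a positive feedback loop): the result is initial * (1 + rate) ** step.
    """
    levels = [initial]
    for _ in range(steps):
        current = levels[-1]
        levels.append(current + rate * current)
    return levels


def external_improvement(initial: float, gain: float, steps: int) -> list[float]:
    """Hypothetical toy model: outside designers add a fixed gain each step.

    The gain does not depend on current capability, so growth is linear.
    """
    return [initial + gain * step for step in range(steps + 1)]


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not estimates of anything real.
    looped = self_improvement(initial=1.0, rate=0.5, steps=10)
    linear = external_improvement(initial=1.0, gain=0.5, steps=10)
    for step, (a, b) in enumerate(zip(looped, linear)):
        print(f"step {step:2d}: feedback loop {a:8.2f}   fixed gain {b:5.2f}")
```

With these arbitrary settings both curves start at 1.0 and take the same first step, but after ten steps the feedback-loop curve reaches about 57.67 while the fixed-gain curve reaches 6.0. The point is only qualitative: when improvements feed back into the ability to improve, growth compounds. Nothing in this sketch says how large `rate` would be for any real system, or whether the loop would continue indefinitely.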
Vernor Vinge (1993) wrote that the arrival of machine superintelligence represents an 'event horizon' beyond which humans cannot model the future, because events beyond the Singularity will be stranger than science fiction: too weird for human minds to predict. So far, all social and technological progress has resulted from human brains, but humans cannot predict what future radically different and more powerful intelligences will create. He made an analogy to the event horizon of a black hole, beyond which the predictive power of physics at the gravitational singularity breaks down.
A third concept of technological singularity refers to accelerating change in technological development.
Ray Kurzweil (2005) has done the most to promote this idea. He suggests that although we expect technological change to be linear, information technology progresses exponentially, so the future will be more different than most of us expect. Technological progress enables even faster technological progress. Kurzweil suggests that technological progress may become so fast that humans cannot keep up unless they amplify their own intelligence by integrating themselves with machines.
4. What is the history of the Friendly AI concept?
Late in the Industrial Revolution, Samuel Butler (1863) worried about what might happen when machines become more capable than the humans who designed them:
...we are ourselves creating our own successors; we are daily adding to the beauty and delicacy of their physical organisation; we are daily giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages we shall find ourselves the inferior race.
...the time will come when the machines will hold the real supremacy over the world and its inhabitants...
This basic idea was picked up by science fiction authors, for example in R.U.R., the 1921 Czech play by Karel Čapek that introduced the term “robot.” In that play, robots grow in power and intelligence and destroy the entire human race, except for a single survivor.
Another case is John W. Campbell’s (1932) short story The Last Evolution, in which aliens attack Earth, humans and aliens alike are killed, and the machines survive to inherit the solar system.
The concerns of machine ethics are most popularly identified with Isaac Asimov’s Three Laws of Robotics, introduced in his 1942 short story “Runaround.” Asimov used his stories, including those collected in the popular book I, Robot, to illustrate the many ways in which such well-meaning and seemingly comprehensive rules for governing robot behavior could go wrong.
In the year of I, Robot’s release, mathematician Alan Turing (1950) noted that machines may one day be capable of whatever human intelligence can achieve:
I believe that at the end of the century... one will be able to speak of machines thinking without expecting to be contradicted.
Turing (1951/2004) concluded:
...it seems probable that once the machine thinking method has started, it would not take long to outstrip our feeble powers... At some stage therefore we should have to expect the machines to take control...
Bayesian statistician I.J. Good (1965), who had worked with Turing to crack Nazi codes in World War II, made the crucial leap to the ‘intelligence explosion’ concept:
Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion”, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make
Author Arthur C. Clarke (1968) agreed:
Though we have to live and work with (and against) today's mechanical morons, their deficiencies should not blind us to the future. In particular, it should be realized that as soon as the borders of electronic intelligence are passed, there will be a kind of chain reaction, because the machines will rapidly improve themselves... there will be a mental explosion; the merely intelligent machine will swiftly give way to the ultraintelligent machine....
Julius Lukasiewicz (1974) noted that human intelligence may be unable to predict what a superintelligent machine would do:
The survival of man may depend on the early construction of an ultraintelligent machine - or the ultraintelligent machine may take over and render the human race redundant or develop another form of life. The prospect that a merely intelligent man could ever attempt to predict the impact of an ultraintelligent device is of course unlikely but the temptation to speculate seems irresistible.
Even critics of AI like Jack Schwartz (1987) saw the implications:
If artificial intelligences can be created at all, there is little reason to believe that initial successes could not lead swiftly to the construction of artificial superintelligences able to explore significant mathematical, scientific, or engineering alternatives at a rate far exceeding human ability, or to generate plans and take action on them with equally overwhelming speed. Since man's near-monopoly of all higher forms of intelligence has been one of the most basic facts of human existence throughout the past history of this planet, such developments would clearly create a new economics, a new sociology, and a new history.
Novelist Vernor Vinge (1981) called this 'event horizon' in our ability to predict the future a 'singularity':
Here I had tried a straightforward extrapolation of technology, and found myself precipitated over an abyss. It's a problem we face every time we consider the creation of intelligences greater than our own. When this happens, human history will have reached a kind of singularity - a place where extrapolation breaks down and new models must be applied - and the world will pass beyond our understanding.
Eliezer Yudkowsky (1996) used the term 'singularity' to refer instead to Good's 'intelligence explosion', and began work on the task of figuring out how to build a self-improving AI that had a predictably positive rather than negative effect on the world (Yudkowsky 2000) — a project he eventually called 'Friendly AI' (Yudkowsky 2001, 2004). (Both this concept of Friendly AI and its difficulty were anticipated by novelist Stanislaw Lem in his 1953 short story Star Diaries, Voyage 24.)
Meanwhile, philosophers and AI researchers were considering whether or not machines could have moral value, and how to ensure ethical behavior from less powerful machines or 'narrow AIs'. This field of inquiry has been variously known as 'artificial morality' (Danielson 1992; Floridi & Sanders 2004; Allen et al. 2000), 'machine ethics' (Hall 2000; McLaren 2005; Anderson & Anderson 2006), 'computational ethics' (Allen 2002), 'computational metaethics' (Lokhorst 2011), and 'robo-ethics' or 'robot ethics' (Capurro et al. 2006; Sawyer 2007). This vein of research, which we'll call the 'machine ethics' literature, was recently summarized in two books: Wallach & Allen (2009) and Anderson & Anderson (2011).
Prominent philosopher of mind David Chalmers brought the concepts of intelligence explosion and Friendly AI to mainstream academic attention with his 2010 paper, ‘The Singularity: A Philosophical Analysis’, published in Journal of Consciousness Studies. That journal’s January 2012 issue will be devoted to responses to Chalmers’ article, as will an edited volume from Springer (Eden et al. 2012).
Despite their parallel interests, Friendly AI researchers do not regularly cite the machine ethics literature (e.g. see Bostrom & Yudkowsky 2011).
5. Who is working on Friendly AI research today?
Friendly AI research is most associated with the Machine Intelligence Research Institute (MIRI) and its co-founder Eliezer Yudkowsky. Other researchers working on issues related to Friendly AI include:
- Other MIRI researchers Carl Shulman, Kaj Sotala, and Luke Muehlhauser
- Independent researchers associated with MIRI (formerly the Singularity Institute): Daniel Dewey, Peter de Blanc, Joshua Fox, Steve Rayhawk, and others
- Nick Bostrom, Anders Sandberg, and Stuart Armstrong at Oxford’s Future of Humanity Institute and Oxford's Programme on the Impacts of Future Technology
- Robin Hanson at George Mason University
- David Chalmers at Australian National University
6. What should I read to catch up with the leading Friendly AI researchers?
The creation of truly Friendly AI would be the best thing that could happen to humanity, while narrowly missing the mark and creating Unfriendly AI could be disastrous for humanity on a previously unimaginable scale. Unfortunately, it will take much more than the normal progress of human thought to solve the problem of Friendly AI. Solving Friendly AI requires solving many longstanding open problems in philosophy and mathematics, not to mention major advances in cognitive science.
The community of competent Friendly AI researchers is much smaller than humanity needs it to be. If you’d like to contribute, we recommend the reading materials below (listed by subject).
Rationality. To think clearly about large and difficult problems, one must be aware of the many ways in which human minds can fool themselves and produce errors, and one must be trained to overcome these errors as much as possible. Concerning the mathematics and cognitive science of rationality, we recommend:
- Muehlhauser, The Cognitive Science of Rationality (2011)
- Yudkowsky, The Sequences (2006-2009)
- Hastie & Dawes, Rational Choice in an Uncertain World, 2nd edition (2009)
- Baron, Thinking and Deciding, 4th edition (2007)
- Stanovich, Rationality and the Reflective Mind (2010)
Cognitive science. Friendly AI is largely a problem of cognitive science. A broad understanding of cognitive science is recommended.
- Bermudez, Cognitive Science: An Introduction to the Science of the Mind (2010)
- Russell & Norvig, Artificial Intelligence: A Modern Approach, 3rd edition (2009)
- Buss, Evolutionary Psychology: The New Science of Mind, 4th edition (2011)
- Hutter, Universal Artificial Intelligence (2004)
Math, logic, and computer science. Correct scientific and philosophical practice relies heavily on a deep understanding of mathematics, logic, and computation.
- Jaynes, Probability Theory: The Logic of Science (2003)
- Sipser, Introduction to the Theory of Computation, 2nd edition (2005)
- Schechter, Classical and Nonclassical Logics (2005)
Philosophy. Many of the issues in Friendly AI theory are deeply philosophical.
- Lakoff & Johnson, Philosophy in the Flesh (1999)
- Bostrom & Cirkovic, Global Catastrophic Risks (2008)
- Trout & Bishop, Epistemology and the Psychology of Human Judgment (2004)
Friendly AI. Applications of the above material to thinking about Friendly AI in particular include:
- Muehlhauser & Helm, Intelligence Explosion and Machine Ethics (2012)
- Chalmers, The Singularity: A Philosophical Analysis (2010)
- Yudkowsky, Artificial Intelligence as a Positive and Negative Factor in Global Risk (2008)
- Yudkowsky, Creating Friendly AI (2001)
After learning these basics, you may wish to specialize in one or more areas relevant to Friendly AI research. These works will get you started:
- Whole brain emulation: Sandberg & Bostrom, Whole Brain Emulation: A Roadmap (2008)
- Anthropic reasoning: Bostrom, Anthropic Bias (2002)
- Decision theory: Drescher, Good and Real (2006)
- Human values: Dolan & Sharot, Neuroscience of Preference and Choice (2011)
- Value extrapolation: Yudkowsky, Coherent Extrapolated Volition (2004)
- AI architectures: Dewey, Learning What to Value (2011)
Allen, Varner, & Zinser (2000). Prolegomena to any future artificial moral agent. Journal of Experimental & Theoretical Artificial Intelligence, 12: 251-261.
Allen (2002). Calculated morality: Ethical computing in the limit. In I. Smit & G. Lasker (eds.), Cognitive, emotive and ethical aspects of decision making and human action, vol I. Baden/IIAS.
Anderson & Anderson, eds. (2006). IEEE Intelligent Systems, 21(4).
Anderson & Anderson, eds. (2011). Machine Ethics. Cambridge University Press.
Bostrom & Yudkowsky (2011). The ethics of artificial intelligence. In Ramsey & Frankish (eds.), The Cambridge Handbook of Artificial Intelligence.
Butler (1863). Darwin among the machines. The Press (Christchurch, New Zealand), June 13.
Campbell (1932). The Last Evolution. Amazing Stories.
Campbell, Hoane, & Hsu (2002). Deep Blue. Artificial Intelligence, 134: 57-83.
Capurro, Hausmanninger, Weber, Weil, Cerqui, Weber, & Weber (2006). International Review of Information Ethics, Vol. 6: Ethics in Robots.
Clarke (1968). The mind of the machine. Playboy, December 1968.
Danielson (1992). Artificial morality: Virtuous robots for virtual games. Routledge.
Eden, Soraker, Moor, & Steinhart, eds. (2012). Singularity Hypotheses: A Scientific and Philosophical Assessment. Springer.
Floridi & Sanders (2004). On the morality of artificial agents. Minds and Machines, 14: 349-379.
Glimcher (2010). Foundations of Neuroeconomic Analysis. Oxford University Press.
Good (1965). Speculations concerning the first ultraintelligent machine. Advances in Computers, 6: 31-88.
Hall (2000). Ethics for machines.
King et al. (2009). The automation of science. Science, 324: 85-89.
King (2011). Rise of the robo scientists. Scientific American, January 2011.
Kringelbach & Berridge, eds. (2009). Pleasures of the Brain. Oxford University Press.
Kurzweil (2005). The Singularity is Near. Viking.
Lokhorst (2011). Computational meta-ethics: Towards the meta-ethical robot. Minds and Machines, 21: 261-274.
Lukasiewicz (1974). The ignorance explosion. Leonardo, 7: 159-163.
MacKenzie (1995). The Automation of Proof: A Historical and Sociological Exploration. IEEE Annals of the History of Computing, 17(3): 7-29.
Markoff (2011). Computer Wins on Jeopardy!; Trivial, It’s Not. New York Times, February 17th 2011: A1.
McLaren (2005). Lessons in Machine Ethics from the Perspective of Two Computational Models of Ethical Reasoning. AAAI Technical Report FS-05-06: 70-77.
Muehlhauser & Helm (2012). Intelligence Explosion and Machine Ethics. In Singularity Hypotheses: A Scientific and Philosophical Assessment.
Nilsson (2009). The Quest for Artificial Intelligence. Cambridge University Press.
Sandberg (2010). An overview of models of technological singularity. Presented at the “Roadmaps to AGI and the future of AGI” workshop following the AGI 2010 conference in Lugano, Switzerland.
Sawyer (2007). Robot ethics. Science, 318: 1037.
Schwartz (1987). Limits of Artificial Intelligence. In Shapiro & Eckroth (eds.), Encyclopedia of Artificial Intelligence, Vol. 1 (pp. 488-503). John Wiley and Sons, Inc.
Schroeder (2004). Three Faces of Desire. Oxford University Press.
Turing (1950). Computing machinery and intelligence. Mind, 59: 433-460.
Turing (1951/2004). Intelligent machinery, a heretical theory. In Copeland (ed.), The Essential Turing (2004). Oxford University Press. Originally presented in 1951 as a lecture for the ‘51 society in Manchester.
Vinge (1981). True Names. In Dell Binary Star #5.
Vinge (1993). The coming technological singularity: How to survive in the post-human era. Whole Earth Review, winter 1993. New Whole Earth.
Wallach & Allen (2009). Moral Machines. Oxford University Press.
Yudkowsky (1996). Staring into the Singularity.
Yudkowsky (2000). Creating a Transhuman AI.
Yudkowsky (2001). Creating Friendly AI.
Yudkowsky (2004). Coherent Extrapolated Volition.
Yudkowsky (2007). Three Major Singularity Schools.
Yudkowsky (2008). Artificial Intelligence as a Positive and Negative Factor in Global Risk. In Bostrom & Cirkovic (eds.), Global Catastrophic Risks. Oxford University Press.