AI (Artificial Intelligence) human values alignment scenarios


The voluntary and intentional creation of a Utilitarian Artificial General Intelligence (UAGI), a kind of great automated global police that makes the best decisions under the criterion of maximizing happiness and minimizing suffering, and which will determine many aspects of the lives of almost all of us who inhabit this planet and its nearby provinces, is an increasingly real possibility. I think this creation is inevitable in one way or another. A personal goal is to influence it as much as possible, to ensure that no sentient beings are left out of moral consideration and that its decisions are the best possible. This can be done, of course, by working directly on its construction, or, more indirectly, by acting as an adviser or influencer who helps determine its operation. The effect of this UAGI will be extraordinary, radical, exponentially superior to any known precedent. Studying and researching moral problems is, because of this, of extreme importance and urgency.
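To make the decision criterion mentioned above concrete, here is a minimal, purely illustrative sketch of the classical utilitarian rule: choose the action that maximizes aggregate wellbeing (happiness minus suffering) across all sentient beings considered. The function names, the policies and the wellbeing scores are all hypothetical; this is a sketch of the criterion, not a proposal for how a UAGI would actually work.

```python
# Illustrative sketch of the utilitarian decision criterion: pick the action
# that maximizes total wellbeing over every sentient being considered.
# All names and numbers below are hypothetical.

from typing import Callable, Dict, List, Tuple

Action = str
Being = str

def aggregate_wellbeing(action: Action,
                        beings: List[Being],
                        wellbeing: Callable[[Action, Being], float]) -> float:
    """Sum of happiness-minus-suffering over every sentient being considered."""
    return sum(wellbeing(action, being) for being in beings)

def choose_action(actions: List[Action],
                  beings: List[Being],
                  wellbeing: Callable[[Action, Being], float]) -> Action:
    """Classical utilitarian rule: maximize aggregate wellbeing."""
    return max(actions, key=lambda a: aggregate_wellbeing(a, beings, wellbeing))

# Hypothetical example: two policies evaluated over three beings.
scores: Dict[Tuple[Action, Being], float] = {
    ("policy_A", "human_1"): 2.0, ("policy_A", "human_2"): 1.0, ("policy_A", "pig_1"): -3.0,
    ("policy_B", "human_1"): 1.0, ("policy_B", "human_2"): 1.0, ("policy_B", "pig_1"): 0.5,
}
best = choose_action(["policy_A", "policy_B"],
                     ["human_1", "human_2", "pig_1"],
                     lambda a, b: scores[(a, b)])
print(best)  # -> "policy_B"
```

Note how the example mirrors the point about moral consideration: if the non-human being were left out of the sum, the chosen policy would flip, which is exactly why who gets counted matters as much as the criterion itself.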

A fundamental consideration regarding this possible construction is whether the system can actually be aligned with the moral criteria we want to implement.

 

Intentional AGI (Artificial General Intelligence) human values alignment scenarios

Before starting, a preliminary reflection on human self-knowledge scenarios:

Possible options:

  • 1. Humans don’t know what they want / prefer. Then, if humans try to formalize (express) what they want / prefer and succeed, it is by mistake or chance.
  • 2. Humans do know what they want / prefer.
    • 2.1. Even if humans do know what they want / prefer, humans cannot correctly formalize (express) it. So humans cannot transmit this knowledge to others, except by chance or mistake.
    • 2.2. Humans can correctly formalize (express) what they want / prefer.

Now, AI human values alignment scenarios, assuming a benevolent AI:

  • 1. AI does not surpass humans. Then, humans can more or less control the AI’s behaviour. In this scenario, the AI may or may not be an AGI (Artificial General Intelligence). For instance, ants and dogs also have general intelligence, but humans can, more or less, control ants and dogs. In this scenario we definitely have some risks, big ones, but the other scenarios are much worse.
  • 2. AI surpasses humans. In this scenario, the AI is necessarily (? * see notes) an AGI (Artificial General Intelligence) and humans cannot control the behaviour of this AI-AGI.
    • 2.1. The AI-AGI is smart enough to surpass humans but not smart enough to know / understand human values. Here we have the strongest X-Risks (Existential Risks), in the sense of astronomical risks from human values misalignment. This is the worst scenario.
    • 2.2. The AI-AGI is smart enough to surpass humans and also smart enough to know / understand human values. Here we do not have X-Risks (Existential Risks) in the sense of astronomical risks from human values misalignment, even if the AI-AGI behaves extremely differently from what our own understanding of our values would lead us to expect. Illustrative examples, though obviously not necessarily correct ones, are: brains in vats / heroin rats, the world destruction argument, etc. There are probably other, less catastrophic examples (inspired by the trolley dilemma, the fat man on the bridge, etc.). In this scenario the AI-AGI is doing right, but we may believe it is doing wrong. This is not the worst scenario, but we may mistakenly believe that we are in the worst scenario (the previous one, 2.1) and mistakenly fight against this scenario with all our efforts. This 2.2 option can also be split into two cases, summarized in the sketch after this list:
      • 2.2.1. Humans agree with what the AI is doing. There does not seem to be a great risk here.
      • 2.2.2. Humans disagree with what the AI is doing. In this case the risk is, obviously, produced by humans, not by the AI.
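A compact way to read the taxonomy above is as a small decision tree. The sketch below is only a restatement of the numbered scenarios in code form, not an implementation of anything; the question names and labels are shorthand I have chosen for illustration.

```python
# Illustrative restatement of the scenario taxonomy above as a decision tree.
# The boolean questions and labels mirror the numbered scenarios in the text.

def classify_scenario(ai_surpasses_humans: bool,
                      ai_understands_human_values: bool,
                      humans_agree_with_ai: bool) -> str:
    if not ai_surpasses_humans:
        return "1: humans can more or less control the AI (risky, but not the worst)"
    if not ai_understands_human_values:
        return "2.1: superhuman AI that misses human values -- the worst scenario"
    if humans_agree_with_ai:
        return "2.2.1: superhuman AI that gets values right, humans agree -- little risk"
    return "2.2.2: superhuman AI that gets values right, humans disagree -- the risk comes from humans"

print(classify_scenario(True, True, False))  # -> the 2.2.2 case
```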

 

It seems that, if we want to know which scenario we are in, a human-AI fusion is needed.

Non-intentional AGI and non-benevolent scenarios

If an AGI builds itself (across multiple evolutionary optimization loops), rather than being built intentionally by humans, we cannot expect an AGI aligned with human values.

 

What do experts say about the risks of AI?

Simplifying a lot, we can consider that there are two large groups of researchers and thinkers regarding the existential risks of Artificial Intelligence. Pessimists think that an AGI may fail to align with human values and cause a disaster. Optimists consider that if the AGI is sufficiently clever, it will take care of aligning with human values well, even though we do not know very well how to define what those values are; and that if it is not clever enough to align with human values well, it will surely not be powerful enough for the risk to be so great.

Pessimists: Eliezer Yudkowsky, Roman Yampolskiy, Stephen Hawking, Elon Musk, Bill Gates

Optimists: Richard Loosemore, Eric Drexler, Robin Hanson, Ben Goertzel, Kaj Sotala, Brian Tomasik, David Pearce

Perhaps precisely because they believe that AI is not a great risk and are therefore dedicated to other things, the optimists’ view may be less represented. The following list by Magnus Vinding aims to correct this situation:

 

Notes

Some good suggestions I have received about the same cross-post from my partners in crime at EA Madrid (Nuño, Pablo Moreno, Jaime Sevilla Molina):

  • Taboo “smart”, “intelligent”, etc., and use “optimization power” instead.
  • Humans do not seem to be particularly good optimizers but rather flexible learners.
  • Scenario 2 does not necessarily entail an agent-like AGI. See, for example, Comprehensive AI Services by Eric Drexler: https://www.fhi.ox.ac.uk/reframing/
Posted by Manu Herrán

Founder at Sentience Research. Associate at the Organisation for the Prevention of Intense Suffering (OPIS).
