AI Safety and Designing to Understand What Humans Want

Long Ouyang, Thesis Research for Future of Life Institute

Alex Moltzau
4 min read · Aug 11, 2019


This research looks at pragmatic reasoning: common sense for humans, but a connection that program synthesis does not make. Long Ouyang received a research grant of $99,750 from the Future of Life Institute in 2015 to study: “Democratizing Programming: Synthesizing Valid Programs with Recursive Bayesian Inference”. He has been working as a research scientist at OpenAI since June this year (according to his LinkedIn). As such, I am sure we may expect new research from Long going in this direction shortly. I will look at his research from 2015, and due to my limited knowledge I apologize in advance for any mistakes or misinterpretations (feel free to correct and comment).

Quick Overview of Democratizing Programming

Through studying how humans communicate and modelling this with Bayesian inference, Ouyang is working to improve a computer’s ability to understand the information it receives, interpreting beyond the literal meaning. “The communication gap between computers and humans is one of the central problems in AI safety, and Ouyang hopes that a pragmatic synthesizer will help close this gap. If AIs can reason more deeply about what people say to them, they will more effectively create the beneficial outcomes that we want.”

We do of course as usual have to take a few steps back.

What is Bayesian inference?

Bayesian inference is a method of statistical inference in which Bayes’ theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

The result is a posterior probability: the probability of a hypothesis after the relevant evidence has been observed. How probable is it now that the event occurs again, or that some other course follows, given what has been inferred? To infer, after all, is to deduce or conclude something from evidence and reasoning rather than from explicit statements. This updating can be written out as Bayes’ rule.

P(H|E) = P(E|H) × P(H) / P(E)

where P stands for probability: P(H) is the prior probability, the probability of the hypothesis H before the data E is observed; E is the evidence, new data not used in computing the prior probability; P(E|H) is the probability of observing E given H, called the likelihood; P(E) is the marginal likelihood or “model evidence”, a factor that is the same for all hypotheses being considered; and P(H|E) is the posterior probability, the probability of H after observing E.
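As a concrete illustration, here is a minimal sketch of a single Bayesian update in Python. The numbers are invented purely for this example:

```python
# A single application of Bayes' rule. All numbers here are
# made up for illustration only.

p_h = 0.3              # P(H): prior probability of hypothesis H
p_e_given_h = 0.8      # P(E|H): likelihood of the evidence if H holds
p_e_given_not_h = 0.2  # P(E|not H): likelihood of the evidence otherwise

# P(E): marginal likelihood ("model evidence"), obtained by summing
# over the hypothesis and its complement.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)

# P(H|E): posterior probability of H after observing the evidence E.
p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)  # ~0.63, up from the prior of 0.3
```

Observing the evidence raises the probability of the hypothesis from 0.3 to roughly 0.63; further evidence would update it again in the same way.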

Computers, Humans and Bayesian inference

In computer science, program synthesis is the task of automatically constructing a program that satisfies a given high-level specification. With program synthesis, computers are literalists: instead of considering intentions, they do what is literally true, and what is literally true is not always what humans want.

One example given is that, from the data set A, AAA and AAAAA, a computer might logically conclude that the rule is that everything has to have the letter A. This rule is literally consistent with the examples provided, yet fails to capture what the experimenter had in mind (plausibly, strings with an odd number of A’s). Similarly, the request “provide me with a word that begins with the letter ‘a’” could be answered with just “a”, which is correct, but wrong in the given context.
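To see how pragmatic reasoning can be layered on top of Bayesian inference, here is a minimal sketch in the style of the Rational Speech Acts models used in pragmatics research (the grant title mentions recursive Bayesian inference). The universe of strings, the two candidate rules, and all names below are my own illustrative assumptions, not Ouyang’s actual model:

```python
# A minimal sketch of pragmatic rule inference in the spirit of the
# Rational Speech Acts (RSA) framework from pragmatics research.
# The universe of strings, the two candidate rules, and all names
# here are illustrative assumptions, not Ouyang's actual model.

UNIVERSE = ["A", "AA", "AAA", "AAAA", "AAAAA"]

# Each hypothesis (rule) is the set of strings it accepts.
HYPOTHESES = {
    "any run of A's": set(UNIVERSE),
    "odd-length run of A's": {s for s in UNIVERSE if len(s) % 2 == 1},
}

PRIOR = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}  # uniform prior


def normalize(scores):
    """Rescale a dict of non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def literal_learner(examples):
    """Baseline: every rule consistent with all examples is equally good."""
    return normalize({
        h: PRIOR[h] * (1.0 if all(e in ext for e in examples) else 0.0)
        for h, ext in HYPOTHESES.items()
    })


def literal_listener(example):
    """L0: score hypotheses by bare consistency with a single example."""
    return normalize({
        h: PRIOR[h] * (1.0 if example in ext else 0.0)
        for h, ext in HYPOTHESES.items()
    })


def speaker(hypothesis):
    """S1: a helpful teacher prefers examples that point the literal
    listener towards the rule the teacher actually intends."""
    return normalize({
        e: literal_listener(e)[hypothesis] for e in HYPOTHESES[hypothesis]
    })


def pragmatic_learner(examples):
    """L1: invert the speaker model across all observed examples."""
    scores = {}
    for h in HYPOTHESES:
        p = PRIOR[h]
        for e in examples:
            p *= speaker(h).get(e, 0.0)
        scores[h] = p
    return normalize(scores)


data = ["A", "AAA", "AAAAA"]
print("literal:  ", literal_learner(data))    # 50/50 between the rules
print("pragmatic:", pragmatic_learner(data))  # ~93% on the odd-length rule
```

With the examples A, AAA and AAAAA, the literal learner stays split 50/50 between the two rules, while the pragmatic learner, reasoning about why a helpful teacher would show only odd-length strings, puts roughly 93% of its probability on the odd-length rule.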

Long mentions in his profile on FLI that helpfulness has been intensely studied in the linguistic field called pragmatics. I found clarity on his profile page too, and will highlight this as a quote:

“One goal of artificial intelligence is valid behavior: computers should perform tasks that people actually want them to do. The current model of programming hinders validity, largely because it focuses on the minutiae of how to compute rather than the goal of what to compute.”

Long Repositories?

Long Ouyang has a few repositories up on GitHub; however, I have not explored these much. He has a range of probability-related repositories and some related to cognitive science, mostly in JavaScript and Python. They might be worth having a look at.

This is day 70 of #500daysofAI. My current focus for days 50–100 is on AI Safety. If you enjoy this, please give me a response, as I do want to improve my writing and discover new research, companies and projects.


Alex Moltzau

AI Policy, Governance, Ethics and International Partnerships at www.nora.ai. All views are my own. twitter.com/AlexMoltzau