
What is Natural-Language Generation?

Transforming structured data into natural language

In June 2020, TNW published an article called A beginner’s guide to natural language processing and generation. I have written a few articles about natural language processing, yet so far I have not focused on natural language generation. Therefore, I thought it would be interesting to explore the concept briefly. This article draws on the Wikipedia entry on natural-language generation and various other articles on the topic; it is partly meant as a summary and partly as a learning process.

“Natural-language generation (NLG) is a software process that transforms structured data into natural language. It can be used to produce long form content for organizations to automate custom reports, as well as produce custom content for a web or mobile application.”

Two clear uses are:

  • Short texts in interactive conversations (e.g. with a chatbot).
  • Spoken output, where the generated text is read aloud by a text-to-speech system.
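The core idea of the definition above — structured data in, natural language out — can be sketched with a simple template-based generator. The field names and the weather-report scenario here are invented for illustration:

```python
# A minimal sketch of template-based NLG: a structured record goes in,
# an English sentence comes out. Field names are invented for illustration.

def generate_report(record):
    """Turn a structured weather record into a natural-language sentence."""
    template = "On {date}, {city} had a high of {high}°C and a low of {low}°C."
    return template.format(**record)

record = {"date": "3 June", "city": "Copenhagen", "high": 21, "low": 12}
print(generate_report(record))
# → On 3 June, Copenhagen had a high of 21°C and a low of 12°C.
```

This is the simplest, rule-based end of the spectrum; the statistical approaches mentioned later in the article replace the hand-written template with a learned model.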

Human languages tend to be considerably more complex and allow for much more ambiguity and variety of expressions.

Even formal languages are hard to handle: translating code from one programming language to another remains challenging, as shown by TransCoder from Facebook’s researchers.

It might help to consider its opposite.

NLG is described as the opposite of natural language understanding.

“Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension.”

In NLU, the system needs to disambiguate the input sentence to produce the machine representation language.

In NLG the system needs to make decisions about how to put a concept into words.

Natural-language understanding is considered an AI-hard problem.

Early examples of NLU:

  • 1964: the program STUDENT, written by Daniel Bobrow for his PhD dissertation at MIT, is one of the earliest known attempts at natural-language understanding by a computer.
  • 1965: Joseph Weizenbaum at MIT wrote ELIZA, an interactive program that carried on a dialogue in English on any topic, the most popular being psychotherapy.

One abstract difference lies in precision.

“NLU needs to deal with ambiguous or erroneous user input, whereas the ideas the system wants to express through NLG are generally known precisely.”

Another lies in textual representation.

“NLG needs to choose a specific, self-consistent textual representation from many potential representations, whereas NLU generally tries to produce a single, normalized representation of the idea expressed.”
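The point in the quote above can be sketched in a few lines: the same underlying fact has several potential surface forms, and the generator must commit to one self-consistent realization. The `realize` function and its `formality` parameter are invented stand-ins for that decision:

```python
# Sketch: one structured fact, several potential surface forms.
# NLG picks one self-consistent realization; the "formality" knob
# is an invented stand-in for the decisions a real system makes.

def realize(fact, formality="neutral"):
    """Choose one textual representation of the same underlying fact."""
    forms = {
        "formal":  "Sales increased by {pct}% in {quarter}.",
        "neutral": "Sales went up {pct}% in {quarter}.",
        "casual":  "Sales jumped {pct}% this {quarter}!",
    }
    return forms[formality].format(**fact)

fact = {"pct": 12, "quarter": "Q2"}
for style in ("formal", "neutral", "casual"):
    print(realize(fact, style))
```

NLU runs in the opposite direction: it would take any of those three sentences and try to map them back to the single normalized `fact` record.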

A paper from 2017 maps advances in the field by classifying the empirical literature.

More recently, NLG can also be accomplished by training a statistical model using machine learning, typically on a large corpus of human-written texts.
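As a toy illustration of that statistical approach, the sketch below counts word-bigram transitions in a tiny made-up "corpus" and then samples new text from those counts. Real systems use vastly larger corpora and neural models, but the train-then-generate shape is the same:

```python
# Toy statistical NLG: learn which words follow which in a corpus,
# then sample a new sequence from those learned transitions.
import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": for each word, collect the words observed to follow it.
model = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev].append(nxt)

def generate(start="the", length=8, seed=0):
    """Sample a word sequence by repeatedly picking a likely next word."""
    random.seed(seed)
    words = [start]
    for _ in range(length - 1):
        followers = model.get(words[-1])
        if not followers:
            break
        words.append(random.choice(followers))
    return " ".join(words)

print(generate())
```

The output is fluent-looking only because every transition was seen in training; the model has no notion of the meaning it is expressing, which is exactly the gap between this and the knowledge-based NLG described earlier.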

It is this last aspect that interests me to some extent.

This is #500daysofAI and you are reading article 413. I am writing one new article about or related to artificial intelligence every day for 500 days.

AI Policy and Ethics at www.nora.ai. Student at University of Copenhagen MSc in Social Data Science. All views are my own. twitter.com/AlexMoltzau