What is Natural Language Processing?

A short article about a subfield of artificial intelligence

Alex Moltzau

--

When deciding which approaches to take when working within the field of AI, one avenue is communication with human beings.

If you are moving in this direction, or have an interest in working with vast amounts of text data, you have probably heard of natural language processing.

“Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.”

Challenges in natural language processing can involve:

  1. Speech recognition.
  2. Natural language understanding.
  3. Natural language generation.

In the 2010s, representation learning and deep neural network-style machine learning methods became widespread.

This was partly due to achievements being made within artificial intelligence, in particular in what has become known as ‘deep learning’.

This progress in deep learning is often credited, to a considerable extent, to Yoshua Bengio, together with Geoffrey Hinton and Yann LeCun.

There are many tasks that are actively researched in natural language processing.

A few of the most commonly researched tasks are:

  • Speech recognition. Given a sound clip of a person or people speaking, determine the textual representation of the speech.
  • Text-to-speech. Given a text, produce a spoken representation of it.
  • Sentiment analysis. Extract subjective information, usually from a set of documents, often using online reviews to determine “polarity” about specific objects. It is especially useful for identifying trends of public opinion in social media, for example in marketing (a small code sketch follows this list).
  • Topic segmentation and recognition. Given a chunk of text, separate it into segments each of which is devoted to a topic, and identify the topic of the segment.
  • Automatic summarization. Produce a readable summary of a chunk of text. Often used to provide summaries of text of a known type, such as research papers or articles in the financial section of a newspaper.
  • Coreference resolution. Given a sentence or larger chunk of text, determine which words (“mentions”) refer to the same objects (“entities”).
  • Discourse analysis. This rubric includes several related tasks. One task is identifying the discourse structure of a connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation, contrast). Another possible task is recognizing and classifying the speech acts in a chunk of text (e.g. yes-no question, content question, statement, assertion, etc.).
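As a concrete illustration of one of these tasks, here is a minimal sketch of sentiment analysis in Python. The choice of library (NLTK’s lexicon-based VADER analyzer) and the example sentences are assumptions made purely for illustration; any sentiment toolkit could be substituted.

```python
# Minimal sentiment analysis sketch using NLTK's VADER analyzer (illustrative only).
# Assumes nltk is installed (pip install nltk); the lexicon is downloaded on first run.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

# A few short review-style sentences (made up for this example).
reviews = [
    "The battery life on this phone is fantastic.",
    "The screen cracked after two days, very disappointing.",
    "It arrived on time.",
]

analyzer = SentimentIntensityAnalyzer()

for review in reviews:
    # polarity_scores returns negative, neutral, positive and compound scores.
    scores = analyzer.polarity_scores(review)
    if scores["compound"] > 0:
        label = "positive"
    elif scores["compound"] < 0:
        label = "negative"
    else:
        label = "neutral"
    print(f"{label:>8}  {scores['compound']:+.2f}  {review}")
```

A lexicon-based analyzer like this is only a simple baseline; the representation learning and deep neural network methods mentioned above generally achieve stronger results by learning from large text corpora.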

This is #500daysofAI and you are reading article 402. I am writing one new article about or related to artificial intelligence every day for 500 days.

--

Alex Moltzau

Policy Officer at the European AI Office in the European Commission. This is a personal blog and does not represent the views of the European Commission.