Facebook AI & Retrieval-Augmented Generation (RAG)

A new open-source language model through Hugging Face Transformers in 2020

Alex Moltzau
5 min read · Sep 30, 2020


There is so much we do not understand. At times it all seems to blend together. Yet, there is a wish to aggregate our language — all of our human communication — to make sense.

How much language data does Facebook process?

According to Statista, Facebook had over 2.7 billion monthly active users as of the second quarter of 2020 and is the biggest social network worldwide.

Hard to say. Truly, it is hard to say just how much data is flowing through Facebook.

The Hive

An article on Kinsta has gathered a variety of stats about Facebook and it has a section on Data and Usage [bold added]:

“Facebook generates 4 petabytes of data per day — that’s a million gigabytes. All that data is stored in what is known as the Hive…

…which contains about 300 petabytes of data. This enormous amount of content generation is without a doubt connected to the fact that Facebook users spend more time on the site than users spend on any other social network, putting in about an hour a day.”
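The quoted figures are easier to compare after a quick unit check. This is only a back-of-the-envelope sketch that assumes decimal units (1 petabyte = 1,000,000 gigabytes), so the "million gigabytes" in the quote corresponds to one petabyte, and 4 petabytes a day comes to about 4 million gigabytes. The variable names are illustrative.

```python
# Unit check on the quoted figures, assuming decimal (SI) units.
GB_PER_PB = 1_000_000  # assumption: 1 petabyte = 1,000,000 gigabytes

daily_pb = 4     # quoted: data generated per day, in petabytes
stored_pb = 300  # quoted: data held in the Hive, in petabytes

daily_gb = daily_pb * GB_PER_PB             # gigabytes generated per day
days_of_generation = stored_pb / daily_pb   # how many days of output 300 PB represents

print(f"{daily_pb} PB/day = {daily_gb:,} GB/day")
print(f"{stored_pb} PB equals {days_of_generation:.0f} days of output at that rate")
```

Taken at face value, the stored 300 petabytes would equal 75 days of output at the quoted daily rate.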

I found a post by Facebook Engineering from 2009, then a Wikipedia article about Apache Hive, which was originally developed at Facebook.

It has quite a weird logo, an elephant-wasp.

A bit later I found the project on GitHub.

“The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.”

Regardless, this is a digression…

What I wanted to write about is Facebook developing new software!

RAG framework for AI

Facebook has designed a novel framework for AI that can create more intelligent natural language processing (NLP) models.

Facebook announced its new Retrieval Augmented Generation (RAG) architecture.

It is being released as part of the open-source Hugging Face Transformers library.

Natural language in NLP largely means human language: the way you and I communicate through words or utterances. It is usually thought of, first and foremost, in terms of words in different human languages.

How do we understand words and what meanings do words hold?

This may seem like an easy question when you talk, but creating algorithms that make sense of small or large numbers of words and sentences to generate insight is quite a challenging task!

There is so much context that goes into different utterances.

What has changed now for Facebook?

With RAG, a model retrieves relevant documents and uses them while generating text. This is done to generate more accurate answers to questions without the model having to be ‘constantly retrained’.

RAG is based on the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela.

“Retrieval-augmented generation (“RAG”) models combine the powers of pretrained dense retrieval (DPR) and sequence-to-sequence models. RAG models retrieve documents, pass them to a seq2seq model, then marginalize to generate outputs. The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing both retrieval and generation to adapt to downstream tasks.”

To understand this statement, it may be useful to retrieve a few descriptions of what these terms entail.

Dense Passage Retrieval (DPR): a set of tools and models for state-of-the-art open-domain Q&A research. It is based on the following paper:

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih, Dense Passage Retrieval for Open-Domain Question Answering, Preprint 2020.

Sequence-to-sequence: a typical sequence-to-sequence model has two parts, an encoder and a decoder. The two parts are practically two different neural network models combined into one giant network. Broadly, the task of the encoder network is to understand the input sequence and create a smaller-dimensional representation of it. The decoder network then generates the output sequence from that representation.

Furthermore, this sentence may need some exploring:

“The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing both retrieval and generation to adapt to downstream tasks.”

It may be helpful to look at a figure from their paper:

“Figure 1: An overview of retrieval-augmented generation (RAG). We combine a pre-trained retriever (Query Encoder + Document Index) with a pre-trained encoder-decoder (Generator) and fine-tune end-to-end. For some query x, we use Maximum Inner Product Search (MIPS) to find the top-K most relevant documents of all documents zi . To make the final prediction y, we treat z as a latent variable and marginalize over the encoder-decoder predictions given different documents.”

The figure also marks end-to-end backpropagation through the query encoder q and the generator.
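The caption packs two steps into one sentence: find the top-K documents by inner product, then weight the generator's predictions by how relevant each document is. Here is a minimal sketch of that idea in plain Python. It is not the authors' implementation: the vectors, document names, answers, and the generator probabilities are all made-up toy values, and the search is brute force rather than an approximate index. The retrieval weights are taken as a softmax over the inner-product scores of the retrieved documents, which is my reading of how the paper approximates the sum over z.

```python
from math import exp

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_mips(query_vec, doc_vecs, k):
    """Brute-force maximum inner product search: the k best (doc_id, score) pairs."""
    scored = [(doc_id, inner_product(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def softmax(scores):
    highest = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def marginalize(retrieved, generator_probs):
    """Approximate p(y|x) as the sum over retrieved z of p(z|x) * p(y|x, z)."""
    p_z = softmax([score for _, score in retrieved])
    answer_probs = {}
    for (doc_id, _), weight in zip(retrieved, p_z):
        for answer, p_y in generator_probs[doc_id].items():
            answer_probs[answer] = answer_probs.get(answer, 0.0) + weight * p_y
    return answer_probs

# Toy document index and query vector (hypothetical values).
doc_vecs = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.8, 0.3],
    "doc_c": [-0.5, 0.7],
}
query_vec = [1.0, 0.2]

# Hypothetical generator outputs p(y | x, z) for each document z.
generator_probs = {
    "doc_a": {"Paris": 0.9, "Lyon": 0.1},
    "doc_b": {"Paris": 0.6, "Lyon": 0.4},
    "doc_c": {"Paris": 0.2, "Lyon": 0.8},
}

retrieved = top_k_mips(query_vec, doc_vecs, k=2)
answer_probs = marginalize(retrieved, generator_probs)
best_answer = max(answer_probs, key=answer_probs.get)
print(retrieved)
print(answer_probs, best_answer)
```

In this toy setup doc_c never reaches the generator, because its inner product with the query is the lowest, so the final answer is a blend of what the generator says given doc_a and doc_b.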

As you can see, the figure shows three kinds of query on the left, one per task:

  1. Question answering.
  2. Fact verification.
  3. Question generation.

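They present hybrid generation models with access to both parametric and non-parametric memory. The parametric memory is the pretrained seq2seq model itself; the non-parametric memory is a retrieval-based external memory, in the form of a dense vector index of Wikipedia.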

Their RAG models obtain state-of-the-art performance on open domain question answering.

They found that people prefer RAG’s generation over purely parametric BART and find RAG more factual. The authors also conducted a detailed investigation of the learned retrieval component, validating its effectiveness.

Additionally there is talk of updating the model or ‘hot-swapping indices’:

“Our result shows that we can effectively update RAG’s behavior with new world knowledge by simply replacing its non-parametric memory.”
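To make the idea of replacing the non-parametric memory concrete, here is a small sketch. Everything in it is hypothetical: ToyRetriever, toy_generator, and the two index snapshots are stand-ins I made up, and the "generator" is a trivial function rather than a neural network. The point is only that the answer changes when the index is swapped, while the generator is left untouched.

```python
class ToyRetriever:
    """Stand-in for the non-parametric memory: a swappable keyword-to-passage lookup."""

    def __init__(self, index):
        self.index = index

    def retrieve(self, question):
        lowered = question.lower()
        for keyword, passage in self.index.items():
            if keyword in lowered:
                return passage
        return ""

def toy_generator(question, passage):
    """Stand-in for the generator: it answers from the retrieved passage only."""
    return passage if passage else "unknown"

def answer(question, retriever):
    return toy_generator(question, retriever.retrieve(question))

# Two made-up snapshots of the external knowledge source.
index_old = {"champion": "Team A"}
index_new = {"champion": "Team B"}

retriever = ToyRetriever(index_old)
question = "Who is the current champion?"
before = answer(question, retriever)

retriever.index = index_new  # swap the memory; toy_generator is not modified
after = answer(question, retriever)

print(before, "->", after)
```

No retraining step appears anywhere in the sketch, which is the property the quote describes.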

This may have been done to counter a previous issue of adversarial AI.

MIT researchers built a system that fools natural-language processing systems by swapping words with synonyms.
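A toy version of a synonym-swap attack can show why this is a problem. This is not the MIT system, only a sketch under my own assumptions: the classifier is a naive keyword counter, and the word lists and synonym table are made up for illustration.

```python
# Hypothetical keyword lists for a deliberately naive sentiment classifier.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "awful", "terrible"}

# Hypothetical synonym table; real attacks choose synonyms far more carefully.
SYNONYMS = {"great": "superb", "good": "fine", "awful": "dreadful"}

def naive_sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def synonym_attack(text, classifier):
    """Swap words for synonyms one at a time until the predicted label changes."""
    original_label = classifier(text)
    words = text.split()
    for i, word in enumerate(words):
        replacement = SYNONYMS.get(word.lower())
        if replacement is None:
            continue
        words[i] = replacement
        candidate = " ".join(words)
        if classifier(candidate) != original_label:
            return candidate
    return None  # no label-changing substitution found

sentence = "the film was great"
adversarial = synonym_attack(sentence, naive_sentiment)
print(naive_sentiment(sentence), "|", adversarial, "|", naive_sentiment(adversarial))
```

A human reads both sentences the same way, yet the classifier's label changes because "superb" is not in its keyword list.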

Perhaps this NLP model can handle that to some degree?

I am not entirely sure if I have the answer.

“Our work opens new research directions on how parametric and non-parametric memories interact and how to most effectively combine the different components, showing promise in being applied to a wide variety of NLP tasks.”

With information changing, there may be a need to extract current information and, to some extent, infer correctly from it.

However, does this eliminate the need to retrain models?

I am unsure if that is the case, although some researchers would like to think so, since it could save time.

What do you think?

This is #500daysofAI and you are reading article 484. I am writing one new article about or related to artificial intelligence every day for 500 days.



Alex Moltzau

AI Policy, Governance, Ethics and International Partnerships at www.nora.ai. All views are my own. twitter.com/AlexMoltzau