Photo by @mana5280

Tiny and Powerful NLP for Text With pQRNN

Natural language processing with projection-based modelling and Quasi-Recurrent Neural Networks (QRNN) from Google AI

The Google AI blog is one to follow. I will do my best to cover an article published on the 21st of September 2020 by Prabhu Kaliamoorthi, a Software Engineer at Google Research.

What is a projection-based model architecture? The concept comes from a paper from 2018, and Kaliamoorthi uses it to build a state-of-the-art, lightweight text classification model.

One such model, pQRNN, shows that this new architecture can nearly achieve BERT-level performance, despite using 300x fewer parameters and being trained on only the supervised data.

Image from Google AI blog retrieved the 23rd of September 2020

They have open-sourced the PRADO model they built last year, and encourage the community to use it as a jumping-off point for new model architectures.

Their PRADO model is a neural architecture built in 2019; with fewer than 200K parameters, it reached state-of-the-art performance.

There is a need for NLP models that can run on-device rather than in data centers.

Their new model pQRNN advances NLP performance with a minimal model size.

“The novelty of pQRNN is in how it combines a simple projection operation with a quasi-RNN encoder for fast, parallel processing.”

Kaliamoorthi links a paper on the topic for further detail.

How does it work in Kaliamoorthi’s model?

According to the author:

  1. “Normally, the text input to NLP models is first processed into a form that is suitable for the neural network, by segmenting text into pieces (tokens) that correspond to values in a predefined universal dictionary (a list of all possible tokens).
  2. The neural network then uniquely identifies each segment using a trainable parameter vector, which comprises the embedding table. However, the manner in which text is segmented has a significant impact on the model performance, size, and latency.”
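The two steps quoted above can be sketched in a few lines. This is a minimal toy example of the standard tokenize-then-embed pipeline, not Google's implementation; the tiny dictionary and the 8-dimensional embeddings are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy dictionary: every possible token maps to an integer id.
vocab = {"<unk>": 0, "the": 1, "movie": 2, "was": 3, "great": 4}

def tokenize(text):
    """Segment text into tokens and map each to its dictionary id."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

# Trainable embedding table: one parameter vector per dictionary entry.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))  # 5 tokens x 8 dims

ids = tokenize("The movie was great")
embedded = embedding_table[ids]  # shape (4, 8): the network's actual input
```

Note that the embedding table scales with the dictionary, which is exactly why segmentation choices drive model size.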

The figure below is made by Kaliamoorthi and shows the spectrum of approaches used by the NLP community and their pros and cons.

Image from Google AI blog retrieved the 23rd of September 2020

In short, what he says is that not all NLP models need to know everything.

Most tasks can be solved by knowing a small subset of segments.
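To make that point concrete, here is a small sketch of shrinking a vocabulary to only its most frequent segments and mapping everything else to a shared unknown id. The corpus and the cutoff `k` are made-up illustrations, not from the article.

```python
from collections import Counter

# Hypothetical toy corpus for a sentiment-style task.
corpus = ["great movie", "great plot", "terrible movie", "terrible acting"]
counts = Counter(tok for line in corpus for tok in line.split())

# Keep only the top-k most frequent segments; the rest share one id.
k = 3
kept = [tok for tok, _ in counts.most_common(k)]
vocab = {"<unk>": 0, **{tok: i + 1 for i, tok in enumerate(kept)}}

def encode(text):
    """Map text to ids, collapsing rare segments into <unk>."""
    return [vocab.get(tok, 0) for tok in text.split()]
```

A model built on this reduced vocabulary needs far fewer embedding parameters, at the cost of lumping rare segments together.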

There is a difference in complexity to consider:

Image from Google AI blog retrieved the 23rd of September 2020

pQRNN builds directly on this PRADO architecture.

The pQRNN model has three building blocks according to Kaliamoorthi:

  1. A projection operator that converts tokens in text to a sequence of ternary vectors.
  2. A dense bottleneck layer.
  3. A stack of Quasi-Recurrent Neural Networks (QRNN) encoders.
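The three building blocks can be sketched end to end. This is a toy NumPy illustration under stated assumptions, not the open-sourced model: the hashing scheme, all layer sizes, and the single-gate pooling are simplifications I chose for brevity.

```python
import hashlib
import numpy as np

PROJ_DIM, BOTTLENECK, HIDDEN = 16, 8, 8  # illustrative sizes only
rng = np.random.default_rng(0)

def project(token):
    """1) Projection: hash a token into a ternary vector in {-1, 0, 1}."""
    digest = hashlib.md5(token.encode()).digest()  # 16 deterministic bytes
    return np.array([digest[i] % 3 - 1 for i in range(PROJ_DIM)])

W_b = rng.normal(size=(PROJ_DIM, BOTTLENECK))  # 2) dense bottleneck layer
W_z = rng.normal(size=(BOTTLENECK, HIDDEN))    # 3) QRNN candidate weights
W_f = rng.normal(size=(BOTTLENECK, HIDDEN))    #    QRNN forget-gate weights

def qrnn_encode(tokens):
    X = np.stack([project(t) for t in tokens])  # (T, PROJ_DIM), ternary
    B = np.tanh(X @ W_b)                        # bottleneck, all steps at once
    Z = np.tanh(B @ W_z)                        # candidate values, in parallel
    F = 1 / (1 + np.exp(-(B @ W_f)))            # forget gates, in parallel
    h = np.zeros(HIDDEN)
    for z, f in zip(Z, F):                      # pooling: the only sequential
        h = f * h + (1 - f) * z                 # part of the computation
    return h

state = qrnn_encode("the movie was great".split())
```

Because the projection is a fixed hash, there is no embedding table to store, and because the gates are computed for all timesteps at once, only the cheap pooling loop is sequential; that combination is what the quoted line about "fast, parallel processing" refers to.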

It is illustrated as follows:

Image from Google AI blog retrieved the 23rd of September 2020

MSc student in Social Data Science at the University of Copenhagen, working on AI Policy and Ethics. All views are my own.