A Quick Look at RAPIDS

From GPU dataframes to GPU accelerated ML algorithms

2 min read · Sep 14, 2020

The tagline for RAPIDS is ‘Open GPU Data Science’. I came across RAPIDS through a recent article the team published on using it together with HuggingFace and Dask.

State of the art NLP at scale with RAPIDS, HuggingFace and Dask

See how to build end-to-end NLP pipelines in a fast and scalable way on GPUs — from feature engineering to inference.

medium.com

A quick note on HuggingFace and Dask:

“The Hugging Face transformers package is an immensely popular Python library providing pretrained models that are extraordinarily useful for a variety of natural language processing (NLP) tasks.”
“Dask provides advanced parallelism for analytics”

According to RAPIDS, at least, combining the three can be advantageous. But what is RAPIDS?

RAPIDS provides GPU-accelerated libraries for data science.

They have several guides online:

Screenshot of rapids.ai, taken on the 14th of September 2020

There is documentation as well.

Home - RAPIDS Docs

This site serves as a collection of all the documentation for RAPIDS. Whether you're new to RAPIDS, looking to…

docs.rapids.ai

The code is also available in repositories on GitHub. As the project describes itself:

“RAPIDS is open source licensed under Apache 2.0, spanning multiple projects that range from GPU dataframes to GPU accelerated ML algorithms. It also provides native array_interface support, allowing Apache Arrow data to be pushed to deep learning frameworks.”
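To make "GPU dataframes" a little more concrete, here is a minimal sketch of my own, not taken from the RAPIDS article. It assumes cuDF, the RAPIDS dataframe library, which is designed to mirror the pandas API. If cuDF is not installed, the sketch falls back to pandas so it still runs on a CPU. The function name `mean_score_by_label` and the toy data are hypothetical.

```python
# Assumption: cuDF (the RAPIDS GPU dataframe library) may or may not be installed.
# If the import fails, fall back to pandas, whose API cuDF is designed to mirror.
try:
    import cudf as xdf
    BACKEND = "cudf"
except ImportError:
    import pandas as xdf
    BACKEND = "pandas"


def mean_score_by_label(labels, scores):
    """Return {label: mean score} using a dataframe groupby (hypothetical helper)."""
    df = xdf.DataFrame({"label": labels, "score": scores})
    means = df.groupby("label")["score"].mean()
    if BACKEND == "cudf":
        # Copy the small result from GPU memory back to the host.
        means = means.to_pandas()
    return {str(label): float(value) for label, value in means.items()}


result = mean_score_by_label(["pos", "neg", "pos", "neg"], [0.5, 0.25, 1.0, 0.75])
print(BACKEND, result)
```

The point of the sketch is that the dataframe code itself stays the same on either backend. Only the import and the final copy back to the host differ.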

It seems there is some form of collaboration with NVIDIA as well.

RAPIDS

The RAPIDS suite of software libraries, built on CUDA-X AI, gives you the freedom to execute end-to-end data science…

developer.nvidia.com

cuStreamz: More Event Stream Processing for Less with NVIDIA GPUs and RAPIDS Software

One can view cuStreamz as a bridge that connects Python-Streaming and GPUs — with sophisticated and reliable streaming…