Are we drowning in data? @greystorm

The Woes of Dark Data

Roughly 90 percent of data generated by sensors and analog-to-digital conversions never get used

Recently I came across an expression: dark data.

My first thought, for some reason, was Darth Vader.

But dark data is not a fictional villain.

If it were, I am sure it would look mean: Dark Data.

No, it is far more nefarious, as it can be a real problem.

Organisations nowadays tend to store a lot of data.

They are often told about the ‘data-driven’ society or ‘data-driven’ economy.

In many, if not most, cases organisations may not even be aware that the data is being collected.

There is an estimate that roughly 90 percent of data generated by sensors and analog-to-digital conversions never get used.

Although there may be noble purposes for keeping data, it is estimated that many companies analyse only 1% of their data.

It must be said that these numbers are from IBM, so they could be taken with a pinch of salt, as IBM might be served by communicating the importance of buying its products.

Storage can be inexpensive and storing data relatively easy.

Dark data can be any data that is not being used.

This can be:

  • Computer log files.
  • Social media customer sentiment.
  • Paper documents.
  • Video tapes.
  • A variety of other formats.
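For the digital items on this list, one rough way to start looking is to ask which files have simply not been touched for a long time. The sketch below is only an illustration, assuming that "not modified in the last year" is a fair proxy for "not being used"; the function names `find_stale_files` and `total_size` and the one-year threshold are my own hypothetical choices, not an established method.

```python
import time
from pathlib import Path


def find_stale_files(root, max_age_days=365, now=None):
    """Return (path, size_in_bytes) for files under `root` whose last
    modification is older than `max_age_days`.

    Assumption: modification time is used as a rough proxy for "unused".
    Access time would be closer in spirit, but many systems do not update it.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            if info.st_mtime < cutoff:
                stale.append((str(path), info.st_size))
    return stale


def total_size(stale_files):
    """Sum the sizes (in bytes) of the files returned by find_stale_files."""
    return sum(size for _, size in stale_files)
```

Calling `find_stale_files("/some/folder")` on a log directory, for example, would list candidates worth a second look. Whether they are truly dark data still needs a human judgement.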

The ability of an organisation to collect data can exceed the throughput at which it can analyse the data.
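To make that gap concrete, here is a small back-of-the-envelope sketch. It assumes constant daily rates, which real organisations will not have, and both function names are hypothetical. The numbers in the usage note are made up for illustration only.

```python
def unanalysed_backlog(collected_per_day, analysed_per_day, days):
    """Data left unanalysed after `days`, in the same unit as the rates.

    Assumption: both rates are constant and nothing is ever deleted.
    """
    surplus_per_day = max(collected_per_day - analysed_per_day, 0)
    return surplus_per_day * days


def analysed_share(collected_per_day, analysed_per_day):
    """Fraction of the collected data that actually gets analysed."""
    if collected_per_day == 0:
        return 1.0
    return min(analysed_per_day / collected_per_day, 1.0)
```

With an invented organisation that collects 100 GB a day and analyses 1 GB a day, `analysed_share(100, 1)` gives a 1% share, and `unanalysed_backlog(100, 1, 365)` shows the unused pile growing by 99 GB every day over a year.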

Many employees are also not necessarily skilled in the tools needed to perform an analysis.

Another issue with dark data is that all stored data inevitably drains resources.

This can happen through the process of sending or storing the data itself across various media: satellites, servers, and the gigantic cables that run across the world. It can be the power grids connected to all of these. It can simply be the non-volatile storage on which the data is written, which often requires constant cooling.

Is dark data nefarious? Not immediately, but to some extent it can be.

Why do we store that which we may not need?

Because many businesses are unsure of what they need or do not need. Many hope for serendipitous moments.

However, just as many people faced logistics problems in the past, there can easily be a problem of logistics in the modern world.

There is a cost that we cannot see. This cost falls on the environment and on human labour, and for a variety of reasons it can remain hidden in many cases.

“A hidden cost is a cost imposed by a transaction or activity that is not immediately apparent simply by looking at the trade occurring.”

I would say dark data can be such a hidden cost, not only for companies, but for society. It is like a large collection of trash, only scattered across various places: uncollected, not recycled, unreclaimed.

As a last point of clarification, I do not mean that collecting data is unequivocally bad, rather that we can become more conscious about these practices and try to take better actions.

Perhaps it is time we try harder to find dark data and illuminate the potential issues they pose for society alongside the solutions that too often are promised.

This is #500daysofAI and you are reading article 444. I am writing one new article about or related to artificial intelligence every day for 500 days.

AI Policy and Ethics at Student at University of Copenhagen MSc in Social Data Science. All views are my own.