Dark Data: An Increasingly Important Subset of Big Data



Dark Data is any data that is stored but disregarded and never indexed. Because it is not indexed, it is easily lost and effectively invisible to researchers. Organizations typically gather it unintentionally, so it tends to be unstructured; it is not accessible to the public and is not used for decision making.

The primary reason dark data is generated is that data accumulates in bulk while only a small part of it is selected for analysis. Data is generated very rapidly: every time a user clicks a link, data is produced that corporations analyze to improve their businesses. However, they need only a limited amount of structured data, which is kept as records in databases, whereas the remaining unstructured data is lost amid other unindexed data.

Of the 7.5 sextillion gigabytes of data generated worldwide each day, 6.75 septillion megabytes (about 90%, taking 1 GB as 1,000 MB) is left unprocessed and becomes dark data, which remains stockpiled in data repositories. The lack of suitable analysis tools is another reason dark data is generated.
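
The two daily figures are quoted in different units (gigabytes and megabytes), so the share of dark data is easier to see once both are converted to the same unit. The short Python sketch below does that conversion. It assumes short-scale number names (1 sextillion = 10^21, 1 septillion = 10^24) and decimal units (1 GB = 1,000 MB); the variable names are illustrative only, and the figures themselves are taken as stated in the article.

```python
# Unit check for the daily figures quoted in the article.
# Assumptions (not stated in the article): short-scale number names
# and decimal units, i.e. 1 GB = 1000 MB.
SEXTILLION = 10**21
SEPTILLION = 10**24
MB_PER_GB = 1000

total_mb = 7.5 * SEXTILLION * MB_PER_GB  # 7.5 sextillion GB, expressed in MB
dark_mb = 6.75 * SEPTILLION              # 6.75 septillion MB, as quoted

dark_share = dark_mb / total_mb          # fraction of daily data left unprocessed
print(f"Dark share of daily data: {dark_share:.0%}")
```

Under these assumptions the two figures work out to the same order of magnitude, with the unprocessed portion at about nine tenths of the total.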

According to Bob Picciano, Senior VP of Analytics at IBM, “Data that is difficult to work with creates a high barrier to entry. People typically forego trying to get any information out of it. About 90% of data generated by most sensors and other sources on the market never get utilized, and 60% of that data loses its true value within milliseconds.”

An organization can use dark data to gain insights that may be even more valuable than the ones it obtains at present. Dark data is, in a sense, a subset of big data, and it can serve multiple purposes, such as analyzing the network security of an environment.