
A recent AI and Information Management report has shown that 64% of organizations manage at least 1 petabyte (PB) of data, and 41% manage at least 500 PB. In itself, this is perfectly fine – data is the new oil, after all. The problem, however, is that anywhere between 40% and 90% of this data ocean remains dark, which is to say neither analyzed nor used.
To start, it is necessary to explain that dark data has nothing to do with the “dark web,” a place where stolen data marketplaces and all sorts of criminals thrive. Dark data is simply messy and unused company data, which can range from IoT logs to old customer reviews.
Dark data poses various levels of risk to businesses (storage costs and compliance issues, to name a few) but also brings a slew of opportunities. With so much unused data, companies miss out on valuable insights, superior decision-making, easier regulatory compliance, lower costs, and more streamlined operations.
Innovations in cloud technology, artificial intelligence (AI), and machine learning (ML) can help mine underutilized data, extract value from it, and provide valuable business insight. For instance, a retail company utilizing cloud-based AI solutions could analyze historical customer interactions and uncover behavioral patterns, leading to personalized marketing strategies, strategic business decisions, and increased sales. In this blog post, I will briefly explain how shedding light on dark data and mining it could become one of the most valuable ways for companies to enhance their business processes.
Mining the Dark Data: AI Technologies and the Cloud
With the popularity of cloud technology on the rise, businesses are moving their data, including “dark” data, to the cloud. This trend is expected to continue in the foreseeable future. For instance, Gartner predicts that 90% of organizations will adopt hybrid cloud through 2027 and that all cloud segments will record double-digit growth in 2025.
In addition to offering benefits like easy scaling and financial savings, the cloud also makes it much easier to clean, sort, and analyze large volumes of data, including dark data. This, in turn, enables companies to mine insights that otherwise wouldn’t be possible. Advanced analytics, powered by AI and ML, is another major catalyst of change, enabling businesses to deal with large unstructured datasets.
Previously, the biggest problem when dealing with dark data was its messy nature. Even though AI has been able to analyze structured data for years, unstructured or semi-structured data proved to be a hard nut to crack. Unfortunately, unstructured data constitutes the majority of dark data (up to 90%, according to estimates cited by MIT). However, recent advances in natural language processing (NLP), natural language understanding (NLU), speech recognition, and ML have enabled AI to deal with unstructured dark data more effectively.
Today, AI can easily analyze raw inputs like customer reviews and social media comments to identify trends and sentiment. Advanced sentiment analysis algorithms can draw accurate conclusions about tone, context, emotional nuances, sarcasm, and urgency, providing businesses with deeper audience insights. For instance, Amazon uses this approach to flag fake reviews.
In finance and banking, AI-powered data analysis tools are used to process transaction logs and unstructured customer communications to identify fraud risks and enhance service and customer satisfaction.
Another industry where dark data mining might have potentially huge social benefits is healthcare. Currently, this industry generates around 30% of all the data in the world. Rather than allowing it to go to waste, this data could be analyzed to uncover patterns in medical records, improve diagnostics, and optimize treatment plans.
However, it is crucial to remember that health data is extremely sensitive and protected by various regulatory requirements. Therefore, any organization that stores such data without actively using it and would like to mine it for meaningful insights should strictly comply with all requirements. It should also expect that it will most probably have to host models internally, which will drive costs up.
Following the Right Process
If you have a “hunch” that your company stores unused but potentially beneficial data, it is important to follow a few basic steps to set the process right. First, before doing anything else, conduct a systematic data inventory across the whole organization and create a comprehensive map of data locations, formats, ownership, and approximate volume.
While doing so, pay close attention to operational systems that generate transaction logs, customer interaction records, and IoT device outputs. These typically contain valuable but underutilized information.
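To make the inventory step more concrete, here is a minimal sketch in Python. It assumes the data sits on a file system you can walk, and it covers only two of the dimensions mentioned above: format (approximated by file extension) and volume. The function name `inventory` is hypothetical, and ownership or database-resident data would need other sources, such as access control lists or catalog metadata.

```python
import os
from collections import defaultdict
from pathlib import Path


def inventory(root):
    """Summarize files under `root` by extension: file count and total bytes."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Treat the lowercased extension as a rough proxy for the format.
            ext = path.suffix.lower() or "(none)"
            summary[ext]["files"] += 1
            summary[ext]["bytes"] += path.stat().st_size
    return dict(summary)
```

Running `inventory("/some/share")` (the path is a placeholder) returns a dictionary keyed by extension, which is a reasonable starting point for the data map.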
Once the map is ready, assess what you’ve discovered in terms of potential business value, data quality, extraction difficulty, and regulatory considerations. Ideally, you should also implement a proper data governance framework before proceeding with data mining – you don’t want to run afoul of the GDPR and other privacy laws, particularly for customer-related data that might be gathered and stored without sufficient reason to do so.
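One simple way to compare what you have discovered is a scoring sheet. The sketch below rates each source from 1 to 5 on the four criteria above and combines them into a single priority. The class, the weights, and the sample sources are all hypothetical and only illustrate the idea; your own weighting should reflect your business priorities.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    business_value: int         # 1 (low) to 5 (high)
    data_quality: int           # 1 (poor) to 5 (clean)
    extraction_difficulty: int  # 1 (easy) to 5 (hard)
    regulatory_risk: int        # 1 (low) to 5 (high)


def priority(source):
    """Higher is better: reward value and quality, penalize difficulty and risk."""
    return (2 * source.business_value + source.data_quality
            - source.extraction_difficulty - source.regulatory_risk)


# Illustrative, made-up ratings.
sources = [
    DataSource("iot_logs", 4, 3, 2, 1),
    DataSource("support_emails", 5, 2, 4, 4),
    DataSource("old_reviews", 3, 4, 1, 2),
]
ranked = sorted(sources, key=priority, reverse=True)
```

With these made-up ratings, the high-value but hard-to-extract and heavily regulated source ends up last, which is the kind of trade-off the assessment is meant to surface.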
Data processing tools should be matched to the types of data you will be mining. For instance, optical character recognition (OCR) is best for scanned documents, NLP for text-heavy sources, and clustering algorithms for data categorization. Bear in mind that large chunks of dark data might still be unusable, so the decision on which data to extract must be based on the business problem you are trying to solve.
The later phase, analysis, should also focus on specific business questions rather than open-ended exploration. Basic statistical approaches will do just fine at the start. As your understanding of the data deepens, additional techniques like clustering, predictive modeling, and machine learning can be introduced, depending on need.
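As an example of a basic statistical approach, the sketch below flags unusual values in a series, such as daily transaction counts pulled from a log, using a z-score. The function name, the threshold of 2 standard deviations, and the sample numbers are assumptions chosen for illustration.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    sample standard deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # All values are identical, so nothing stands out.
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]


# Made-up daily transaction counts with one obvious spike.
daily_counts = [100, 102, 98, 101, 99, 100, 180, 97]
spikes = flag_outliers(daily_counts)
```

Here `spikes` contains only the day with 180 transactions. A flag like this is a prompt for a business question ("what happened that day?"), not an answer in itself.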
But What if Dark Data Is Useless?
For some companies, mining business insights from dark data might bring unexpected benefits in terms of operational efficiency or competitiveness. For others, unfortunately, it will be nothing more than looking for a needle in a haystack. Though dark data mining can have business value, the basic recommendation is still to avoid hoarding it in the first place.
To minimize the amount of dark data that ends up in your repositories, consider the following:
- Make sure to gather only the data you actually need and be clear about the aims you’ll be trying to achieve by collecting and utilizing it. In the case of external data, look for reliable web scraping solutions backed by AI-enhanced features. They make it relatively easy to collect data in a targeted way.
- Implement a data governance policy and set expectations for your employees regarding data hygiene practices. To separate the wheat from the chaff, establish an audit schedule. Here’s a motto for you: data is redundant until proven otherwise. The aim is to archive useful data and securely delete the rest to prevent further accumulation.
- Continue conducting regular audits to identify and eliminate redundant, obsolete, or trivial (ROT) data across your systems.
- Consider deploying AI-powered data management tools that can automatically classify, tag, and prioritize information based on its business value.
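A first pass at a ROT audit does not require AI. The sketch below assumes file-based storage and looks for two signals only: exact duplicates (same SHA-256 hash) as candidates for "redundant" and files not modified within a cutoff period as candidates for "obsolete". The function name `find_rot` and the one-year default are hypothetical, and deciding what is "trivial" still needs human or business input.

```python
import hashlib
import os
import time
from pathlib import Path


def find_rot(root, max_age_days=365, now=None):
    """Return (duplicates, stale) for files under `root`.

    duplicates: (path, first_seen_path) pairs with identical content.
    stale: paths last modified more than `max_age_days` ago.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    seen = {}
    duplicates, stale = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            # Reads the whole file into memory; hash in chunks for large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                duplicates.append((path, seen[digest]))
            else:
                seen[digest] = path
            if path.stat().st_mtime < cutoff:
                stale.append(path)
    return duplicates, stale
```

The function only reports candidates. Archiving or secure deletion should stay a separate, reviewed step governed by your data governance policy.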
Conclusion
Storing colossal amounts of dark data is expensive and may come with regulatory compliance issues. If you don’t plan on using the data you collect and store for mining business insights, you should at least know where it’s stored and perform regular audits to get rid of obsolete and redundant data.
It is estimated that by 2040, the storage of digital data will produce 14% of the world’s total emissions. Thus, data management practices that reduce the amount of dark data stored result not only in lower costs but also in better ESG compliance, as pointed out by KPMG.
However, for some businesses, dark data represents untapped potential for gathering insights. By employing a proactive approach and strategic analytics tools, businesses can convert this hidden asset into a competitive advantage and a tool for driving innovation.