By Ian Neild
Today, unstructured data has become the predominant form of data, with estimates suggesting that 80 per cent of all data created and stored is unstructured. This unstructured data comes from emails, blogs, documents, images, movies and so on. It is being generated so quickly that most organisations can’t manage it or make use of it. Data is building up on individual computers, backup disks, shared folders, email databases and USB flash drives; interestingly, some estimates* suggest 70 per cent of this data is stale after just 90 days. The problem is working out which part is stale, which is worthless and which has to be kept for business advantage or legal reasons.
Deletion or long-term archive?
A look at my own email shows I am dangerously near a corporate email limit. Most of this data will be redundant, out of date or meaningless, but I know from experience that it still holds useful information and contacts that I will go searching for just after I delete them. But can an individual or company just delete this data? A recent UK news case on mobile phone hacking involves a lot of evidence drawn from emails sent many years ago that are now very meaningful. As they relate to old news stories, should they have been legally deleted?
There are a number of reasons why this is going to be increasingly important in the enterprise. We have let databases get large because storage, processing and networks were getting cheaper, and access to compute on demand meant we could. We also know that those data sets contain valuable information that is only just starting to be understood. Intelligence engines like Wolfram Alpha and Watson can link up and mine different data sets to gain knowledge and a human-like understanding that could have major benefits in a number of fields.
But the data sets are growing faster than storage, processing and networks can cope with; it is estimated that 1,200 exabytes of data will be generated in 2012, and enterprise data is expected to grow by 650 per cent over the next five years. Mining this data is often likened to finding a “needle in a haystack”, but as the data can be so specific, it is probably more apt to say that it is like finding a needle in a pile of pins that is constantly being added to, and your needle may only be relevant for a short time. We don’t need to know that it will rain yesterday.
This is a problem facing enterprises and governments around the world, and a range of storage and analysis solutions are being offered that will process unstructured data sets, structure them and delete what is considered meaningless. What has yet to be tested is whether the decisions of such systems will hold up in court as legal justification for deleting corporate data, and indeed whether legislation is needed requiring all enterprises over a certain number of employees, or involved in publication, to have systems in place to ensure the right data is kept and that the legal system has access to it. There may also be anonymity and privacy issues to consider, as combining multiple data sets may reveal the identities behind data that was previously anonymous. The irony is that the legal system is itself a classic case of data and paper overload, with some cases generating over 1 million documents and some laws in the UK going back to medieval times, so the legal profession has an inherent interest in getting this right. Even looking at data requirements today, there are storage issues to be overcome: hard drives and tape have a finite lifetime, optical storage is only emerging and atomic-based storage is a long way from being productised.
One thing is certain: big data holds a lot of opportunities. In the short term, data can be automatically parsed and structured; but as more relevant data is created, databases merge and the length of time data may need to be kept increases, the issues do not go away, and hard disks and tape drives still have a finite lifetime. As a world leader in sustainable, on-demand storage and compute capabilities, with a global network, BT has the opportunity and global credibility to work with customers to help turn their unstructured data problems into opportunities that can help a business innovate and transform. That covers data capture, aggregation, storage and analytics, all managed within a security model that can meet the legal requirements an enterprise may face around the world. I believe that the enterprises that can develop the IT architectures and business processes to leverage this data explosion, in real time, will be able to differentiate themselves within their chosen markets. Working with a partner like BT that has the core capabilities, future insights and thought leadership in this space could make all the difference.