Category: Big Data

How Is Your Data Quality?

As a long-time data professional, I am pleased to see that companies have put a much greater focus on the quality of their corporate data assets in recent years. As everyone rushes to herald data as their #1 corporate asset, it is important to realize that the collection, transformation, and publication of flawed data can have far-reaching negative impacts.

IBM estimates that the yearly cost of poor-quality data in the United States in 2016 alone was $3.1 trillion. Do I have your attention now? Decisions based on inaccurate or mischaracterized data can negatively impact your corporate operations, profitability, and other key processes.

Laws and regulations protecting PHI and PII are now commonplace, and companies go to great lengths to mask and encrypt this type of data. But how do you know if one of your source systems is embedding a credit card number or a Social Security number in a text field, or is using one as an unencrypted primary key? The answer is, you probably don’t. So, what can you do about it?
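One small, technical piece of the answer is to profile free-text fields for values that look like sensitive identifiers. The Python sketch below is only an illustration under stated assumptions: the function names (find_sensitive, luhn_ok) and both patterns are hypothetical, the Social Security number pattern assumes the hyphenated 3-2-4 layout, and candidate card numbers are assumed to be 13 to 19 digits that pass the Luhn checksum. A real scan would need tuning against your own data and will still produce false positives and misses.

```python
import re

# Assumption: SSNs appear in the hyphenated 3-2-4 form (e.g. 123-45-6789).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumption: card numbers are 13-19 digits, optionally split by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for suspected SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):  # drop digit runs that cannot be card numbers
            hits.append(("card", m.group()))
    return hits
```

Running a check like this over a sample of each text column in a source system gives you a list of fields worth a closer look; treat it as a starting point for the conversation, not as proof that a field is clean.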

Start the Conversation

A great starting point is socializing the concept of a data quality program with your co-workers in both the IT and business organizations. Begin asking questions about your data by reaching out to those who deal with production issues on a daily basis. Talk with your chief information security officer and ask what concerns they have about the cleanliness of your corporate data. Talk to peers in your industry and ask what successes and failures they are experiencing with their own corporate data. Do some research and get people talking.

Commit to Quality

It is imperative that a data quality initiative have the full support of key stakeholders who are committed to the long-term results. While an initiative of this type may start out as a project, it is important to ...

Small number of companies actually have big data

Last week at the TDWI Big Data conference in San Diego, I learned that relatively few companies actually have big data. Yes, there are eBay (which gave the keynote address), Google, Facebook, and similar companies with a large web presence that are actively using big data solutions such as Hadoop. However, many speakers and even some vendors agreed that only a few companies actually have the data volume, variety, or velocity (the 3 V’s) that requires a big data solution. One speaker indicated that, in his opinion, 85% of data warehouses are under 5 TB, which is considered “small” by TDWI standards.