It’s in the Grain: The Details in Your Data

Author: David Tufte Posted In: Data

I was recently at a whisk(e)y tasting event. There were whiskies from the United States, Scotland, Ireland, Canada, Japan, India, France, and probably a few others. There are different types of distilling processes, but all whiskies have water, yeast and grain as ingredients. Different grains change the type of whisky that is produced. Not necessarily better or worse, just different.

The Grain in your Dates

Grain affects analysis and reporting also. Grain is the amount of detail of a particular dimension on which we want to report. If we look at the date dimension, the grain can be daily, weekly, monthly, etc. If in the past all reporting was at the weekly level, then you only need to capture the data at the weekly level. But if you’ve been capturing the data at the daily level, you have options. You can continue to provide reports at the weekly level, but you can also provide reports at the daily level. The daily level can also provide reports at the monthly and quarterly level as well.

The lower grain gives many more options. What if you need to know the impact of a holiday? You only have a general idea at the weekly level, but at the daily level you can see results on the holiday itself. You can also see if the day of the week makes an impact.

The Grain in your Products

It’s pretty obvious when you think about the date dimension, but it’s sometimes not so obvious when you talk about the grain of, say the product dimension. In a grocery store, the products have Stock Keeping Units (SKUs). An SKU tells the stock person where to put that product on the shelf.  The SCAN ID (bar code) give additional information, like 25% more product, or a 2-pack.  If the reporting grain is at the SKU level, it is extremely difficult to know if an increase in sales was due to the 25% increase in product size, or the 2 for 1, or something else. 

I’ve heard the statement, “We’ll never need to report at the Scan level, we always report at the SKU level.” One month after a newly designed data warehouse is in use, the business users are asking about SCAN level analysis. It is too late to change the ETL process, so the best they could try to do was to estimate which different SCAN codes made up the total of the SKU’s purchased. This requires some very complex calculations, and in the end is just the best estimate that could be calculated.

Bottom line, even if your organization has always reported at a certain grain, if you are making a change in your Analysis and Reporting system, get the lowest level of grain that is practically available. It’s straightforward to group by a higher level of grain, but to estimate at a lower grain than what is available is very challenging.