The Warehouse of Data: DW Patterns: The good

So we left off last time recapping the different kinds of SCD patterns, and I mentioned some of the ways that SCDs don't provide adequate history capture. Some of these are, for example, when we have data related to an event which is changed, but doesn't live in an "entity" dimension, but rather a junk dimension, or when the entity related to an event is changed - something that Kimball would probably call a bad design decision; I'm not so sure.

There's another type of change which can occur - when the amount measured changes - these can be measurement errors that we correct, or they can represent a grain that perhaps isn't factored into the fact table in the traditional way - for example if a research paper is released, the public sentiment can change over time (Altmetric, for example, measure something similar), or tracking the influence of people over time a la Forbe's list of influential people (assuming they base that list on data and not thumb-sucking - I don't know either way). There are also a couple of other situations I'll get into through the remainder of this series.

Anyway, due to these issues with dimensions, next we need to take a look at the ways that the fact table can contribute - can we make up for these shortcomings using standard design patterns? Wikipedia lists four design patterns, while Kimball lists three. Fortunately this time they mostly agree on what they are - Wikipedia's fourth pattern is a variant of one of the other three. The place to start however is always with...

Transactional Fact

See definition on the Kimball Group website

These are the most basic, intuitive facts, and with a bit of thought you can normally model (and should probably at least go through the exercise of doing so) the other fact types as a transactional fact - though sometimes you don't have access to that kind of information. Anyway, the basic premise of this fact is that we measure each event that happens - the classic example being sales transactions, where we basically know everything about it when it happens, we can store that, measure it, aggregate it and display it in really nice over-time examples.

In fact, as I mentioned earlier, just about everything can be measured as a transactional fact in some way or another - one example that often arises of when you might not want to use the transaction fact is for measuring stock levels. The amount of stock in a warehouse each day is usually the recommended scenario for Daily Snapshot facts, but if you were to have a positive-valued event for every item that entered the warehouse, and a negative-valued event for every item that left the warehouse, you can happily sum it up over time and get correct values.

Looking back at our list of history tracking requirements, though, we can see pretty clearly that we don't get anything from this fact type. That's not to say you can't combine it with other kinds of patterns to provide history tracking, but this pattern itself doesn't provide anything. Updates to a fact either won't get represented, or they'll overwrite the existing records - either way we don't get history.

Next time we'll take a look at the periodic snapshot fact, and see if that can get us what we want...

The Warehouse of Data

Saturday, January 7, 2017

DW Patterns: The good - Transactional facts

Transactional Fact

See definition on the Kimball Group website

No comments:

Post a Comment