Saturday, December 24, 2016

DW Patterns: The good - a recap

Ok, the next obvious step for looking at DW design patterns is fact design patterns - which I will start on next time. First, however, I want to take a quick look back and summarize the dimension patterns.


So the main takeaway from this is that Type 2 is the master pattern, at least for the under-the-hood part. You should think about whether your users actually want to see Type 2, but even if they don't, it's still the way to go for storage. Unless you're confident that your users will never want it, or that your disk space will suffer too much from it, it should be your default implementation.

Let me just say that again: you should default to Type 2 and only consider not retaining the history in exceptional circumstances. This is because if you have the history in type 2, it's trivially easy to implement types 0, 1, 2, 3 and 6 as a view - plus it renders type 4 pointless. That leaves us with types 4K and 5K, which are both optimizations, and thus should only be considered in exceptional circumstances anyway.
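To make the "as a view" claim concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column and view names (dim_customer_t2, customer_sk, valid_from and so on) and the '9999-12-31' end-date convention are all my own assumptions for illustration, not something from a particular tool. It stores a Type 2 dimension and derives Type 1 and Type 6 presentations from it.

```python
import sqlite3


def build_demo():
    """Build a tiny Type 2 dimension plus Type 1 and Type 6 views over it.

    All names here are hypothetical; '9999-12-31' is an assumed marker
    for the current version of a dimension member.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_customer_t2 (
        customer_sk INTEGER PRIMARY KEY,  -- surrogate key, one per version
        customer_id TEXT NOT NULL,        -- business (natural) key
        city        TEXT NOT NULL,
        valid_from  TEXT NOT NULL,
        valid_to    TEXT NOT NULL         -- '9999-12-31' = current version
    );

    INSERT INTO dim_customer_t2 VALUES
        (1, 'C1', 'Auckland',   '2015-01-01', '2016-06-01'),
        (2, 'C1', 'Wellington', '2016-06-01', '9999-12-31'),
        (3, 'C2', 'Dunedin',    '2015-03-01', '9999-12-31');

    -- Type 1 presentation: every version row reports the current city.
    CREATE VIEW dim_customer_t1 AS
    SELECT v.customer_sk, v.customer_id, latest.city AS city
    FROM dim_customer_t2 AS v
    JOIN dim_customer_t2 AS latest
      ON latest.customer_id = v.customer_id
     AND latest.valid_to = '9999-12-31';

    -- Type 6 presentation: historical value and current value side by side.
    CREATE VIEW dim_customer_t6 AS
    SELECT v.customer_sk, v.customer_id,
           v.city      AS historical_city,
           latest.city AS current_city,
           v.valid_from, v.valid_to
    FROM dim_customer_t2 AS v
    JOIN dim_customer_t2 AS latest
      ON latest.customer_id = v.customer_id
     AND latest.valid_to = '9999-12-31';
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo()
    for row in conn.execute("SELECT * FROM dim_customer_t6 ORDER BY customer_sk"):
        print(row)
```

Because both views keep customer_sk, fact tables keyed on the Type 2 surrogate key can join to either view without changing, which is why storing Type 2 underneath costs the users nothing.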

So: if you're making an ETL automation tool, or anything like that, you should start with Type 2 and add others only as they are needed.


Stepping back from the individual patterns for a moment though, let's talk about what kinds of problems they solve and what assumptions they make about our data. First, the most important thing to understand about SCDs: they do not store all history. The only thing any of the above SCDs track is changes to the entities represented by dimensions. This is great when your dimensions fit nicely into business concepts, but breaks down the moment you have something like, say, a junk dimension. Or a dimension with an awkwardly split scope (e.g. a student dimension that's split into two records - one for the student's international study, one for their domestic study). Or any other dimension whose rows don't map one-to-one onto a business entity.

The weakness of SCDs for tracking scope doesn't stop there though - what if a fact record (an event) is updated so that it refers to a completely different dimension member (e.g. a student's enrolment is updated to change the paper they're enrolled in)? Since no entity is changed, SCDs are hopeless here - we have to hope that our facts deal with it instead.
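Here's a small sketch of that enrolment case, again with Python's sqlite3 and entirely hypothetical table names (dim_paper_t2, fact_enrolment). It assumes a naive load that updates the fact row in place, which is an assumption on my part rather than a statement about any specific ETL process.

```python
import sqlite3


def demo_fact_repoint():
    """Repoint a fact row at a different dimension member and show that
    the Type 2 dimension records nothing about it.

    Table and column names are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_paper_t2 (
        paper_sk   INTEGER PRIMARY KEY,
        paper_code TEXT NOT NULL,
        title      TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT NOT NULL
    );
    INSERT INTO dim_paper_t2 VALUES
        (1, 'MATH101', 'Calculus',   '2016-01-01', '9999-12-31'),
        (2, 'STAT101', 'Statistics', '2016-01-01', '9999-12-31');

    CREATE TABLE fact_enrolment (
        enrolment_id INTEGER PRIMARY KEY,
        student_id   TEXT NOT NULL,
        paper_sk     INTEGER NOT NULL REFERENCES dim_paper_t2 (paper_sk)
    );
    INSERT INTO fact_enrolment VALUES (1, 'S1', 1);
    """)

    dim_sql = "SELECT * FROM dim_paper_t2 ORDER BY paper_sk"
    dim_before = conn.execute(dim_sql).fetchall()

    # The student switches papers: the event changes, but neither paper does.
    conn.execute("UPDATE fact_enrolment SET paper_sk = 2 WHERE enrolment_id = 1")

    dim_after = conn.execute(dim_sql).fetchall()
    fact_rows = conn.execute(
        "SELECT enrolment_id, student_id, paper_sk FROM fact_enrolment"
    ).fetchall()
    return dim_before, dim_after, fact_rows


if __name__ == "__main__":
    before, after, facts = demo_fact_repoint()
    print("dimension unchanged:", before == after)
    print("fact rows now:", facts)
```

The dimension is identical before and after, and the fact table holds only the new reference, so nothing anywhere remembers that the student was once enrolled in MATH101.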

Sadly, the standard patterns don't do this, as we'll see...
