Monday, August 15, 2016

DW Patterns: The good - types 0 - 1

In my previous post I talked about data warehousing design patterns and the idea that we might need to start branching out. Before we get to actually looking at potential new patterns, we should take stock of what patterns we currently have and what they're good for. This post and the next couple will be about the different types of Slowly Changing Dimensions, then we'll pick up on facts. Importantly, I will be looking at the SCDs as currently presented in Wikipedia.

Before I get into the details, I need to debunk a common misconception. No Slowly Changing Dimension is better than any other. The choice of which SCD to use depends on the situation. Type 2 is not "better" than type 1, unless the situation needs it. Type 0 is not "worse" than the other types, except in that it is less frequently required. So, without further ado:

Type 0: Retain Original


This is where we never update data once it's gone into the table. The only example anyone's been able to provide me with in regards to this is the Date dimension, but that's kind of cheating because the source data doesn't really change, or if it does (e.g. holidays etc) then it's actually type 1. It is conceivable that it might come in handy though, and I'm sure you'll know if it does (please let me know in comments if you have come across any examples of this in the wild).

As I was writing the second-to-last paragraph of this article I realized that junk dimensions are another case for being called type 0, but since there's nothing really being updated in the source system (just different combinations over time) I'm not convinced that those can be legitimately called Type 0 either.

Type 1: Overwrite


In a Type 1 attribute, if data in the source system changes then we overwrite it. In a solely type 1 table, date won't form part of the grain (unlike in type 2) unless you're talking about a date-related table, like the Date dimension.

Overwrite attributes are the singularly most over-used attribute type in data warehousing. This is because it's easy, and frequently acceptable. We're also not here to talk about when they shouldn't be used, so when should you use Type 1?

If you want to be able to find all historical facts for the latest version of something then you want Type 1. A great example here is that of people's names. If I go to look up a student's records, I don't want to have to specify every name they've ever had. That's just common sense. Of course, it depends on the situation; marital status is a highly situational example: in some situations it may be very important to know if a person was married at the time, while for other businesses you may only care about what they are now.

Another use for Type 1, as alluded to earlier, is if you just don't care. It's a lot of effort to set up type 2+, so if it's an attribute that you don't care about then type 1 may be what you want. If you don't think something's going to change at all, you might set it up as type 1 and rely on backups to let you backdate a type 2 fix if it turns out that it does change.

Notably, Type 1 dimensions only work for dimensions that represent entities. If you try to deal with a junk dimension (or any other grain-free dimension for that matter) with Type 1 you will have a bad time - don't do it.

Sorry this one's a bit short, I've had a busy weekend. Stay tuned for my next post at the end of the week continuing discussion about where our patterns do work.

2 comments:

  1. "If you want to be able to all historical facts for the latest version of something then you want Type 1."

    I think you accidentally a word.

    ReplyDelete
    Replies
    1. Oh, you're right. Thank you, random internet citizen, for your vigilance.

      Delete