This post will focus on the other kind of standard snapshot fact. This is, of course, the...
Accumulating Snapshot Fact
While each row in a periodic snapshot represents an observation of the current state of an ongoing process (like a reading of a thermometer, or a count of the number of items in inventory), a row in the Accumulating Snapshot table represents an entire process, from beginning to end. The idea is that you have one date/time column for each step in the process, that records when the record hit that stage. It would be possible to have separate rows and treat it as a transaction fact, but frequently there will be measures that only relate to one step in the process - and then you'd have to have columns that only contain values for a small proportion of rows, and figure out what to do in other cases, and so on.
One important thing to note here is that the process must have absolutely well-defined stages in order for this pattern to work - if it's possible for the process to move backwards and forwards, skip steps or go through the stages in a non-linear order, this pattern is useless. If the process has an undefined number of stages, this pattern is useless.
In cases where you do have a strictly defined process, however, this isn't just a good solution, it's ideal - you minimize the amount of data stored, you don't waste space on pointless columns for most records (some records may have empty columns, but only when the process hasn't yet reached that stage), and you can easily see where each item going through the process is at. It's great - it just seems to me to be quite oddly-specific to earn a place among the Three Patterns of Facts.
Does it fix the problems that were left by the previous two types though? It seems to have invented its own niche problem and solved that rather than helping with these basic questions of history tracking, but let's check. The problems I was evaluating against previously when fact data changed were:
- Data entry error (the value was always supposed to be X, but it was originally measured as Y)
- The grain of the measure doesn't match our fact table because the quantity we're measuring is not temporally related to when the event occurred (e.g. attention score)
- The measure is a continuous variable (e.g. influence of individuals, or temperature)
As with the previous two types, 1 is not covered - we only have the choice to overwrite one of the existing measurements taken - no way to track what the incorrect value was. Not too promising - what about number 2? In order to achieve that kind of tracking, we'd have to have infinite columns. That doesn't seem feasible... Number 3 was already falls under Periodic Snapshot's area of expertise, so we can skip that (this pattern can't really do much there anyway).
So now we have 3 problems remaining - problem 1 and 2 from above, and the new problem of what can be done for modelling processes for which we'd like to use the Accumulating Snapshot, but can't because they have a flexible process (one example that I'm particularly familiar with, is student enrollment - this generally involves a lot of back-and-forth between the institution, the student and various other parties, and the process doesn't follow a set path - or if it does, it's such a convoluted one that the simplistic modelling of the Accumulating Snapshot can't handle it).
So what can be done? Well, Kimball does have a couple of pointers that may help and we even alluded to some in the previous post, but that will be the topic for next time.
No comments:
Post a Comment