Saturday, July 30, 2016

How to bake a coeliac-friendly dimension

In my last post on the topic of grain-free dimensions I talked about how it's possible to easily generate a "grain" table that provides access to the dimension when building a dimension. Here I'm going to get into the gritty details of how to do it - feel free to rip my SQL apart in the comments and suggest better methods for doing this, or completely alternative ways! I'm always keen to learn from others' experience.

Anyway, let's get started. I will continue with the example from last time, so let's kick this off with some revision. We started off with a Student dimension (with the grain defined by the StudentID), and wanted to add an International_Indicator column. In our imaginary school, this necessitates changing the grain of the Student table to include PaperID - since apparently International study is defined by the paper the student is studying. So, we want to wind up with something like this:

Dim_Student
SK_StudentStudentIDInternational_Indicator...
11234International...
21234Domestic...

Grain_Student
SK_StudentStudentIDPaperID
112341963
112342345
212345684


Before we get into how we build this, a disclaimer: I'm afraid that I've only ever implemented this for medium-sized businesses where truncating and re-populating was the method of choice, so that's what I'm going to present here. I suppose that if you were persisting the tables you'd have to resort to a row-hashing technique, but that's out of scope for this post.

Ok, so how do we get there? Well, let's check out my database in SSMS - I've skipped the part of actually changing the dimension structure, since I think that's pretty trivial:

Now, there's an important aspect here that's not shown above: Dim_student.SK_Student is not an identity column - we will load in the values manually. Ok, so let's take a look at the code required to load this - I'll be using a SQL stored procedure to do it, but it should be easy to port to another method. The basic principle here is that we want to go through three steps: build a table including all of the columns (IDs, attributes and surrogate key). Then, load into the dimension, and load into the grain table. And here's the code that I use to do that:

CREATE PROCEDURE grainfreedimension.Load_Dim_Student
AS
BEGIN
       SET NOCOUNT ON;
      
       truncate table grainfreedimension.Dim_Student
       truncate table grainfreedimension.Grain_Student

       select
              Paper.PaperID,
              Student.StudentID,
              Student.Name as Student_Name,
              case
                     when Paper.InternationalIndicator = 1 then 'International'
                     else 'Domestic'
              end as International_Indicator,
              DENSE_RANK() over (
                     order by
                           Student.StudentID,
                           Student.Name,
                           Paper.InternationalIndicator
              ) as SK_Student
       into #transform
       from grainfreedimension.Paper
       inner join grainfreedimension.StudentPaper on Paper.PaperID = StudentPaper.PaperID
       inner join grainfreedimension.Student on StudentPaper.StudentID = Student.StudentID

       insert into grainfreedimension.Dim_Student (
              SK_Student,
              StudentID,
              International_Indicator,
              Student_Name
       )
       select distinct
              SK_Student,
              StudentID,
              International_Indicator,
              Student_Name
       from #transform
      
       insert into grainfreedimension.Grain_Student (
              SK_Student,
              StudentID,
              PaperID
       )
       select distinct
              SK_Student,
              StudentID,
              PaperID
       from #transform
END
GO

It's really pretty simple code - it's really just the use of the DENSE_RANK function, allowing us to build the transform table without having to do joins on a bunch of dimension attributes, that makes this really cool.

No comments:

Post a Comment