Anyway, let's get started. I will continue with the example from last time, so let's kick this off with some revision. We started off with a Student dimension (with the grain defined by the StudentID), and wanted to add an International_Indicator column. In our imaginary school, this necessitates changing the grain of the Student table to include PaperID - since apparently International study is defined by the paper the student is studying. So, we want to wind up with something like this:
Dim_Student | |||
---|---|---|---|
SK_Student | StudentID | International_Indicator | ... |
1 | 1234 | International | ... |
2 | 1234 | Domestic | ... |
Grain_Student | ||
---|---|---|
SK_Student | StudentID | PaperID |
1 | 1234 | 1963 |
1 | 1234 | 2345 |
2 | 1234 | 5684 |
Before we get into how we build this, a disclaimer: I'm afraid that I've only ever implemented this for medium-sized businesses where truncating and re-populating was the method of choice, so that's what I'm going to present here. I suppose that if you were persisting the tables you'd have to resort to a row-hashing technique, but that's out of scope for this post.
Ok, so how do we get there? Well, let's check out my database in SSMS - I've skipped the part of actually changing the dimension structure, since I think that's pretty trivial:
Now, there's an important aspect here that's not shown above: Dim_student.SK_Student is not an identity column - we will load in the values manually. Ok, so let's take a look at the code required to load this - I'll be using a SQL stored procedure to do it, but it should be easy to port to another method. The basic principle here is that we want to go through three steps: build a table including all of the columns (IDs, attributes and surrogate key). Then, load into the dimension, and load into the grain table. And here's the code that I use to do that:
CREATE PROCEDURE grainfreedimension.Load_Dim_Student
AS
BEGIN
SET NOCOUNT ON;
truncate table
grainfreedimension.Dim_Student
truncate table
grainfreedimension.Grain_Student
select
Paper.PaperID,
Student.StudentID,
Student.Name as Student_Name,
case
when Paper.InternationalIndicator = 1 then 'International'
else 'Domestic'
end as
International_Indicator,
DENSE_RANK() over (
order by
Student.StudentID,
Student.Name,
Paper.InternationalIndicator
) as SK_Student
into #transform
from grainfreedimension.Paper
inner join
grainfreedimension.StudentPaper on Paper.PaperID = StudentPaper.PaperID
inner join
grainfreedimension.Student on StudentPaper.StudentID = Student.StudentID
insert into
grainfreedimension.Dim_Student (
SK_Student,
StudentID,
International_Indicator,
Student_Name
)
select distinct
SK_Student,
StudentID,
International_Indicator,
Student_Name
from #transform
insert into
grainfreedimension.Grain_Student (
SK_Student,
StudentID,
PaperID
)
select distinct
SK_Student,
StudentID,
PaperID
from #transform
END
GO
It's really pretty simple code - it's really just the use of the DENSE_RANK function, allowing us to build the transform table without having to do joins on a bunch of dimension attributes, that makes this really cool.
No comments:
Post a Comment