So far in our data modeling journey, we have:

  1. Defined the schema of the source tables.
  2. Created a staging layer to clean and rename the columns.
  3. Created snapshots of the staging tables to track changes over time.

We shall now begin creating our dimension tables. Dimensions are descriptive attributes that provide context to facts. They are typically used for filtering and grouping data. Examples of dimensions include time, location, and product.

When you track the changes to a dimension over time, it is known as a Slowly Changing Dimension.

Our dimension tables follow the following naming convention:

  • dim_<entity_name> for simple dimensions.
  • dim_<entity_name>_history for slowly changing dimensions.

You can find all the code for creating these dimension tables in the models/core/dimensions/ directory of the example-dbt-project.

Creating Dimension Tables

dim_users_history

Specification

We would like our dim_users_history table to have the following columns.

Note that our user_id is no longer a primary key in this table. This is because since we can now have multiple rows for each user, each row representing a different version of the user’s history, the user_id is no longer unique.

Instead, we have a dbt_scd_id column that acts as the primary key.

We have also added a column called is_current that indicates whether a row is the current version of the user.

Column NameTypeDescription
dim_users_keyPKPrimary Key
user_idFKForeign Key referencing user_id in users table
user_namevarcharName of the user
user_emailvarcharEmail address of the user
latitudefloatLatitude of the user’s location
longitudefloatLongitude of the user’s location
created_attimestampTime when the user was created
updated_attimestampTime when the user was last updated
valid_fromtimestampTime when the record became valid
valid_totimestampTime when the record ceased to be valid
is_currentbooleanIndicates whether the record is current

Model

The following dbt model creates a slowly changing dimension table for the users table.

with

--
source as (
    select * from {{ ref('snapshot_stg_users') }}
),

--
renamed as (
    select
        dbt_scd_id as dim_users_key,
        user_id,

        user_name,
        user_email,
        latitude,
        longitude,

        created_at,
        updated_at,

        dbt_valid_from as valid_from,
        dbt_valid_to as valid_to,

        (valid_to is null) as is_current

    from
        source
)

--
select * from renamed

dim_merchants_history

Specification

We would like our dim_merchants_history table to have the following columns:

Column NameTypeDescription
dim_merchants_keyPKPrimary Key
merchant_idFKForeign Key referencing merchant_id in merchants table
merchant_namevarcharName of the merchant
latitudefloatLatitude of the merchant’s location
longitudefloatLongitude of the merchant’s location
created_attimestampTime when the merchant was created
updated_attimestampTime when the merchant was last updated
valid_fromtimestampTime when the record became valid
valid_totimestampTime when the record ceased to be valid
is_currentbooleanIndicates whether the record is current

Model

The following dbt model creates a slowly changing dimension table for the merchants table.

with

--
source as (
    select * from {{ ref('snapshot_stg_merchants') }}
),

--
renamed as (
    select
        dbt_scd_id as dim_merchants_key,
        merchant_id,

        merchant_name,
        latitude,
        longitude,

        created_at,
        updated_at,

        dbt_valid_from as valid_from,
        dbt_valid_to as valid_to,

        (valid_to is null) as is_current

    from
        source
)

--
select * from renamed

dim_items_history

Specification

We would like our dim_items_history table to have the following columns:

Column NameTypeDescription
dim_items_keyPKPrimary Key
item_idFKForeign Key referencing item_id in items table
item_namevarcharName of the item
item_categoryvarcharCategory of the item
base_price_usdfloatBase price of the item in USD
created_attimestampTime when the item was created
updated_attimestampTime when the item was last updated
valid_fromtimestampTime when the record became valid
valid_totimestampTime when the record ceased to be valid
is_currentbooleanIndicates whether the record is current

Model

The following dbt model creates a slowly changing dimension table for the items table.

with

--
source as (
    select * from {{ ref('snapshot_stg_items') }}
),

--
renamed as (
    select
        dbt_scd_id as dim_items_key,
        item_id,

        item_name,
        item_category,
        base_price_usd,

        created_at,
        updated_at,

        dbt_valid_from as valid_from,
        dbt_valid_to as valid_to,

        (valid_to is null) as is_current

    from
        source
)

--
select * from renamed

dim_promotions_history

Specification

We would like our dim_promotions_history table to have the following columns:

Column NameTypeDescription
dim_promotions_keyPKPrimary Key
promotion_idFKForeign Key referencing promotion_id in promotions table
item_idFKForeign Key referencing item_id in items table
discount_percentage_as_floatfloatPercentage discount applied to the item
created_attimestampTime when the promotion was created
updated_attimestampTime when the promotion was last updated
valid_fromtimestampTime when the record became valid
valid_totimestampTime when the record ceased to be valid
is_currentbooleanIndicates whether the record is current

The following dbt model creates a slowly changing dimension table for the promotions table.

with

--
source as (
    select * from {{ ref('snapshot_stg_promotions') }}
),

--
renamed as (
    select
        dbt_scd_id as dim_promotions_key,
        promotion_id,

        item_id,
        discount_percentage,

        created_at,
        updated_at,

        dbt_valid_from as valid_from,
        dbt_valid_to as valid_to,

        (valid_to is null) as is_current

    from
        source
)

--
select * from renamed

Next Steps

We have now created our dimension tables. In the next section, we shall create our fact tables.