stg_ to indicate that they are staging tables.
You can find all the code for creating these staging tables in the
models/staging/product directory
of the example-dbt-project.Create Staging Tables
stg_users
The columns of ourusers table need the following changes:
user_idname- Should be renamed touser_nameemail_address- Should be renamed touser_emaillat- Should be renamed tolatitudelong- Should be renamed tolongitudecreatedAt- Should be renamed tocreated_atupdatedAt- Should be renamed toupdated_at
users table.
stg_merchants
The columns of ourmerchants table need the following changes:
merchant_idmerchant_namegeo_loc- This is a custom datatype that stores “latitude/longitude” as a single value. We should split this into two columnslatitudeandlongitudecreatedAt- Should be renamed tocreated_atupdatedAt- Should be renamed toupdated_at
merchants table.
stg_items
The columns of ouritems table need the following changes:
item_idmerchant_idname- Should be renamed toitem_namebecause explicit is better than implicitcategory- Should be renamed toitem_categorybecause explicit is better than implicitprice- Should be renamed tobase_price_usdso that it’s clear that the price is in USDcreated_atupdated_at
items table.
stg_promotions
The columns of ourpromotions table need the following changes:
promotion_idproduct_id- The use of the word ‘product’ is because of legacy reasons. We should rename this toitem_idto match theitemstablediscount- By default we store the discount percentage as an integer e.g. If we offer 20% discount, this value is 20. We will convert this to a float by dividing by 100 to make it easier to work with. We will also rename it todiscount_percentage_as_floatcreated_atupdated_at
promotions table.
stg_inventory
The columns of ourinventory table need the following changes:
inventory_idmerchant_iditem_idqty- Should be renamed toquantityfor claritydate_acquired- Should be renamed toacquisition_datebecause we keep the units as suffixescost- Should be renamed tounit_cost_usdso that it’s clear that the price is in USD and we’re talking about the cost of a single unitcreated_atupdated_at
total_cost_usd which is the product of quantity and unit_cost_usd since this is a common calculation that we’ll need to do.
The following dbt model creates a staging table for the inventory table.
stg_orders
The columns of ourorders table need the following changes:
order_idcustomer_id- Should be renamed touser_idto match theuserstablestore_id- Should be renamed tomerchant_idto match themerchantstabledate- Should be renamed toorder_placed_atbecause it’s a timestamp and not a date, and we want to clarify that it’s the time when the order was placedcreated_atupdated_at
orders table.
stg_order_items
The columns of ourorder_items table need the following changes:
order_item_idorder_idproduct_id- Should be renamed toitem_idto match theitemstableqty- Should be renamed toquantityfor claritycreated_atupdated_at
order_items table.
Next Steps
In this section, we have created staging tables for each of the raw tables in our database. These staging tables are used to clean and rename our columns so that they are consistent and easy to work with. The core of a data warehouse is made up of tables known as facts and dimensions. Facts represent actions or events, while dimensions represent the context in which these actions or events occur. In the next section, we will use thesnapshot functionality in dbt to create snapshots of our staging tables.
These snapshots will allow us to track changes to our data over time and create a history of our data.
Such a table that contains historical records of your data is called a slowly changing dimension (SCD) table.
Once we have our SCD tables, we will create our fact tables.