- Defined the source tables that are used to run our application.
- Created staging tables that clean and rename columns from the source tables.
- Created snapshots for some of our staging tables to track changes over time.
- Used those snapshots to create dimension tables that represent the entities in our business.
- fct_orders: Contains information about the orders placed by users.
- fct_order_items: Contains information about the items in each order.
fct_order_items
table since it is a more granular version of fct_orders
.
We will then aggregate this table to create the fct_orders
table.
You can find all the code for creating these fact tables in the
models/core/facts
directory
of the example-dbt-project.Model: fct_order_items
Thefct_order_items
table contains information about the items in each order.
Specification
We would like ourfct_order_items
table to contain the following columns:
Column Name | Type | Description |
---|---|---|
order_item_id | PK | A unique identifier for each order item. |
order_id | FK | Ref. order_id in the stg_orders table. |
item_id | FK | Ref. item_id in the stg_items table. |
promotion_id | FK | Ref. promotion_id in the stg_promotions table. |
dim_item_key | SCD Key | Ref. item_key in the dim_items_history table. |
dim_promotion_key | SCD Key | Ref. promotion_key in the dim_promotions table. |
quantity | Measure | The quantity of the item in the order. |
base_unit_selling_price_usd | Measure | The price of the item before applying any promotions. |
effective_unit_selling_price_usd | Measure | The price of the item after applying any promotions. |
landed_unit_cost_price_usd | Measure | The cost price of the item. |
unit_margin_usd | Measure | The margin on the item. |
total_selling_price_usd | Measure | The total selling price of the item. |
total_cost_price_usd | Measure | The total cost price of the item. |
total_margin_usd | Measure | The total margin on the item. |
created_at | Dimension | Time when the order item was created. |
updated_at | Dimension | Time when the order item was last updated. |
fct_order_items
table.
Intermediates
Most of the information we need for the model can be found directly in one table or another. However, the calculation of thelanded_unit_cost_price_usd
requires a bit more work.
We will create an intermediate model called int_items_landed_cost_daily
to calculate this value.
Doing so allows us to abstract the complexity of the calculation and reuse it in other models if needed.
int_items_landed_cost_daily
If you go back to how the e-commerce platform works, you’ll recall that merchants receive items from vendors in batches daily. The price of an item in each batch is different and depends on the vendor it was bought from, how large the batch was etc. The landed cost price is therefore recalculated every time a new batch of items is received. It is calculated as the weighted average of the items currently in stock and the new items received. It looks like obtaining this number might take a fair amount of calculation, we will therefore relegate it to its own intermediate model calledint_items_landed_cost_daily
.
The model should look something like this:
Final Model
The final model forfct_order_items
should look something like this:
fct_order_items
table that contains all the information we need about the items in each order.
The model is designed to be flexible and can be easily extended to include more information if needed.
It contains measures like total_selling_price_usd
, total_cost_price_usd
, and total_margin_usd
that can be used to
analyze the performance of the business and calculate metrics like profit margins.
It also contains dimension keys that can be used to join with the dimension tables to get more
information about the entities involved at the time of the order.
The IDs of all dimensions are also included as they can be sometimes required for debugging or auditing purposes.
We can now use this table to create the fct_orders
table that aggregates the data at the order level.
Model: fct_orders
The model forfct_orders
is quite simple. It aggregates the data from fct_order_items
at the order level.
Specification
We would like ourfct_orders
table to contain the following columns:
Column Name | Type | Description |
---|---|---|
order_id | PK | A unique identifier for each order. |
user_id | FK | Ref. user_id in the stg_users table. |
merchant_id | FK | Ref. merchant_id in the stg_merchants table. |
dim_user_key | SCD Key | Ref. user_key in the dim_users_history table. |
dim_merchant_key | SCD Key | Ref. merchant_key in the dim_merchants_history table. |
order_placed_at | Dimension | Date and time when the order was placed. |
total_selling_price_usd | Measure | The total selling price of the order. |
total_cost_price_usd | Measure | The total cost price of the order. |
total_margin_usd | Measure | The total margin on the order. |
created_at | Dimension | Time when the order was created. |
updated_at | Dimension | Time when the order was last updated. |
Final Model
The final model forfct_orders
should look something like this:
Next Steps
We have now created the fact tables for our e-commerce data warehouse. These tables contain the numerical data that represents the business processes of an organization. Thefct_order_items
table contains information about the items in each order and the
fct_orders
table aggregates this data at the order level.
With our facts and dimensions in place, we have a complete star schema data model that can be
used for analytics and reporting purposes.
In the next and final section, we will recap everything we have done so far and look at an
example of creating a mart table that aggregates data into Gross Merchandise Value (GMV) and margin metrics.