Fact Tables
So far in our data modeling journey, we have:
- Defined the source tables that are used to run our application.
- Created staging tables that clean and rename columns from the source tables.
- Created snapshots for some of our staging tables to track changes over time.
- Used those snapshots to create dimension tables that represent the entities in our business.
Now, we will create fact tables that represent the numerical data or actions that represent the business processes of an organization. Along with dimensions, facts are the other main component of a star schema.
In our e-commerce example, we will create the following fact tables:
- fct_orders: Contains information about the orders placed by users.
- fct_order_items: Contains information about the items in each order.
We will first create the fct_order_items
table since it is a more granular version of fct_orders
.
We will then aggregate this table to create the fct_orders
table.
You can find all the code for creating these fact tables in the models/core/facts
directory
of the example-dbt-project.
Model: fct_order_items
The fct_order_items
table contains information about the items in each order.
Specification
We would like our fct_order_items
table to contain the following columns:
Column Name | Type | Description |
---|---|---|
order_item_id | PK | A unique identifier for each order item. |
order_id | FK | Ref. order_id in the stg_orders table. |
item_id | FK | Ref. item_id in the stg_items table. |
promotion_id | FK | Ref. promotion_id in the stg_promotions table. |
dim_item_key | SCD Key | Ref. item_key in the dim_items_history table. |
dim_promotion_key | SCD Key | Ref. promotion_key in the dim_promotions table. |
quantity | Measure | The quantity of the item in the order. |
base_unit_selling_price_usd | Measure | The price of the item before applying any promotions. |
effective_unit_selling_price_usd | Measure | The price of the item after applying any promotions. |
landed_unit_cost_price_usd | Measure | The cost price of the item. |
unit_margin_usd | Measure | The margin on the item. |
total_selling_price_usd | Measure | The total selling price of the item. |
total_cost_price_usd | Measure | The total cost price of the item. |
total_margin_usd | Measure | The total margin on the item. |
created_at | Dimension | Time when the order item was created. |
updated_at | Dimension | Time when the order item was last updated. |
You’ll notice that not all these columns come from the same place. This is quite normal in a data warehouse. A fact table is designed to bring together data from different sources to provide a complete picture of a business process and doing so often requires going through multiple transformations.
Let us walk through the transformations required to create the fct_order_items
table.
Intermediates
Most of the information we need for the model can be found directly in one table or another.
However, the calculation of the landed_unit_cost_price_usd
requires a bit more work.
We will create an intermediate model called int_items_landed_cost_daily
to calculate this value.
Doing so allows us to abstract the complexity of the calculation and reuse it in other models if needed.
int_items_landed_cost_daily
If you go back to how the e-commerce platform works, you’ll recall that merchants receive items from vendors in batches daily. The price of an item in each batch is different and depends on the vendor it was bought from, how large the batch was etc.
The landed cost price is therefore recalculated every time a new batch of items is received. It is calculated as the weighted average of the items currently in stock and the new items received.
It looks like obtaining this number might take a fair amount of calculation, we will therefore relegate it to its own
intermediate model called int_items_landed_cost_daily
.
The model should look something like this:
Final Model
The final model for fct_order_items
should look something like this:
This creates the fct_order_items
table that contains all the information we need about the items in each order.
The model is designed to be flexible and can be easily extended to include more information if needed.
It contains measures like total_selling_price_usd
, total_cost_price_usd
, and total_margin_usd
that can be used to
analyze the performance of the business and calculate metrics like profit margins.
It also contains dimension keys that can be used to join with the dimension tables to get more information about the entities involved at the time of the order.
The IDs of all dimensions are also included as they can be sometimes required for debugging or auditing purposes.
We can now use this table to create the fct_orders
table that aggregates the data at the order level.
Model: fct_orders
The model for fct_orders
is quite simple. It aggregates the data from fct_order_items
at the order level.
Specification
We would like our fct_orders
table to contain the following columns:
Column Name | Type | Description |
---|---|---|
order_id | PK | A unique identifier for each order. |
user_id | FK | Ref. user_id in the stg_users table. |
merchant_id | FK | Ref. merchant_id in the stg_merchants table. |
dim_user_key | SCD Key | Ref. user_key in the dim_users_history table. |
dim_merchant_key | SCD Key | Ref. merchant_key in the dim_merchants_history table. |
order_placed_at | Dimension | Date and time when the order was placed. |
total_selling_price_usd | Measure | The total selling price of the order. |
total_cost_price_usd | Measure | The total cost price of the order. |
total_margin_usd | Measure | The total margin on the order. |
created_at | Dimension | Time when the order was created. |
updated_at | Dimension | Time when the order was last updated. |
Final Model
The final model for fct_orders
should look something like this:
Next Steps
We have now created the fact tables for our e-commerce data warehouse. These tables contain the numerical data that represents the business processes of an organization.
The fct_order_items
table contains information about the items in each order and the
fct_orders
table aggregates this data at the order level.
With our facts and dimensions in place, we have a complete star schema data model that can be used for analytics and reporting purposes.
In the next and final section, we will recap everything we have done so far and look at an example of creating a mart table that aggregates data into Gross Merchandise Value (GMV) and margin metrics.
Was this page helpful?