Facts and dimensions are the building blocks of a Kimball-style data warehouse.
- Facts are measurements, metrics or actions that represent business processes. They are typically numeric values that can be aggregated. Examples of facts include sales revenue, quantity sold, and the number of orders placed.
- Dimensions provide context to facts. They are descriptive attributes that help to understand the facts. Examples of dimensions include time, location, and product.
snapshot
functionality in dbt comes in handy.
What is a Snapshot?
The snapshot functionality in dbt compares the current state of a table with its previous state and creates a new row every time it detects a change. This allows us to track changes to our data over time and create a history of our data. dbt offers two strategies to compare the current state of a table with its previous state:timestamp
and check_cols
.
- Timestamp: This strategy compares the current state of a table with its previous state based on a
timestamp column - usually an
updated_at
column. If the timestamp column is updated, dbt creates a new row in the snapshot table. - Check_cols: This strategy compares the current state of a table with its previous state based on the values
of specific columns. If any of the columns specified in the
check_cols
configuration are updated, dbt creates a new row in the snapshot table.
dbt recommends the
timestamp
strategy for most use cases. The check_cols
strategy can be used when you don’t have a
timestamp column in your table, however it is less performant than the timestamp
strategy.Example
The follow example was taken from dbt documentation. Imagine you have anorders
table with the following columns:
id | status | updated_at |
---|---|---|
1 | pending | 2019-01-01 |
2 | pending | 2019-01-01 |
id | status | updated_at |
---|---|---|
1 | shipped | 2019-01-02 |
2 | pending | 2019-01-01 |
id | status | updated_at | valid_from | valid_to |
---|---|---|---|---|
1 | pending | 2019-01-01 | 2019-01-01 | 2019-01-02 |
1 | shipped | 2019-01-02 | 2019-01-02 | null |
2 | pending | 2019-01-01 | 2019-01-01 | null |
Creating Snapshots
Let us now create snapshots of our staging tables. Snapshot tables are named with the prefixsnap_
followed by the name of the model being snapshotted.
For example, the snapshot table for the stg_users
table would be named snap_stg_users
.
You can find all the code for creating these snapshots in the
snapshots/
directory
of the example-dbt-project.