Source
Source layers can be of two types - Raw Data (with no transformation) or Staging (with minimal data cleaning and homogenisation).Raw Data
Contains replicas of data as it is at source, without any transformation or changes whatsoever. Table and column names should match those in the original source without any modifications. Create a separate namespace for each data source. Each production database or cluster should be considered a separate source. This layer should only be visible to builders of ETL pipelines and not to end users.Staging
Contains views on top of your raw data layer that contain the following transformations:- Combining partial data together into a single table. e.g. Combining active and deleted records.
- Rename columns to follow a consistent naming scheme across all your tables.
- Type cast columns for consistency or ease of use. e.g. converting all timestamps to UTC, all boolean columns to TRUE/FALSE instead of 1/0 etc.