General Principles
Optimise for clarity, not for brevity. Long names can be mentioned once and then aliased in queries but the confusion caused by unclear names can’t be fixed as easily. Some other general principles to follow:- Column names should be lower snake cased.
- Use consistent and descriptive identifiers and names.
- Limit use of abbreviations that are related to domain knowledge.
- Avoid using reserved words as column names i.e. a date column should not be called date, use calendar_date instead.
- Names must begin with a letter and may not end in an underscore.
Data Type Principles
Boolean Columns
- Boolean columns should be stored as true and false rather than 1 and 0, etc
- Boolean columns should be prefixed with a present or past tense third-person singular verb, such as:
- is* or was*.
- has* or had*.
- does* or did*.
Date and Time Columns
- Date columns should be suffixed with
_date,_month,_yeardepending on the level of granularity.- e.g.
created_dateupdated_month
- e.g.
- Datetime columns should be suffixed with
_at.- e.g.
created_atupdated_at
- e.g.
- Use UTC time zone by default. If you need to use another timezone, suffix the column name with the timezone identifier.
- e.g.
created_atrefers to the time in UTC.created_at_eu_frrefers to the time in Europe/Paris timezone.
- e.g.
Foreign Key Columns
- Foreign key columns should be named after the referenced table and suffixed with
_id.- e.g.
user_idproduct_id
- e.g.
- If the foreign key is part of a composite key, the column name should be prefixed with the table name.
- e.g.
order_user_idorder_product_id
- e.g.
- If the foreign key is part of a self-referencing relationship, the column name should be suffixed with
_idand prefixed with the relationship name.- e.g.
manager_idparent_id
- e.g.
Numeric Columns
- Numeric columns should be named after the quantity they represent.
- e.g.
quantityprice
- e.g.
- Mention the unit of measurement in the column name if it is not obvious.
- e.g.
length_cmweight_kg
- e.g.
ETL metadata
Columns storing ETL related metadata are prefixed with double underscores, e.g.__dbt_inserted_at
Aggregate Columns
Aggregate columns should take the form:aggregation_object_context_units
Aggregation:
Should be:
countwhen measuring the quantity of discrete objects e.g. conversationssumwhen measuring the quantity of continuous quantities e.g. ARR- Otherwise the aggregation function should be stated explicitly:
- e.g.
- avg
- median
- std_dev
- …
- e.g.
- A column containing the number of inbound support tickets should be called
count_support_tickets_inbound - A column containing the number of inbound support tickets from email should be called
count_support_tickets_inbound_from_email - A column containing the number of support tickets which had a teammate reply before they were closed should be called
count_support_tickets_resolved_by_teammate - A column containing the time in hours a teammate was active in the product should be called
sum_teammate_time_active_in_hours - A column containing the median ARR per app should be called
median_arr_per_app - A column containing the average cost per request in US dollars should be called
avg_cost_per_request_in_usd