Column names should be descriptive and follow a consistent naming convention.
Optimise for clarity, not for brevity. Long names can be mentioned once and then aliased in queries but the confusion caused by unclear names can’t be fixed as easily.
Some other general principles to follow:
_date
, _month
, _year
depending on the level of granularity.
created_date
updated_month
_at
.
created_at
updated_at
created_at
refers to the time in UTC.created_at_eu_fr
refers to the time in Europe/Paris timezone._id
.
user_id
product_id
order_user_id
order_product_id
_id
and prefixed with the relationship name.
manager_id
parent_id
quantity
price
length_cm
weight_kg
Columns storing ETL related metadata are prefixed with double underscores, e.g. __dbt_inserted_at
Aggregate columns should take the form:
aggregation
_object
_context
_units
Aggregation:
Should be:
count
when measuring the quantity of discrete objects e.g. conversationssum
when measuring the quantity of continuous quantities e.g. ARRObject
Refers to the underlying object being aggregated (e.g. apps, inbox conversations, outbound messages).
More abstract objects are allowed (e.g. paid trials, net new arr) but these should be agreed, broadly understood, and documented.
Context
Should provide information on the filtering applied to the object being aggregated. This can be further split into qualifiers and actors: context = qualifier_by_actor
Units
Should be stated where applicable. In cases where we only use one unit (e.g. dollars for currency columns) they can be omitted. However, if there is any ambiguity they should be included. Explicit is better than implicit.
Examples:
count_support_tickets_inbound
count_support_tickets_inbound_from_email
count_support_tickets_resolved_by_teammate
sum_teammate_time_active_in_hours
median_arr_per_app
avg_cost_per_request_in_usd
Column names should be descriptive and follow a consistent naming convention.
Optimise for clarity, not for brevity. Long names can be mentioned once and then aliased in queries but the confusion caused by unclear names can’t be fixed as easily.
Some other general principles to follow:
_date
, _month
, _year
depending on the level of granularity.
created_date
updated_month
_at
.
created_at
updated_at
created_at
refers to the time in UTC.created_at_eu_fr
refers to the time in Europe/Paris timezone._id
.
user_id
product_id
order_user_id
order_product_id
_id
and prefixed with the relationship name.
manager_id
parent_id
quantity
price
length_cm
weight_kg
Columns storing ETL related metadata are prefixed with double underscores, e.g. __dbt_inserted_at
Aggregate columns should take the form:
aggregation
_object
_context
_units
Aggregation:
Should be:
count
when measuring the quantity of discrete objects e.g. conversationssum
when measuring the quantity of continuous quantities e.g. ARRObject
Refers to the underlying object being aggregated (e.g. apps, inbox conversations, outbound messages).
More abstract objects are allowed (e.g. paid trials, net new arr) but these should be agreed, broadly understood, and documented.
Context
Should provide information on the filtering applied to the object being aggregated. This can be further split into qualifiers and actors: context = qualifier_by_actor
Units
Should be stated where applicable. In cases where we only use one unit (e.g. dollars for currency columns) they can be omitted. However, if there is any ambiguity they should be included. Explicit is better than implicit.
Examples:
count_support_tickets_inbound
count_support_tickets_inbound_from_email
count_support_tickets_resolved_by_teammate
sum_teammate_time_active_in_hours
median_arr_per_app
avg_cost_per_request_in_usd