Optimise for clarity, not for brevity.
Long names can be mentioned once and then aliased in queries but the confusion caused by unclear names can’t be fixed as easily.Some other general principles to follow:
Column names should be lower snake cased.
Use consistent and descriptive identifiers and names.
Limit use of abbreviations that are related to domain knowledge.
Avoid using reserved words as column names i.e. a date column should not be called date, use calendar_date instead.
Names must begin with a letter and may not end in an underscore.
Aggregate columns should take the form:aggregation_object_context_unitsAggregation:Should be:
count when measuring the quantity of discrete objects e.g. conversations
sum when measuring the quantity of continuous quantities e.g. ARR
Otherwise the aggregation function should be stated explicitly:
e.g.
avg
median
std_dev
…
ObjectRefers to the underlying object being aggregated (e.g. apps, inbox conversations, outbound messages).More abstract objects are allowed (e.g. paid trials, net new arr) but these should be agreed, broadly understood, and documented.ContextShould provide information on the filtering applied to the object being aggregated.
This can be further split into qualifiers and actors: context = qualifier_by_actorUnitsShould be stated where applicable. In cases where we only use one unit (e.g. dollars for currency columns) they can be omitted.
However, if there is any ambiguity they should be included. Explicit is better than implicit.Examples:
A column containing the number of inbound support tickets should be called count_support_tickets_inbound
A column containing the number of inbound support tickets from email should be called count_support_tickets_inbound_from_email
A column containing the number of support tickets which had a teammate reply before they were closed should be called count_support_tickets_resolved_by_teammate
A column containing the time in hours a teammate was active in the product should be called sum_teammate_time_active_in_hours
A column containing the median ARR per app should be called median_arr_per_app
A column containing the average cost per request in US dollars should be called avg_cost_per_request_in_usd