For the sake of our illustration, we will consider the example of a simple B2C ecommerce app where users can purchase products. Users are served by merchants that store products.

Each product is assigned a category. Merchants acquire these products from suppliers - called “vendors” - and store them till they are sold to a customer.

Business Context

App Workflow

  • Users log in and see items available at the nearest merchant.
  • Users add items to a virtual shopping cart.
  • Users check out the cart as a single order, which is delivered to their location.
  • Each order item has a landed cost price and an effective selling price
  • Effective selling price is calculated as the base selling price minus any discounts or promotions.

Key Metrics

These are the key metrics that the business is interested in:

  • GMV: Sum of the selling price of all orders.
  • Total Margin: Sum of the margin (selling price - landed cost price) of all orders.
  • Aggregations by date, merchant and product category.

Additional Complexity

  • Each merchant acquires items from vendors in batches, with varying prices.
  • The effective landed cost price is a weighted rolling average recalculated daily.
  • Promotions are applied at the item level.

Source Tables

Let’s define the schema of the source tables as they would appear in the production database:

Users

Contains information about the users of the app.

Column NameTypeDescription
idVARCHARPrimary Key
nameVARCHARName of the user
emailVARCHAREmail of the user
latFLOATLatitude of the user
longFLOATLongitude of the user
createdAtTIMESTAMPTime when the user was created
updatedAtTIMESTAMPTime when the user was last updated

Merchants

Contains information about the merchants who store and sell items to users.

Column NameTypeDescription
merchant_idVARCHARPrimary Key
merchant_nameVARCHARName of the merchant
geo_locGEOGRAPHYLatitude/Longitude of the merchant
createdAtTIMESTAMPTime when the merchant was created
updatedAtTIMESTAMPTime when the merchant was last updated

Items

Contains information about the items available for sale.

Column NameTypeDescription
item_idVARCHARPrimary Key
merchant_idVARCHARForeign Key referencing merchant_id in merchants table
nameVARCHARName of the item
categoryVARCHARCategory of the item
priceFLOATBase selling price of the item
created_atTIMESTAMPTime when the item was created
updated_atTIMESTAMPTime when the item was last updated

Promotions

Contains information about the promotions applied to items.

Column NameTypeDescription
promotion_idVARCHARPrimary Key
product_idVARCHARName of the item to which promotion is applied
discountFLOATPercentage discount applied to the item
created_atTIMESTAMPTime when the promotion was created
updated_atTIMESTAMPTime when the promotion was last updated

Inventory

Contains information about the batches of items acquired by merchants.

Column NameTypeDescription
inventory_idSERIALPrimary Key
merchant_idINTForeign Key referencing merchant_id in merchants table
item_idINTForeign Key referencing item_id in items table
qtyINTQuantity of the item in inventory
date_acquiredDATEDate when the item was acquired
costDECIMALCost price of the item
created_atTIMESTAMPTime when the inventory record was created
updated_atTIMESTAMPTime when the inventory record was last updated

Orders

Contains information about the orders placed by users.

Column NameTypeDescription
order_idSERIALPrimary Key
customer_idINTForeign Key referencing user_id in users table
store_idINTForeign Key referencing merchant_id in merchants table
dateTIMESTAMPDate and time of the order
created_atTIMESTAMPTime when the order was created
updated_atTIMESTAMPTime when the order was last updated

Order Items

Contains information about the items in each order.

Column NameTypeDescription
order_item_idSERIALPrimary Key
order_idINTForeign Key referencing order_id in orders table
product_idINTForeign Key referencing item_id in items table
qtyINTQuantity of the item in the order
created_atTIMESTAMPTime when the order item was created
updated_atTIMESTAMPTime when the order item was last updated

Next Steps

In this section, we have defined the source tables that form the basis of our data model.

You’ll notice that there are some irregularities in the way columns are named, or in terms of how information like locations are stored. This is quite common in real-world data sources.

In the next section, we will create the staging layer where we can make sure all columns follow the correct data types and nomenclature.