Explore

SQL & Materialization Style Guide

Objective

This style guide is intended to standardize how the CAX team writes production code. Ad-hoc queries can be written in whatever format pleases the developer. Once the code needs to go in production, the style guide should be applied.

In order to make applying these styling principles easier for developers, there is a SQL linter called

SQL Fluff⁠

that will easily apply these changes to whatever file a dev is changing.

SQL Fluff

SQLFLuff is a SQL linter that’s used to define the basic structure and style of the SQL that we write, and move the review and style of that structure into the hands of the authors.

In order to utilize SQL fluff you’ll need to install it via pip , pypi , or your preferred package installer.

You can confirm its installation by getting SQLFluff to show its version number.

How to use

SQL Fluff has two core commands that are executed on a file

1) The first is lint which scans the file, or file path, you’ve designated for any SQL styling that doesn’t conform to the configured rules and prints out in your terminal which rules are not being followed

2) The second is fix which applies the correct styling to your file

We have added our own sqlfluff config file to the app-compliance-etl Squarewave repository so in whatever branch you’re working in you can run the sqlfluff commands to apply them to your production files.

If you don’t see a .sqlfluff config file in your app-compliance-etl local repo, make sure to run a git pull to update your local repo with any new changes in the remote main repo.

General Principles

Optimize for readability and maintainability rather than fewer lines of code

Write code with the WET (write everything twice) programming principle in mind

It's okay to write the same logic in two different places. Once we've written it a third time that logic should be standardized in a single upstream table

Query Structure and Syntax Styling

General SQL Styling and Best Practices

Lowercase everything

This removes the need for devs to hit shift or caps lock to swap back and forth between cases

IDEs used at Block, and Github, highlight keywords and functions to make it easier for readers to delineate between keywords and non-keyword syntax

Trailing commas only

Improves readability of the code as it's the natural way lists are written

Snowflake will be

releasing⁠

support for trailing commas soon - so the common complaint around using trailing commas will soon be a thing of the past

Put each column on its own line and each SQL keyword on it's own line

If there is a distinct qualifier, put it on the same line as select

Each new column line should be indented 4 spaces

Left align select statements if they're not in a CTE

Avoid using select * outside of import CTEs - it isn't explicit as to what columns are being used and some tables could have hundreds of columns which makes debugging difficult

CTEs

Utilize CTEs instead of subqueries

CTEs help to break your code into more readable blocks; ideally performing one logical body of work per CTE

CTEs should be placed at the beginning of the query

Use import CTEs to filter datasets

This can help with over ETL performance, especially when joining tables later in the script

Be explicit with CTE names. The CTE name should indicate what that code block is trying to accomplish

Indent the select statement within your CTE 1 tab, or 4 spaces

Below is an example of a properly formatted and named CTE

Use a final CTE before your final select statement.

This comes in handy when debugging a script where you have multiple CTEs since you can easily isolate the CTE that has the bug by replacing the select * from final with select * from cte_name

CASE Statements

Case statements with one when clause can be written on one line

Case statements with multiple when clauses should be written with each when clause on it's own line

If a when clause has multiple conditions the first condition should be written on the same line as the when clause, each subsequent condition should go on a new line and indented from the first line

Join Conditions

Be explicit in JOIN conditions - use inner join instead of just join

If using an inner join, apply your filtering logic in the where clause, not the join clause

Avoid using right joins - instead, restructure your query to use left joins

Put the table referenced first immediately after the on in the join condition

Joins with single conditions should be written on one line

Joins with multiple conditions should have each condition written on it's own line

Aggregation, Filtering, and Ordering

- group by column names, not numbers; same for order by - group by all is your best friend if you need to group by a lot of columns; just make sure to put your aggregates at the end of your column list so it's understood what columns are being aggregated - Each column in the group by , order by , where should have it's own line unless there is only one column being utilized

Use lateral column aliasing in your `group by`

Avoid using order by to reduce compute costs - leave that to the BI tools or Excel/GSheet user to do that

Break long lists of in values into multiple lines with one value per line

Table and Column Naming Conventions

Be explicit with column and table names

The more descriptive a column/table name is the less time someone querying, reading or updating the table will have to spend figuring it out what it is

It's okay if column and table names seem obnoxiously long - embrace being explicit

When aliasing columns use as to define the alias name

Be descriptive with alias names; using a short form version of the table name is a good place to start

Avoid using non-descriptive naming conventions such as id , name , and type - prefix these columns with what is being identified or named

Booleans should start with has_ or is_

Timestamps should end with _at

Dates should end with _date

Avoid reserved keywords like date or month as column names

Always rename aggregates

Sample SQL

Want to print your doc?
This is not the way.

Try clicking the ⋯ next to your doc name or using a keyboard shortcut (

CtrlP

) instead.