icon picker
SQL & Materialization Style Guide


This style guide is intended to standardize how the CAX team writes production code. Ad-hoc queries can be written in whatever format pleases the developer. Once the code needs to go in production, the style guide should be applied.
In order to make applying these styling principles easier for developers, there is a SQL linter called that will easily apply these changes to whatever file a dev is changing.

SQL Fluff

SQLFLuff is a SQL linter that’s used to define the basic structure and style of the SQL that we write, and move the review and style of that structure into the hands of the authors.
In order to utilize SQL fluff you’ll need to install it via pip , pypi , or your preferred package installer.
You can confirm its installation by getting SQLFluff to show its version number.

How to use

SQL Fluff has two core commands that are executed on a file
1) The first is lint which scans the file, or file path, you’ve designated for any SQL styling that doesn’t conform to the configured rules and prints out in your terminal which rules are not being followed
2) The second is fix which applies the correct styling to your file
We have added our own sqlfluff config file to the app-compliance-etl Squarewave repository so in whatever branch you’re working in you can run the sqlfluff commands to apply them to your production files.
If you don’t see a .sqlfluff config file in your app-compliance-etl local repo, make sure to run a git pull to update your local repo with any new changes in the remote main repo.

General Principles

Optimize for readability and maintainability rather than fewer lines of code
Write code with the WET (write everything twice) programming principle in mind
It's okay to write the same logic in two different places. Once we've written it a third time that logic should be standardized in a single upstream table

Query Structure and Syntax Styling

General SQL Styling and Best Practices

Lowercase everything
This removes the need for devs to hit shift or caps lock to swap back and forth between cases
IDEs used at Block, and Github, highlight keywords and functions to make it easier for readers to delineate between keywords and non-keyword syntax
Trailing commas only
Improves readability of the code as it's the natural way lists are written
Snowflake will be support for trailing commas soon - so the common complaint around using trailing commas will soon be a thing of the past
Put each column on its own line and each SQL keyword on it's own line
If there is a distinct qualifier, put it on the same line as select
Each new column line should be indented 4 spaces
Left align select statements if they're not in a CTE

Avoid using select * outside of import CTEs - it isn't explicit as to what columns are being used and some tables could have hundreds of columns which makes debugging difficult


Utilize CTEs instead of subqueries
CTEs help to break your code into more readable blocks; ideally performing one logical body of work per CTE
CTEs should be placed at the beginning of the query
Use import CTEs to filter datasets
This can help with over ETL performance, especially when joining tables later in the script
Be explicit with CTE names. The CTE name should indicate what that code block is trying to accomplish
Indent the select statement within your CTE 1 tab, or 4 spaces
Below is an example of a properly formatted and named CTE
Use a final CTE before your final select statement.
This comes in handy when debugging a script where you have multiple CTEs since you can easily isolate the CTE that has the bug by replacing the select * from final with select * from cte_name

CASE Statements

Case statements with one when clause can be written on one line
Case statements with multiple when clauses should be written with each when clause on it's own line
If a when clause has multiple conditions the first condition should be written on the same line as the when clause, each subsequent condition should go on a new line and indented from the first line

Join Conditions

Be explicit in JOIN conditions - use inner join instead of just join
If using an inner join, apply your filtering logic in the where clause, not the join clause
Avoid using right joins - instead, restructure your query to use left joins
Put the table referenced first immediately after the on in the join condition
Joins with single conditions should be written on one line
Joins with multiple conditions should have each condition written on it's own line

Aggregation, Filtering, and Ordering

- group by column names, not numbers; same for order by - group by all is your best friend if you need to group by a lot of columns; just make sure to put your aggregates at the end of your column list so it's understood what columns are being aggregated - Each column in the group by , order by , where should have it's own line unless there is only one column being utilized
Use lateral column aliasing in your `group by`
Avoid using order by to reduce compute costs - leave that to the BI tools or Excel/GSheet user to do that
Break long lists of in values into multiple lines with one value per line

Table and Column Naming Conventions

Be explicit with column and table names
The more descriptive a column/table name is the less time someone querying, reading or updating the table will have to spend figuring it out what it is
It's okay if column and table names seem obnoxiously long - embrace being explicit
When aliasing columns use as to define the alias name
Be descriptive with alias names; using a short form version of the table name is a good place to start
Avoid using non-descriptive naming conventions such as id , name , and type - prefix these columns with what is being identified or named
Booleans should start with has_ or is_
Timestamps should end with _at
Dates should end with _date
Avoid reserved keywords like date or month as column names
Always rename aggregates

Sample SQL

Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
) instead.