Finding duplicate rows

Duplicate rows are those pesky duplicates of data that sometimes sneak into your table without you even realizing it. By identifying and dealing with these duplicates, you can ensure the data accuracy and integrity, avoid errors in calculations, and get more reliable insights. We'll walk you through an efficient method to identify and delete duplicates in any table.

Step 1: Define “duplicate”

Identify which columns need to match for rows to be considered “duplicates”. For example:
In this table, you see that the rows share identical values in the asset column, but have different “asset types”. In this step, you need to decide if you’d consider these rows duplicates. Is it a duplicate if just the asset name matches, or does it need to match in multiple columns? Which columns must it match in?

Step 2: Create a new “matching rows” column

Set the column type to be text.

Step 3: Add a calculation

Add a calculation to the column to show any rows that are duplicates of the row, per your definition from step 1. The formula should look this:
thisTable.Filter(CurrentValue != thisRow AND [COLUMN NAME]=thisRow.[COLUMN NAME])

COLUMN NAME is the name of the column you’ve identified as the column that should match in duplicate rows. If multiple columns should match, chain another AND [COLUMN NAME]=thisRow.[COLUMN NAME] into the filter.

Step 4: Delete duplicates

Using the information in the duplicates column, verify that flagged rows are indeed duplicates, then delete the extra rows.

Step 5: Address the duplicates column

Once you’ve de-duped your table, it’s time to decide what to do with the duplicates column. You can choose to delete it, as it has served its purpose and is no longer needed! Or, you can hide the column instead. This option is great if you may want to check the same table for dupes again in the future. You can unhide the column in the future to quickly check for dupes without having to recreate the column/formula again.

Was this helpful?

YesNo