Will You Change Your Mind About Tables (data)?

Formulas for linking tables together in Excel vs. Coda vary greatly. In Excel, you rely on old formulas you’ve used before, but Coda’s building blocks make column referencing more intuitive.
When was the last time you changed your mind about something you fundamentally believed was true?

I am not talking about religious beliefs, but rather about a topic you consider yourself to be an expert on. For me, this happened in 2013, when I first discovered the
in Microsoft Excel.

If you don’t want to read this mini-tutorial of tables in Excel and Coda, I recorded a video below for you to watch:


What Most People Do In Excel

Like many beginners to Excel, I thought Excel was meant for storing lists of data. Stuff like your grocery lists, books you want to read, and maybe a personal budget. We all know Excel, today, is super powerful with features like PowerBI and
, but a majority of
are probably utilizing a fraction of Excel’s power and capabilities. I couldn’t find stats on feature usage in Excel, but from seeing hundreds of Excel files over the years, I will posit that a majority of people use Excel for one purpose:
lists
.

1_BwaQAvnr0cOi9d-D_K3DIA.gif


There is nothing wrong with lists. They make sense. The grid-like nature of Excel provides a solid framing for lists.

But once I discovered functions like
VLOOKUP()
,
OFFSET()
,
INDEX()
, and
MATCH()
, I found that I wasn’t creating standalone lists anymore. The paradigm became
connecting lists together
. As my Excel knowledge grew, I couldn’t see a world where your lists of data
were not related
in some way. If you have a list of product orders, at some point you might want to know the name of the customer who made the order, and that customer name will probably come from a Customers list and you pull in the name via
VLOOKUP()
.

Is It A Database Or A List?

1_oH5KDEEBgmjyu9ee1EjkuA.jpeg
Source: allyourbasearebelongtous.com

For anyone who has built a model or dashboard in Excel and wanted to make it scalable so it could handle new rows of data, you have probably used
with the
OFFSET()
function. I won’t go into detail on how the formula works, but it’s sort of a hack to account for new rows of data that get added to your spreadsheet and you want your formulas and charts to “pick up” the new data.

Numerous hacks like these exist to make your file feel more like an application. Hacks for getting filtered data to show up in the right place with random
IF
and
IFERROR()
formulas thrown into your file. Hacks for referencing the correct columns for
VLOOKUP()
formulas in the event new columns get added to your dataset.

And then I discovered SQL.

Instead of referring to a row or column reference, this new formula language just requires you to know the name of the column. It doesn’t matter how long your list is or how many rows get added hourly or daily; the
SELECT
statement will pull all the rows that meet your criteria.

In my head, SQL statements are easier to
understand
, but Excel formulas are easier to
write
. As I’m building my connected lists in Excel, am I really just creating a mini-database, but I just happen to be querying my data using hacked-up formulas? Are Excel formulas the most efficient or right way to pull your data, or am I just
familiar
enough with the formulas that the hacked-up solution feels
right
? This is still a question that I am struggling with, and hopefully the examples below will show
the paradigm shift I think many Excel users will grapple with and how Coda’s formula language challenged my beliefs about Excel’s formula language.

The Simplicity of a VLOOKUP()

You have two lists of Employees and Tax Rates below (
I was inspired by the example Joel Spolsky used in his Excel tutorial, watch it
):

1_Qy06dPVK9_JCoDOkQDigwA.png

If you want to calculate the Total Tax $ column, you need to know the Tax Rate %. Your first instinct would be to write a
VLOOKUP()
formula in cell
F5
and drag that formula down to cell
F14
:

=VLOOKUP(E5,$B$21:$C$23,2,0)

That’s what I love about the workhorse of
VLOOKUP()
. It only requires a few parameters and you can quickly pull in data from other lists if you have a unique identifier in that lookup table (in this case, it’s the City).

As I stated in the beginning of this post, tables in Excel really change the game on how you reference data. While tables surfaced in Excel 2007, I’ve rarely seen them used in various files I’ve worked on during my days at Google and as a freelance consultant. You probably noticed that the Employee and Tax Rates lists are indeed tables with the dropdown arrows in the header rows and the alternating row colors. If you write the
VLOOKUP()
formula in this
table
you get a formula that looks like this:

=VLOOKUP([@Location],TaxRates[#All],2,0)

When you move your cursor over to column E to reference the location, you actually refer to the entire
column
name similar to SQL. Similarly, instead of referencing
$B$21:$C$23
, you reference the
TaxRates
table (notice how when I select a cell in this range the
TaxRates
table name appears in the top left):

1_GhfDfVsf5qdkiv86MwnZNA.gif

Another key feature about writing formulas in tables is that they will automatically fill down to all the empty cells below. Here is what happens when you try writing the new
VLOOKUP()
formula in the Employees table:

1_XPyB8DZIfSdKtH23Lhqzrw.gif

Why is the reference to the Location column as
[@Location]
and the tax rates as
TaxRates[#All]
important?
These references now are
objects
that you can reference elsewhere in your Excel file.
Gone are the days where you need to know the exact cell reference and/or naming ranges of cells with a unique name. Another SQL-like feature is that you can add rows to either the Employees or Tax Rates table, and those rows automatically become “part” of the table:

1_oLhDdCLX7gPnf0eqzDZVYg.gif

If we look inside the Total Tax $ column, you’ll notice it also contains the “@” references to the Salary and Tax Rate % columns instead of individual cell references:

1_sNZfz11TxOqNCFE-NNgULQ.png

A few things I don’t like about this solution:
You still have to enter in the column number you want to pull into your main table (e.g. if new columns are added to the Tax Rates table you’ll have to change the “2” to the new rate column)
Auto-filling the formulas down through the empty cells is great, but if you accidentally delete an individual cell in the Tax Rate % column, you have to copy and paste the formula again from another cell that contains the table formula.
I think Excel encourages you to type in the column and table names, but the familiar behavior for most VLOOKUPers is to move your cursor to the column or lookup table. Typing in the column and table references requires you to remember the exact syntax of using the “@” and “#” symbol with various brackets (although there is a bit of autocomplete to help you along the way):

1_5Vh38s-LNP-g82Am5HoAww.gif
Writing this formula was painful…

Finally, the ideal solution to create a scalable formula to link two tables together should use a combination of
INDEX()
and
MATCH()
, which I describe in the next section.

The Many Ways to Lookup

1_32Tfsm5tGSgn5jB6aAjeBw.png
Source: trumpexcel.com

The debate is as old as evolution vs. creationism (ok not that old). When do you use
VLOOKUP()
and when to use
INDEX/MATCH
? There are countless
about this topic. My simple rule is this: if you have more than 50,000 rows of data and care about performance and speed, use
INDEX/MATCH
.

In our simple example of Employees and Tax Rates, a
VLOOKUP()
does what we need but that’s because our tables are small and the performance difference is negligible. If this was a list of millions of employees, we would have a problem with
VLOOKUP()
. So if we want the most optimized solution, let’s try out the
INDEX/MATCH
method:

1_5L5EnuYxm1-EiinbFa5bxQ.gif

I didn’t even bother trying to type in the column and table references since it would’ve been a pain to remember where to use the brackets, “@” symbols, etc. I just referenced the columns and ranges of cells the “old-fashioned” way by using the cursor. So, when it’s all said and done, the most optimized formula to lookup the tax rate is:

=INDEX(TaxRates[Tax Rate],MATCH([@Location],TaxRates[City],0))

A few things I don’t like about this solution:
It’s long AF.
INDEX/MATCH
is considered an “advanced” Excel formula that most Excel users won’t use because it requires you to nest the
MATCH()
function inside the
INDEX
function
Notice how there is a 0 at the end after
TaxRates[City]
? This is a weird flag that also plagues the
VLOOKUP()
formula indicating the
match_type
you want Excel to use. I have never seen a real-life scenario where you use a
match_type
of
true
(e.g. “1”). If you don’t use the 0, Excel finds the first closest match based on alphabetical order, which leads to completely wrong results for tax rates:

1_Nd63jc4irM0Pp3uRsQrZ1g.gif

When To Link Your Tables Together

1_oPCCxV7bIDrKq9PIJHgP6Q.gif
Source: Giphy (couldn’t help myself by not using a Zelda reference in this section)

While I consider tables in Excel to be super powerful, there are a few usability issues that make them less approachable to regular Excel users who are linking lists of data together with good ‘ol
VLOOKUP()
and cell/range references.

Tables completely changed my mind on how lists are linked, but I still rarely use them in my files due to the set up that’s required. To provide some more perspective, I would only convert my lists into tables if I knew that
other people were using my file
. I this were the case, I would want to build a scalable solution that accounts for the following:
Non-fixed number of rows being added to the file hourly, daily, weekly
Adding new columns does not affect functionality of
VLOOKUP()
(partially solved with
INDEX/MATCH
)
Inability for teammates to delete random cells in a column that contains formulas
If I am just using the file for personal use, the extensibility of the tables feature becomes less interesting since I am typically only using Excel for one-off data analysis. If your team has an Office 365 subscription where you have all the latest and greatest sharing features in Excel Online, tables might be the the right solution since your team is putting collaboration first.

Tables In Coda

Until I discovered Coda, my “worldview” of linking tables together consisted of two paradigms:
Using
VLOOKUP()
and
INDEX/MATCH
JOIN
statements in SQL
If you are approaching this problem from a programming perspective, there are numerous ways you can link data together in Javascript, Python, etc. But if you want to give the
average business user
a way to accomplish this basic task in a platform that feels approachable and intuitive, Coda’s formula language is the right implementation of formulas + visuals.

Let’s take the same data we’ve been working with and see how the Tax Rate % column would be pulled in via tables in Coda:

1_nCwcL0ivsiCJD85G35hLbQ.gif

A few things to note in the gif above:
The Location column is a
column format that creates a dropdown of all the cities you can select from the Tax Rates table
As you write the formula in the Tax Rate % column, there is auto-complete with column names which you can TAB over to auto-complete the formula (no pesky “@” or brackets to type)
Since the Location column in our main table is a lookup to the Tax Rates table, we can reference the Tax Rate column from this lookup table easily by just writing
[Tax Rate]
in our formula
When it’s all said and done, the formula to pull the tax rate in the most optimized way is:

=Location.[Tax Rate]

The formula in Excel is:

=INDEX(TaxRates[Tax Rate],MATCH([@Location],TaxRates[City],0))

The Coda formula is obviously shorter, but more importantly, is easier to understand for the end user. The difficulty, as with any formula or programming language, is understanding the syntax to lookup your data.

However, much like the mental leap I took when discovering
VLOOKUP()
, I cannot see a case where I would want to use
VLOOKUP()
in my main table when I can reference columns as easily as Coda’s formula language affords.

Your Tables, Your Team

1_J9qLadpUzYc39WZHTFjjMw.jpeg
Source: morganlovell.co.uk
Table layouts from the open office plan in the 60s have evolved, shouldn’t your use of data tables too (just full of jokes today)?

Tools and platforms that change fundamental beliefs about the way you can pull, sort, and organize data are few and far between. You have Excel formulas, SQL, various programming languages, and the more recent NoSQL movement.

Aside from Excel’s formula language, no other language connects the
visual feedback
from writing a formula with the
back-end
processing that the formula triggers. Why should pulling and sorting data require you to be an Excel guru or SQL expert? Teams are much more fluid with how their data is managed and stored, and the way teams access the data should be just as flexible and on-demand like that of other business applications.

Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.