If you don’t want to read this mini-tutorial of tables in Excel and Coda, I recorded a video below for you to watch:
What Most People Do In Excel
Like many beginners to Excel, I thought Excel was meant for storing lists of data. Stuff like your grocery lists, books you want to read, and maybe a personal budget. We all know Excel, today, is super powerful with features like PowerBI and
are probably utilizing a fraction of Excel’s power and capabilities. I couldn’t find stats on feature usage in Excel, but from seeing hundreds of Excel files over the years, I will posit that a majority of people use Excel for one purpose: lists.
There is nothing wrong with lists. They make sense. The grid-like nature of Excel provides a solid framing for lists.
But once I discovered functions like VLOOKUP(), OFFSET(), INDEX(), and MATCH(), I found that I wasn’t creating standalone lists anymore. The paradigm became connecting lists together. As my Excel knowledge grew, I couldn’t see a world where your lists of data were not related in some way. If you have a list of product orders, at some point you might want to know the name of the customer who made the order, and that customer name will probably come from a Customers list and you pull in the name via VLOOKUP().
Is It A Database Or A List?
For anyone who has built a model or dashboard in Excel and wanted to make it scalable so it could handle new rows of data, you have probably used
with the OFFSET() function. I won’t go into detail on how the formula works, but it’s sort of a hack to account for new rows of data that get added to your spreadsheet and you want your formulas and charts to “pick up” the new data.
Numerous hacks like these exist to make your file feel more like an application. Hacks for getting filtered data to show up in the right place with random IF and IFERROR()formulas thrown into your file. Hacks for referencing the correct columns for VLOOKUP() formulas in the event new columns get added to your dataset.
And then I discovered SQL.
Instead of referring to a row or column reference, this new formula language just requires you to know the name of the column. It doesn’t matter how long your list is or how many rows get added hourly or daily; the SELECT statement will pull all the rows that meet your criteria.
In my head, SQL statements are easier to understand, but Excel formulas are easier to write. As I’m building my connected lists in Excel, am I really just creating a mini-database, but I just happen to be querying my data using hacked-up formulas? Are Excel formulas the most efficient or right way to pull your data, or am I just familiar enough with the formulas that the hacked-up solution feels right? This is still a question that I am struggling with, and hopefully the examples below will show the paradigm shift I think many Excel users will grapple with and how Coda’s formula language challenged my beliefs about Excel’s formula language.
The Simplicity of a VLOOKUP()
You have two lists of Employees and Tax Rates below (I was inspired by the example Joel Spolsky used in his Excel tutorial, watch it
If you want to calculate the Total Tax $ column, you need to know the Tax Rate %. Your first instinct would be to write a VLOOKUP() formula in cell F5 and drag that formula down to cell F14:
That’s what I love about the workhorse of VLOOKUP(). It only requires a few parameters and you can quickly pull in data from other lists if you have a unique identifier in that lookup table (in this case, it’s the City).
As I stated in the beginning of this post, tables in Excel really change the game on how you reference data. While tables surfaced in Excel 2007, I’ve rarely seen them used in various files I’ve worked on during my days at Google and as a freelance consultant. You probably noticed that the Employee and Tax Rates lists are indeed tables with the dropdown arrows in the header rows and the alternating row colors. If you write the VLOOKUP() formula in this table you get a formula that looks like this:
When you move your cursor over to column E to reference the location, you actually refer to the entire column name similar to SQL. Similarly, instead of referencing $B$21:$C$23, you reference the TaxRates table (notice how when I select a cell in this range the TaxRates table name appears in the top left):
Another key feature about writing formulas in tables is that they will automatically fill down to all the empty cells below. Here is what happens when you try writing the new VLOOKUP() formula in the Employees table:
Why is the reference to the Location column as [@Location] and the tax rates as TaxRates[#All] important? These references now are objects that you can reference elsewhere in your Excel file. Gone are the days where you need to know the exact cell reference and/or naming ranges of cells with a unique name. Another SQL-like feature is that you can add rows to either the Employees or Tax Rates table, and those rows automatically become “part” of the table:
If we look inside the Total Tax $ column, you’ll notice it also contains the “@” references to the Salary and Tax Rate % columns instead of individual cell references:
A few things I don’t like about this solution:
You still have to enter in the column number you want to pull into your main table (e.g. if new columns are added to the Tax Rates table you’ll have to change the “2” to the new rate column)
Auto-filling the formulas down through the empty cells is great, but if you accidentally delete an individual cell in the Tax Rate % column, you have to copy and paste the formula again from another cell that contains the table formula.
I think Excel encourages you to type in the column and table names, but the familiar behavior for most VLOOKUPers is to move your cursor to the column or lookup table. Typing in the column and table references requires you to remember the exact syntax of using the “@” and “#” symbol with various brackets (although there is a bit of autocomplete to help you along the way):
Writing this formula was painful…
Finally, the ideal solution to create a scalable formula to link two tables together should use a combination of INDEX() and MATCH(), which I describe in the next section.
The Many Ways to Lookup
The debate is as old as evolution vs. creationism (ok not that old). When do you use VLOOKUP() and when to use INDEX/MATCH? There are countless
about this topic. My simple rule is this: if you have more than 50,000 rows of data and care about performance and speed, use INDEX/MATCH.
In our simple example of Employees and Tax Rates, a VLOOKUP() does what we need but that’s because our tables are small and the performance difference is negligible. If this was a list of millions of employees, we would have a problem with VLOOKUP(). So if we want the most optimized solution, let’s try out the INDEX/MATCH method:
I didn’t even bother trying to type in the column and table references since it would’ve been a pain to remember where to use the brackets, “@” symbols, etc. I just referenced the columns and ranges of cells the “old-fashioned” way by using the cursor. So, when it’s all said and done, the most optimized formula to lookup the tax rate is:
INDEX/MATCH is considered an “advanced” Excel formula that most Excel users won’t use because it requires you to nest the MATCH() function inside the INDEX function
Notice how there is a 0 at the end after TaxRates[City]? This is a weird flag that also plagues the VLOOKUP() formula indicating the match_type you want Excel to use. I have never seen a real-life scenario where you use a match_type of true (e.g. “1”). If you don’t use the 0, Excel finds the first closest match based on alphabetical order, which leads to completely wrong results for tax rates:
When To Link Your Tables Together
Source: Giphy (couldn’t help myself by not using a Zelda reference in this section)
While I consider tables in Excel to be super powerful, there are a few usability issues that make them less approachable to regular Excel users who are linking lists of data together with good ‘ol VLOOKUP() and cell/range references.
Tables completely changed my mind on how lists are linked, but I still rarely use them in my files due to the set up that’s required. To provide some more perspective, I would only convert my lists into tables if I knew that other people were using my file. I this were the case, I would want to build a scalable solution that accounts for the following:
Non-fixed number of rows being added to the file hourly, daily, weekly
Adding new columns does not affect functionality of VLOOKUP() (partially solved with INDEX/MATCH)
Inability for teammates to delete random cells in a column that contains formulas
If I am just using the file for personal use, the extensibility of the tables feature becomes less interesting since I am typically only using Excel for one-off data analysis. If your team has an Office 365 subscription where you have all the latest and greatest sharing features in Excel Online, tables might be the the right solution since your team is putting collaboration first.
Tables In Coda
Until I discovered Coda, my “worldview” of linking tables together consisted of two paradigms:
Using VLOOKUP() and INDEX/MATCH
JOIN statements in SQL
Let’s take the same data we’ve been working with and see how the Tax Rate % column would be pulled in via tables in Coda:
The Coda formula is obviously shorter, but more importantly, is easier to understand for the end user. The difficulty, as with any formula or programming language, is understanding the syntax to lookup your data.
However, much like the mental leap I took when discovering VLOOKUP(), I cannot see a case where I would want to use VLOOKUP() in my main table when I can reference columns as easily as Coda’s formula language affords.
Your Tables, Your Team
Table layouts from the open office plan in the 60s have evolved, shouldn’t your use of data tables too (just full of jokes today)?
Tools and platforms that change fundamental beliefs about the way you can pull, sort, and organize data are few and far between. You have Excel formulas, SQL, various programming languages, and the more recent NoSQL movement.
Aside from Excel’s formula language, no other language connects the visual feedback from writing a formula with the back-end processing that the formula triggers. Why should pulling and sorting data require you to be an Excel guru or SQL expert? Teams are much more fluid with how their data is managed and stored, and the way teams access the data should be just as flexible and on-demand like that of other business applications.