“Look for a way in which you can accomplish 90% of what you want with only 10% of the work/effort/time. If you search hard for it, there is almost always a 90/10 solution available. Most importantly, a 90% solution to a real customer problem which is available right away, is much better than a 100% solution that takes ages to build.” - Paul Buchheit
“High upside” means that if the hypothesis of the test proves true, the payoff for the company is significant. How much upside is required in order to proceed with the test, however, is relative to the cost of the test. If a test is quick and cheap to run, the bar for upside is lower than if it is slow and expensive. In product iteration, for example, it might take 20 minutes to set up an A/B test of different copy on a button in the onboarding flow. The cost is low, so the potential upside doesn’t need to be particularly high for the test to be worthwhile. In another scenario, if someone needs to be hired and trained to fill a role before the test can even be run, the potential upside needs to be much higher.
In general, the rule of thumb I like to use is that the potential upside should be at least 5x the cost of running the test. Once it’s at 10x or more, the test becomes a no-brainer to run. Importantly, timeframe matters here: upside is measured on an annualized basis assuming the test succeeds, while downside (cost) is a fixed value based on the investment and time required to get a result.
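To make the arithmetic concrete, here is a minimal sketch of that decision rule in Python. The function name and the dollar figures are illustrative, not a real framework; only the 5x/10x thresholds and the annualized-upside-vs-capped-cost framing come from the rule of thumb above:

```python
def test_worth_running(annualized_upside: float, capped_cost: float) -> str:
    """Apply the 5x/10x rule of thumb: upside is the annualized value
    if the test succeeds; cost is the capped, fixed investment."""
    ratio = annualized_upside / capped_cost
    if ratio >= 10:
        return "no-brainer: run it"
    if ratio >= 5:
        return "worthwhile: run it"
    return "skip: the upside doesn't justify the cost"

# A 20-minute A/B copy test: tiny cost, so even modest upside clears the bar.
print(test_worth_running(annualized_upside=5_000, capped_cost=100))       # no-brainer
# A test that requires hiring and training someone: the bar is much higher.
print(test_worth_running(annualized_upside=200_000, capped_cost=60_000))  # skip
```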
The number one mistake people make in designing tests is failing to properly cap their downside, i.e., the maximum amount of resources the company will spend in order to reach a learning and a conclusion. What begins as a “test” becomes an unquestioned, unexamined default that adds unjustified costs to the business. You’ll see this frequently with contractors being added to do things that make sense on the surface (e.g., social media marketing, outbound email campaigns) but have never actually been examined for ROI.
A properly defined test has, by default, a limit on how much we are willing to spend (in money and time) to get to a result. Once this is known, the company can make a rational decision about whether the potential upside justifies the capped downside. Is it smart to spend $10,000 (capped downside) to learn whether outbound email marketing can open a new line of referrals to the business? The answer is likely yes. But is it smart to commit to $10,000 a month in perpetuity, until someone decides otherwise? Not until you have data.
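The same arithmetic shows why the uncapped default is so dangerous. A quick sketch, using the figures from the example above and the 5x bar from earlier (the numbers are illustrative):

```python
capped_test_cost = 10_000       # one-time, capped downside
uncapped_monthly_cost = 10_000  # the "until someone decides otherwise" default

annualized_uncapped = uncapped_monthly_cost * 12  # $120,000 per year
required_upside_multiple = 5                      # the 5x rule of thumb

# The capped test needs ~$50k of annualized upside to be worth running;
# the uncapped default silently demands ~$600k to clear the same bar.
print(capped_test_cost * required_upside_multiple)   # 50000
print(annualized_uncapped * required_upside_multiple)  # 600000
```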
Capped downside is about being smart about the defaults and the checkpoints along the way. It’s about ensuring that, by default, the test will NOT continue unless the data indicate that it should.
Poorly executed tests also often fail to define the criteria for success, which makes it easy for the test to continue by default even when it isn’t producing real value for the company. In a Perfect Test, the success criteria are laid out in advance, so there is no temptation to justify a poor outcome later. Remember: when running a company, a lot of what we are up against is simply inertia. We tend to prefer continuance over change, especially when it involves humans and jobs. It is therefore extra important to be extremely clear about what success looks like before beginning; otherwise the temptation to move the goalposts can be too great.
So now we have well-defined success criteria, a timeline, and a budget. The only thing remaining is to ensure we actually review the data we collect in a structured manner. To make sure that happens, schedule the review meeting at the moment the test begins. This holds everyone accountable and keeps the review from being forgotten. It also clearly delineates the “start” and “end” dates of the test in a way that helps get everyone on board and moving forward.
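Putting the pieces together, a well-defined test can be written down as a simple record before it starts. Here is a minimal sketch; the field names and dates are mine, not a prescribed template, and the example reuses the outbound-email scenario from earlier:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TestDefinition:
    hypothesis: str        # what we believe will happen
    success_criteria: str  # defined up front, so the goalposts can't move
    budget_cap: float      # capped downside in dollars
    end_date: date         # capped downside in time
    review_date: date      # scheduled before the test even begins

outbound_email = TestDefinition(
    hypothesis="Outbound email can open a new referral channel",
    success_criteria=">= 10 qualified referrals within the budget",
    budget_cap=10_000,
    end_date=date(2025, 6, 30),
    review_date=date(2025, 7, 7),
)
```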
Congratulations! If your test meets the first four criteria, you are almost guaranteed to make a smart, evidence-based decision grounded in real data rather than intuition, instinct, or inertia. Once the decision is made, we highly encourage a culture of reporting back on test outcomes and decisions; this lets the whole company learn from what we learned, not just the people who ran the test.
The great thing about the above criteria is that they take the risk out of running an experiment. They ensure that even if the test doesn’t produce results that warrant continuation (the hypothesis is disproven), the outcome is still a successful one for the company: we took a shot on goal, we learned from it, and we can move on to the next one! A good rule of thumb: if about 1 in 3 tests succeeds, and each of those successes returns 5x or more on its investment, then the testing program is inherently growing the business.
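That rule of thumb works out arithmetically: a 1-in-3 hit rate at a 5x return means the portfolio of tests returns roughly 1.67x everything spent on testing. A quick sanity check (the per-test cost is illustrative; only the ratios matter):

```python
success_rate = 1 / 3
return_multiple = 5     # each winning test returns >= 5x its cost
cost_per_test = 10_000  # illustrative figure

n_tests = 9
total_spent = n_tests * cost_per_test                                      # 90,000
total_returned = n_tests * success_rate * return_multiple * cost_per_test  # 150,000
print(total_returned / total_spent)  # ~1.67x: the testing program pays for itself
```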