As data-driven marketers, we need to constantly combine our marketing experience with data insights to make better business decisions. Testing has therefore become a fundamental part of optimizing marketing programs. For example, a marketer designing a test will rely on their experience to decide which variable(s) to focus on, like the offer, CTA, or information architecture on a page. Once the variable(s) are picked, you can run an A/B or multivariate test to gather the data needed to identify the best-performing variant. Setting a significance level for a marketing test helps you gauge the reliability of the insights you get out of the test results.
While we all find comfort in the results of a good marketing test, it’s possible that the test is leading us astray. The “definitive” results that cause us to select one option over another could just be random chance or the effect of an unrelated event, leading us to either understate or overstate the impact of a change. To avoid this trap, we need to be mindful of a test’s statistical significance, which is governed by the significance level you set when designing the test to help ensure that the results you see in your test will match what happens if you roll out the changes broadly.
What is statistical significance in testing?
Statistical significance in hypothesis testing helps you determine whether the results of your test reflect a real effect rather than random chance, at a given significance level. A test is said to be statistically significant when the result of the experiment (its p-value) is less than or equal to the significance level predetermined for the test. Let me illustrate the definition with a hypothetical example:
Imagine conducting an A/B test on a sample set to see if there is a relationship between eating a cupcake and gaining weight. The control group (null hypothesis) has a regular diet, while the variation group (alternative hypothesis) has an additional cupcake with the regular diet. Running the test at 95% statistical significance (a significance level of 0.05) means that if the cupcake actually had no effect on weight, there would be at most a 5% chance of seeing a difference as large as the one in your test purely by random chance. In other words, the higher the statistical significance you require, the more confident you can be that a winning result reflects a real effect in the broader population rather than noise in your sample.
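To make the p-value comparison concrete, here is a minimal sketch in plain Python, using made-up conversion counts and a standard two-proportion z-test (normal approximation), not any particular testing tool’s method:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test (normal approximation)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se            # standardized difference
    # Standard normal CDF via the error function
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# Hypothetical results: 30/200 conversions for control vs. 50/200 for the variant
p_value = two_proportion_p_value(30, 200, 50, 200)
alpha = 0.05  # significance level for a 95% test
print(f"p-value = {p_value:.4f}; significant at 95%? {p_value <= alpha}")
# p-value comes out around 0.012, below 0.05, so this result is significant
```

With these numbers the variant wins at the 95% level; had we required 99% significance (alpha = 0.01), the same data would not have been conclusive.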
What level of significance should I use for my marketing test?
Generally, the higher the required statistical significance, the longer it takes to complete a test, limiting how many tests you can run or how quickly you can implement results. Academia usually recommends at least 95% statistical significance for any test, since accuracy is the highest priority in research. But that doesn’t mean you need to adhere to the same requirement in every test you conduct.
Instead of always setting statistical significance at 95%, you can set an appropriate level given the nature of the test, taking into consideration factors like risk tolerance, time and budget.
Your risk tolerance: If you are conducting a test to validate the accuracy of a pregnancy test, you will want a significance level close to 100%, because a lower level increases the chance of false positives (telling someone they are pregnant when they aren’t). In this case, a false positive directly damages the product’s brand value and customer satisfaction. On the other hand, if you are running an A/B test to pick the best image for your website, you might lower the significance level to 80%, since you can be more tolerant of a false positive when selecting a winning image: the change is easy to reverse. In other words, the higher the risk, the higher you should set your statistical significance.
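The false-positive tradeoff can be demonstrated with a small simulation sketch (illustrative numbers, a hand-rolled z-test rather than any specific tool). It runs many A/A tests, where both groups have the same true conversion rate, so every “significant” result is by definition a false positive; loosening the significance level from 95% to 80% raises the expected false-positive rate from about 5% to about 20%:

```python
import math
import random

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test p-value (normal approximation)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)  # reproducible runs
n, true_rate, trials = 500, 0.10, 2000

# A/A tests: both arms draw from the same 10% conversion rate,
# so any "significant" difference is pure random chance.
p_values = []
for _ in range(trials):
    conv_a = sum(random.random() < true_rate for _ in range(n))
    conv_b = sum(random.random() < true_rate for _ in range(n))
    p_values.append(ab_test_p_value(conv_a, n, conv_b, n))

fp_95 = sum(p <= 0.05 for p in p_values) / trials  # 95% significance level
fp_80 = sum(p <= 0.20 for p in p_values) / trials  # 80% significance level
print(f"false positives at 95%: {fp_95:.1%}, at 80%: {fp_80:.1%}")
```

The simulated rates land near 5% and 20% respectively, which is exactly what the significance level promises: it is the share of false positives you are willing to accept.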
Time: The significance level you choose largely determines how long your test will run. A 95% statistical significance test takes longer, and is also more accurate, than an 80% one. As the marketer designing the test, you might also need to hit a certain deadline. For example, if you are conducting a test to decide the price point of a product before its launch, you would need to design the test to ensure it ends before the launch date. In such cases, you might need to trade off some statistical significance for a shorter test, because the product launch takes precedence over test accuracy.
When making this tradeoff, make sure that the test still runs long enough to account for seasonality. The rule of thumb is to let a test run for at least one full week, even if it reaches the required sample size for your significance level before then. Once the significance level and the test duration are finalized, make sure that the test runs for the full allocated period.
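One common way to see the time cost of a stricter significance level is the standard two-proportion sample-size formula. The sketch below uses made-up conversion rates (a hypothetical lift from 10% to 12%, at 80% power) and Python’s standard library; your testing tool may use a slightly different formula:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, p_variant, alpha, power=0.80):
    """Approximate visitors needed per arm for a two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p_base * (1 - p_base)
                                       + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_base - p_variant) ** 2)

# Hypothetical test: detect a lift from a 10% to a 12% conversion rate
n_95 = sample_size_per_arm(0.10, 0.12, alpha=0.05)  # 95% significance
n_80 = sample_size_per_arm(0.10, 0.12, alpha=0.20)  # 80% significance
print(f"visitors per arm at 95%: {n_95}, at 80%: {n_80}")
```

With these assumptions, the 95% test needs roughly 75% more visitors per arm than the 80% test, which at a fixed traffic volume translates directly into a longer run time.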
Budget: Time is money! Longer tests can give you more accurate results, but they also cost more, reducing the budget you have available to run additional tests. As a marketer, you need to plan your tests keeping in mind your budget and the opportunity cost of your business decision.