Unusual Statistical Phenomena, Part II: Stat Testing of Percentages

January 24th, 2022

Sometimes when looking at the results from survey data, we see something that makes us say ‘huh?’ or ‘that doesn’t look right’. When the odd results persist after verifying the data were processed correctly (always a good practice), there is typically still a logical answer that can be uncovered after doing some digging. Sometimes the answer lies with something that we will call ‘unusual statistical phenomena.’  This is part 2 of a series that will look at some of these interesting – or confounding – effects that do pop up now and then in real survey research data.

This time we will look at an unusual phenomenon that can occur when doing something typically considered fairly mundane – testing for statistical significance between percentages. An example will help to illustrate this phenomenon, which periodically causes us to question stat testing results.

Let’s say we have fielded the same survey for two different brands. One part of the survey collects respondent opinions of the test brand using a battery of attribute statements with a 5-point agreement scale. The base size for each survey was 300.

Stat testing was conducted between the two brands’ Top Box percentages on each of the attribute statements. However, some of the results look questionable. Specifically, for the attribute “Is Unique and Different” Brand B’s score was higher than Brand A’s by 4 percentage points, which was statistically significant at the 90% confidence level (denoted by the “A” in the chart below), while for the attribute “Is a Brand I Can Trust” Brand B’s score was higher than Brand A’s by 6 percentage points, which was NOT statistically significant at the 90% confidence level. How could this be?

How can a difference of 4 points be statistically significant while a difference of 6 points is not, even with the same base sizes? To understand how this can happen, let’s first look at the basics of how a statistical test for comparing percentages works.

First, a t-value is computed according to this formula:

t-value = (difference between the two percentages) / (Standard Error of the Difference)

Then this t-value is compared to a critical value. If the t-value exceeds the critical value then we say that the difference between the percentages is statistically significant.  The critical value is based on the chosen confidence level and the base sizes of the samples from which the percentages were derived.
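
The comparison against a critical value can be sketched in a few lines of Python. This is only an illustration, not the procedure used by any particular stat testing package: the function names are hypothetical, the test is assumed to be two-sided, and the critical value is taken from a normal approximation, which ignores base size and is therefore only reasonable for large bases such as the 300 per brand used here.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Two-sided critical value under a normal approximation (an assumption)."""
    tail = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)


def is_significant(t_value: float, confidence_level: float = 0.90) -> bool:
    """The difference is significant when |t| exceeds the critical value."""
    return abs(t_value) > critical_value(confidence_level)


# For a 90% confidence level this comes out very close to 1.645.
print(round(critical_value(0.90), 3))
```

With small base sizes a t-distribution would give a somewhat larger critical value, so this sketch should not be relied on there.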

In our example, we chose the 90% confidence level for both statistical tests and the base sizes are the same, so the critical value for both tests is the same. The difference between the percentages (the numerator of our equation) is what appears anomalous: the difference of 4 led to a t-value that exceeded the critical value, while the difference of 6 led to a t-value that did not. Since the larger numerator produced the smaller t-value, the explanation must lie in the denominator, the Standard Error of the Difference.

Let’s next examine what a Standard Error represents. Our surveys were fielded among a sample of the overall population. If we sample among women 18 to 49 in the United States, we will infer that our results are representative of the entire population of interest, which is all women 18 to 49 in the United States. However, it is unlikely that the measures we compute from the sample (such as the percentage that say Brand A “is a brand I can trust”) will be exactly the same as the percentage would be if we could ask everyone in the entire population of interest.  There is some uncertainty in the result because we are asking it of only a subset of the population. The Standard Error is a measure of the size of this uncertainty for a given metric.

In our equation, the denominator is the Standard Error of the Difference between the percentages. While not precisely correct, the Standard Error of the Difference can be thought of as the sum of the individual Standard Errors for the two percentages being subtracted (the actual value will be somewhat less due to taking squares and square roots). As the graph below illustrates, the Standard Error for a percentage is a function not only of the sample size, but also of the size of the percentage itself.
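
The "squares and square roots" remark can be made concrete with a minimal sketch. It assumes the two samples are independent, in which case the individual Standard Errors combine as a root sum of squares; the function name and the input values are illustrative only and are not taken from the survey.

```python
import math


def se_of_difference(se_a: float, se_b: float) -> float:
    """Combine two Standard Errors, assuming independent samples."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Illustrative values only (not from the survey).
se_a, se_b = 2.9, 2.9
combined = se_of_difference(se_a, se_b)
print(f"combined: {combined:.2f}, simple sum: {se_a + se_b:.2f}")
```

The combined value is always smaller than the simple sum whenever both Standard Errors are positive, which is why the sum is only a rough mental shortcut.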

Specifically, for any given sample size the Standard Error is largest for values around 50% and decreases as values approach either 0% or 100%. For a base size of 100 (the dark blue line), the Standard Error is close to 5 percentage points for percentages near 50%, but falls to around 2 for very small or very large percentages. You can think about this as it being harder to estimate the percent incidence of a characteristic of a population when around half the population has that characteristic versus when almost all (or almost none) of the population has that characteristic.

In our example, the percentages for “Is a Brand I Can Trust” are close to 50%, so at a base size of 300 the individual Standard Errors would each be a little under 3. In contrast, the percentages for “Is Unique and Different” are around 10%, so at a base size of 300 the Standard Errors would each be roughly 1.5 to 2. That’s a big difference!
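
The dependence of the Standard Error on both the base size and the percentage itself can be checked with a short sketch. It assumes a simple random sample with no weighting or design effect, and uses the textbook formula for the Standard Error of a proportion; the function name is hypothetical.

```python
import math


def se_of_percentage(pct: float, base: int) -> float:
    """Standard Error, in percentage points, of a percentage.

    Assumes a simple random sample (no weighting or design effect).
    """
    p = pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / base)


# Tabulate the Standard Error for a few percentages at two base sizes.
for base in (100, 300):
    for pct in (5, 10, 50, 90, 95):
        se = se_of_percentage(pct, base)
        print(f"base={base:>3}  pct={pct:>2}%  SE={se:.2f}")
```

Running this shows the pattern described above: the value peaks at 50%, is symmetric around it, and shrinks as the base grows.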

It follows that the Standard Error of the Difference for “Is a Brand I Can Trust” would be much larger than for “Is Unique and Different.” In fact, the actual values are 4.08 for “Is a Brand I Can Trust” and 2.34 for “Is Unique and Different.” Again, a big difference. If we divide the differences in the percentages by these values for Standard Error of the Difference, we get t-values of 1.47 and 1.71, respectively. Given that the critical value is approximately 1.65, we see that the t-value for the difference of 6 is below the critical value (hence not statistically significant), while the t-value for the difference of 4 is above the critical value (hence statistically significant).
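
The arithmetic in this paragraph can be reproduced directly from the figures quoted in the text. The sketch below takes the differences, the Standard Errors of the Difference and the approximate critical value as given; the function and variable names are illustrative.

```python
CRITICAL_VALUE = 1.65  # approximate 90% critical value quoted in the text


def t_value(difference: float, se_of_difference: float) -> float:
    """t-value = difference in percentages / Standard Error of the Difference."""
    return difference / se_of_difference


# (difference in percentage points, Standard Error of the Difference)
attributes = {
    "Is a Brand I Can Trust": (6.0, 4.08),
    "Is Unique and Different": (4.0, 2.34),
}

for name, (difference, se) in attributes.items():
    t = t_value(difference, se)
    verdict = "significant" if t > CRITICAL_VALUE else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at 90%)")
```

The larger difference of 6 yields t = 1.47 and the smaller difference of 4 yields t = 1.71, matching the values above.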

Hopefully this takes some of the mystery out of stat testing and helps in understanding why what can appear to be anomalous results may actually be correct.

Do you ever look at your data and say, “huh?” The Unusual Statistical Phenomena of Simpson’s Paradox

November 2nd, 2021

Sometimes when looking at the results from survey data, we see something that makes us say “huh?” or “that doesn’t look right”.  When the odd results persist after verifying the data were processed correctly (always a good practice), there is typically still a logical answer that can be uncovered after doing some digging.  Sometimes the answer lies with something that we will call “unusual statistical phenomena.”  This is part 1 of a series that will look at some of these interesting – or confounding – effects that do pop up now and then in real survey research data.

This time we will look at Simpson’s Paradox.  And we aren’t referring to the fact that Bart Simpson never seems to age while the rest of us do.  It is actually a phenomenon first described by the statistician Edward H. Simpson in 1951.

It’s easiest to understand this phenomenon through an example. So, let’s say that we have two ads that have been on air, Ad A and Ad B. In our tracking survey among adults 18 to 65, we ask respondents whether they recognize having seen each ad on air. Earlier in the survey we ask Purchase Intent for the product that is featured in each of the two ads. From these results, we compare Top Box Purchase Intent among respondents who recognized each of the two ads. The results in the table below show somewhat higher Top Box Purchase Intent for Ad A:

However, the client is also interested in seeing the results among each of two age groups: age 18 to 39 and age 40 to 65. When we table those results, we find something that just doesn’t make sense. Purchase Intent is slightly higher for Ad B among both age groups – a reversal from the overall results. How can that be?

After verifying with data processing that the data are correct, we have our team dig into the data to figure out what is going on.  Finally, an explanation is found.

Ad B was aired heavily among programming targeted to a younger audience, while Ad A was primarily aired in general interest programming, which skews to a slightly older audience. Hence Ad B had much higher recognition among the younger age group, and as a result younger respondents made up a much higher proportion of the base on which Ad B’s Purchase Intent was calculated.

The table of base sizes shown below reveals this imbalance. When combined with the younger age group’s more skeptical nature (and lower results) when it comes to Purchase Intent – especially in our category – the apparent anomaly is explained.

This is an example of Simpson’s Paradox. It is a phenomenon in which individual subgroups all show the same trend in results, but the trend reverses when the subgroups are combined. This occurs when a confounding variable causes an imbalance in base sizes such as we saw above. In our example, the confounding variable was age: the two age groups differed both in how often they recognized each ad and in their level of Purchase Intent.
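
The reversal is easy to reproduce with made-up numbers. The counts below are purely hypothetical (they are not the survey’s results) and were chosen only so that Ad B’s recognizers skew young and younger respondents have lower Purchase Intent, as in the scenario described above.

```python
# Hypothetical data: (Top Box Purchase Intent count, base size) per age group.
recognizers = {
    "Ad A": {"18-39": (10, 50), "40-65": (60, 150)},
    "Ad B": {"18-39": (33, 150), "40-65": (21, 50)},
}


def top_box_pct(count: int, base: int) -> float:
    """Top Box percentage for a single group."""
    return 100.0 * count / base


def overall_pct(groups: dict) -> float:
    """Top Box percentage with all age groups pooled together."""
    total_count = sum(count for count, _ in groups.values())
    total_base = sum(base for _, base in groups.values())
    return top_box_pct(total_count, total_base)


for ad, groups in recognizers.items():
    by_age = {age: top_box_pct(*cell) for age, cell in groups.items()}
    print(ad, by_age, "overall:", overall_pct(groups))
```

In this toy data Ad B is ahead within each age group, yet Ad A is ahead overall, because Ad A’s base is dominated by the older, higher-intent group.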

Simpson’s paradox shows us the importance of knowing and understanding our data and keeping a watch out for the kind of confounding factors that could end up misleading us if we don’t account for them.

Streaming Services Deep Dive… | The Brand Strength Monitor / RDE Chart of the Week | Pre/Post Pandemic Penetration and Usage

June 22nd, 2021

In our last Chart of the Week we started looking at the Subscription Video Streaming Services category and specifically the pandemic winner – HBO Max.  This week we’re taking a deep dive and examining the Pre/Post Pandemic Penetration and Usage by demographics.

The MSW TBSM tracking service measures category penetration and level of usage as one component of the survey.  A comparison of results taken before the beginning of the pandemic to a comparable assessment from May 2021 reveals explosive growth in the Subscription Streaming Video Services category.

  • Overall, category penetration increased from 75.4% to 86.6%, a gain of 11.2 percentage points, which represents a 15% relative increase. Moreover, claimed heavy usage increased by over half during the pandemic, from 18.5% before the outbreak to 29.2% in May 2021. Clearly, as a result of spending much more time at home during the pandemic, people were turning to video entertainment.

  • Those demographic groups which have seen particularly large gains include:
    • Women, with heavy usage of streaming video services more than doubling.
    • Age 55 Plus, which had the lowest pre-pandemic penetration, but closed the gap substantially, reaching penetration of 70% in the latest reading.
    • Below Median Income, clearly also looking for entertainment options and perhaps enticed by some of the newer services, particularly lower cost “basic” plans.

  • Clearly these results are consistent with the surge in subscription levels reported by the major subscription streaming video services for 2020:
    • Netflix added 37 million new subscribers worldwide in 2020 – easily the largest annual increase since expanding into video streaming 14 years ago.
    • Hulu added 9 million new subscriptions, a 29.6% year-over-year increase.
    • While Disney+ debuted with very strong numbers pre-pandemic, growth continued to surge through the pandemic. The service hit 100 million subscribers in March 2021, a remarkable feat for a service just over a year old which initially hoped to reach 60 to 90 million subscribers by 2024.
    • At the end of the first quarter of 2021, HBO and HBO Max totaled 44.2 million domestic subscribers – far exceeding the 33.1 million subscribers a year ago (before HBO Max).
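
The relative increases quoted in the bullets above are easy to confuse with percentage-point gains, so a small sketch may help keep the two apart. The helper names are hypothetical; the inputs are the figures quoted in the text.

```python
def point_change(before: float, after: float) -> float:
    """Change in percentage points (simple subtraction)."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change expressed as a percent of the starting value."""
    return 100.0 * (after - before) / before


readings = {
    "Category penetration (%)": (75.4, 86.6),
    "Claimed heavy usage (%)": (18.5, 29.2),
    "HBO / HBO Max domestic subscribers (millions)": (33.1, 44.2),
}

for label, (before, after) in readings.items():
    print(
        f"{label}: {point_change(before, after):+.1f} absolute, "
        f"{relative_change(before, after):+.1f}% relative"
    )
```

For the penetration figures this gives a gain of about 11 points, or roughly 15% in relative terms, and for heavy usage a relative gain of a little under 58%.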

While subscriber growth is seen to be slowing with the easing of the pandemic, it is clear the pandemic accelerated the trend toward the use of streaming services, particularly among those groups which had been slower to adopt the technology.