Clients want to know “What is true?”, while statistical tests offer only the likelihood that a relationship in the data could have been produced by random error. This mismatch between question and answer is usually met by quoting textbook doctrine about what statistics can and cannot demonstrate, followed by adopting one of two postures:
Posture 1: I’m a Scientist
(i.e. I only provide answers I can prove. Questions requiring something else must be answered by the “less pure.”)
Posture 2: I’m a Consultant
(i.e. I live in the real world. It’s close enough.)
Though characterized humorously and perhaps in the extreme, these two postures are not far from what we’ve all seen in real life. But let’s assume that the readers of this article have a genuine interest in answering the client’s question. So, what is the probability that an estimated relationship is “true”?
TRUTH
Okay, p-values don’t represent truth, but they’re a step in the right direction. Let’s take a look at how far they take us. The p-value represents the chance that a coefficient this large could have resulted from random variation alone, assuming no real relationship existed. Hence, a low p-value mostly rules out random variation, which is a common problem. But let’s compare some identical p-values obtained from different tests.
Finding 1: p-value of .05 – Prices are negatively correlated with sales
Finding 2: p-value of .05 – People who bought a 2002 Ford Fiesta, but not a 2001 or 2003, are more likely to buy a 2013 BMW
So what is the probability that each of these data relationships is consistently true in real life?
% Data Relationship is True = %A / (%A + %B)
% A = chance that real world relationships could have caused the data relationship
% B = chance that random variation, or other forms of false attribution, could have caused the data relationship
If you’ve seen the same relationship many times, like in the pricing example, the A value is probably pretty big, and the B value, as measured by p-value, is only 5%, so the chance of the data relationship reflecting reality is quite high.
On the other hand, Finding 2 has no logical pattern in the dates and connects brands with little in common, so while the true probability of a real relationship is unknown, it’s reasonable to believe that it’s quite small. I’d judge it to be less than 1%. If my judgment of 1% is correct, and the p-value is 5%, then the odds are 5 to 1 that the data relationship is a false positive.
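The arithmetic behind both findings can be sketched directly from the formula above. A minimal sketch in Python — the prior of 50% for Finding 1 is a hypothetical number standing in for the article’s “probably pretty big”; only the 1% prior and 5% p-value for Finding 2 come from the text:

```python
def prob_true(prior_a, false_rate_b):
    """The article's formula: % true = A / (A + B), where A is the chance
    a real-world relationship produced the data pattern and B is the chance
    random variation produced it (approximated here by the p-value)."""
    return prior_a / (prior_a + false_rate_b)

# Finding 1: plausible relationship (price vs. sales).
# The 50% prior is illustrative, not from the article.
print(prob_true(0.50, 0.05))  # ~0.91: very likely real

# Finding 2: implausible relationship, prior judged at 1% in the article.
print(prob_true(0.01, 0.05))  # ~0.17, i.e. odds of about 5 to 1
                              # that it is a false positive
```

The 5-to-1 odds fall straight out: with A = 1% and B = 5%, the false-positive share is 5 parts out of 6.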
HOW IS ALL THIS USEFUL?
It’s useful both in terms of INFERENCES and STUDY DESIGN.
Inferences:
When you infer relationships from statistical evidence, it is reasonable to demand much more evidence before accepting an apparently unlikely relationship as truth. Conversely, accepting more liberal p-values for strongly believed relationships is also reasonable. But remember that no matter how true a relationship may be in real life, your data might represent it so poorly that it cannot predict future events. Therefore, it may be unwise to lower your statistical criterion too far if your purpose is forecasting.
Study Design:
If you want to assess impacts that are expected to be small relative to the level of market noise, then you should expect a high percentage of false positives and extreme estimates. Just throwing everything (all possible predictors) in and “seeing what sticks” is likely to be a bad idea, as the “statistically significant” results will be dominated by false positives. Such circumstances demand the savvy use of statistical diagnostics to keep from getting fooled. Two approaches are to pool estimates over cross-sections or to measure aggregate effects that are expected to be stronger relative to the market noise level.
CARE TO SHARE A REAL LIFE EXAMPLE?
* If you liked this article, I’d appreciate your making a quick comment. Your vote of approval is nice to hear and always appreciated.
David Young
September 25, 2013
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: The R Project for Statistical Computing
Discussion: The Likelihood of Being TRUE? (Scientist, Consultant, or Truth Seeker)
Study design also implies that you should choose study subjects according to traits that you expect to see. This can imply a kind of inverse binomial sampling subject to budget constraints. For example, the study I am involved in had to pick subjects based on both ethnicity and gender, since both are important markers of disease probability. In addition, controls who are disease-free but matched on age, gender and ethnicity had to be selected. It helps to know a little about what you are looking for before you begin; that helps to eliminate the shotgun approach to prediction.
By Roy W. Haas, Ph.D.
David Young
September 25, 2013
Group: Marketing Mix
Discussion: The Likelihood of Being TRUE? (Scientist, Consultant, or Truth Seeker)
I concur with David on inference. Sales are a very complex function in which a very large number of factors influence the final outcome. In marketing mix analysis, we try to capture the most dominant relationships. Therefore, an analyst has to be cautious about mis-specification and spurious relationships. All serious practitioners prefer to see that extra evidence before accepting any counter-intuitive result.
The problem of spurious relationships is more common with weak variables (very low and sparse activity). In practice we see all sorts of p-values for such doubtful estimates, whereas strong variables (fairly consistent activity and substantial volume) generally appear with very good p-values. Therefore, when we try to capture the impact of activities at a very granular level, the chances of getting spurious reads increase substantially. In such cases, it is always advisable to model with an aggregated variable and then try to understand the impact at a granular level through post-modelling analysis. In a nutshell, a good marketing mix analysis demands sound judgment about the trade-off between high-level estimates and granularity, extra evidence for all counter-intuitive estimates, and an overlay of extraneous information to get deeper insights. Thus marketing mix analysis remains a mix of science and art in which very experienced econometricians and marketers draw business inferences. The modern trend of getting things done in an analytics factory with poorly trained analysts does not serve marketing mix analysis well.
By Sanjay Sharma
David Young
September 25, 2013
Group: Next Gen Market Research (NGMR) – The Best MR Networking Group on the Web!
Discussion: The Likelihood of Being TRUE? (Scientist, Consultant, or Truth Seeker)
I’d like to agree with, and even strengthen, Roy’s last sentence. Not only should you know something about the relationships among the data going in, you should have planned for and expected to find them.
What gets lost in much of the testing that gets done in Marketing Research in general is that these statistical tests were originally designed to uncover the “truth” in experiments. The expected statistical approach is that one would have a hypothesis going into the study that some manipulation in the Test cell would produce a difference that would not be found in the Control cell. The null hypothesis is that there would be no difference.
What happens all too often in MR today is that some analysts are over-zealous in wanting to find “significant” differences in the data in order to make the report more interesting. Therefore, they are predisposed to latch onto any spurious result and exclaim, “Lookie what I found!” This is further compounded by crosstab software that annotates every statistical difference for every question by numerous banner points and/or ubiquitous and user-friendly stat testing packages. Between the mindset and the ease of statistical testing, analysis of a study’s results can readily turn into a fishing expedition.
What should be happening is we should be looking at test results to determine if indeed the new ad, for example, performs more favorably than the current one in generating a positive attitude toward your brand or product. And in the process, we shouldn’t be paying attention to the fact that the blue-eyed Eskimos in the study were more likely to find the woman in the ad attractive than other groups in the study.
All that being said, I don’t mean to imply that all of MR should always be held to these strict statistical standards. There is, of course, a place for inference and discovery in your data sets. After all, a typical marketing research study contains many measures, and if a consistent pattern of differences is seen for one subgroup vs. another across multiple measures, then by all means point it out and interpret it. But every “statistically significant” finding isn’t necessarily significant from a marketing point of view.
By Bruce McCleary
David Young
November 11, 2013
Group: The R Project for Statistical Computing
Discussion: The Likelihood of Being TRUE? (Scientist, Consultant, or Truth Seeker)
I have another comment about Finding 2. The probability of obtaining it by chance is 1 in 20, but events that have a 1 in 20 chance of happening do occur. When you buy a lottery ticket the chance that you win is very low, but someone does win. You can make a virtually infinite list of bizarre claims about consumer behaviour, like “someone who prefers chocolate ice cream to vanilla ice cream is likely to prefer a red tie to a green tie.” Without doubt at least some of those will, when tested, turn out to have a p-value of less than 0.05. As already said in this topic, a way out is to set as a condition for believing something not only a p-value threshold but also a plausible mechanism for the existence of the relationship.
By Guy Bottu
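Guy’s lottery point has a simple quantitative form: test enough independent null hypotheses and a “significant” result becomes nearly certain. A minimal sketch (the counts of 20 and 100 claims are illustrative):

```python
def prob_at_least_one_hit(k, alpha=0.05):
    """Chance that at least one of k independent tests of relationships
    that are actually null comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** k

print(prob_at_least_one_hit(1))    # 0.05: a single bizarre claim
print(prob_at_least_one_hit(20))   # ~0.64: test 20 claims, likely one "hit"
print(prob_at_least_one_hit(100))  # ~0.99: a hit is virtually guaranteed
```

This is why a p-value threshold alone cannot carry the weight: across a long list of tested claims, significant results are inevitable whether or not anything real is going on.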
David Young
November 11, 2013
Group: Consumer Insights Interest Group
Discussion: The Likelihood of Being TRUE? (Scientist, Consultant, or Truth Seeker)
Isn’t this really a Bayesian question – shouldn’t it be answered with a likelihood estimate?
Second, other cases/experiments are taking place in the market at the same time that you’re taking measurements – competitors are running promotions, new products are in test marketing, banner ads are reaching hidden corners of the market. So some relationships may be real but outside your control, in addition to those that are just noise. What you want to find are levers you can pull which will have a provable effect, not just stuff which is merely interesting. To me that means a correlation should help prioritise which levers to pull, but at some point you have to design experiments to test the levers in reality.
By Saul Dobney
David Young
November 11, 2013
@Saul, sure, you could incorporate your outside-the-data knowledge about likelihoods into Bayesian priors if you wanted to. Personally I like to use Empirical Bayesian estimates (i.e. taking the prior from a larger data aggregation rather than from my personal knowledge). I do end up using judgement to draw any conclusions, as above, but I find that easier to assess if I clearly understand what the data evidence was in this case, apart from my other knowledge of the problem. I also find that on the really tricky problems I need time to think about it, usually more time than I would have devoted to setting up an initial prior when I’m required to come up with many priors for many variables. That said, I’m sure there are situations where I’d set up the priors from the start, just as you’ve suggested.
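The Empirical Bayes idea mentioned above — taking the prior from a larger data aggregation — often amounts in practice to shrinking a noisy segment-level estimate toward the pooled mean. A minimal sketch with hypothetical numbers (the precision-weighted form below is one common way to do this, not necessarily the commenter’s exact method):

```python
def eb_shrink(estimate, se, pool_mean, pool_var):
    """Shrink a noisy segment-level estimate toward the pooled mean,
    with the prior (pool_mean, pool_var) taken from the larger data
    aggregation rather than from personal judgment (empirical Bayes).
    The weight on the data rises as its standard error shrinks."""
    w = pool_var / (pool_var + se ** 2)
    return w * estimate + (1 - w) * pool_mean

# Hypothetical: a segment coefficient of 2.0 with a large standard error
# (1.5) gets pulled most of the way back toward the pooled mean of 0.4.
print(eb_shrink(2.0, 1.5, 0.4, 0.25))  # 0.56: mostly the pooled prior
```

The design choice here is exactly the one under discussion: rather than hand-crafting a personal prior for every variable, the aggregation itself supplies the prior, and judgment enters only when interpreting the shrunken estimates.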