The Likelihood of Being TRUE? (Scientist, Consultant, or Truth Seeker)

Posted on September 4, 2013


Clients want to know "What is true?", while statistical tests offer only the likelihood that a relationship in the data could have been produced by random error.  This mismatch between question and answer is usually handled by quoting textbook doctrine about what statistics can and cannot demonstrate, followed by adopting one of two postures:

Posture 1: I’m a Scientist

(i.e. I only provide answers I can prove.  Questions requiring something else must be answered by the “less pure.”)

Posture 2: I’m a Consultant

(i.e. I live in the real world.  It’s close enough.)

Though drawn humorously and perhaps to an extreme, these two postures are usually not far from what we've all seen in real life.  But let's assume that the readers of this article have a genuine interest in answering the client question.  So, what is the probability that an estimated relationship is "true"?


Okay, p-values don't represent truth, but they're a step in the right direction.  Let's take a look at how far they take us.  The p-value is the chance that random variation alone could have produced a coefficient at least as large as the one estimated.  Hence, a low p-value mostly rules out random error, a chronic source of spurious findings.  But let's compare some identical p-values obtained from different tests.

Finding 1: p-value of .05 – Prices are negatively correlated with sales

Finding 2: p-value of .05 – People who bought a 2002 Ford Fiesta, but not a 2001 nor 2003, are more likely to buy a 2013 BMW

So what is the probability that each of these data relationships is consistently true in real life?

% Data Relationship Is True = %A / (%A + %B)

%A = chance that a real-world relationship could have caused the data relationship

%B = chance that random variation, or some other form of false attribution, could have caused the data relationship

If you've seen the same relationship many times, as in the pricing example, %A is probably large, while %B, as measured by the p-value, is only 5%, so the chance that the data relationship reflects reality is quite high.

On the other hand, Finding 2 has no logical pattern in the dates and connects brands with little in common, so while the probability of a true relationship is unknown, it's reasonable to believe it's quite small; I'd judge it to be less than 1%.  If my judgment of 1% is right and the p-value is 5%, the formula gives 1/(1+5), meaning the odds are 5 to 1 that the data relationship is a false positive.
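The arithmetic above can be sketched in a few lines of Python.  The prior of 1% for Finding 2 is my illustrative judgment call from the paragraph above, not a measured quantity:

```python
def prob_true(pct_a, pct_b):
    """Chance the data relationship is true, given:
    pct_a: chance a real-world relationship could have caused it
    pct_b: chance random variation (e.g. the p-value) could have caused it
    """
    return pct_a / (pct_a + pct_b)

# Finding 2: a judged 1% prior vs. a 5% p-value
p = prob_true(0.01, 0.05)
print(round(p, 3))            # -> 0.167: chance the relationship is real
print(round((1 - p) / p, 1))  # -> 5.0: odds against, i.e. 5 to 1
```

Plugging in a large %A, as in the pricing example (say 50%), gives prob_true(0.50, 0.05) ≈ 0.91, which is why the identical p-value deserves far more trust there.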


This way of thinking is useful both for INFERENCES and for STUDY DESIGN.


Inferences:

When you infer relationships from statistical evidence, it is reasonable to demand much more evidence before accepting an apparently unlikely relationship as true.  Conversely, accepting more liberal p-values for strongly believed relationships is also reasonable, but remember that no matter how true a relationship may be in real life, your data might represent it poorly, which makes it insufficient for predicting future events.  Therefore, it may be unwise to lower your statistical criterion too far if your purpose is forecasting.

Study Design:

If you want to assess impacts that are expected to be small relative to the level of market noise, then you should expect a high percentage of false positives and extreme estimates.  Just throwing in all possible predictors and "seeing what sticks" is likely to be a bad idea, as the "statistically significant" results will be dominated by false positives.  Such circumstances demand the savvy use of statistical diagnostics to keep from getting fooled.  Two approaches are to pool estimates across cross-sections or to measure aggregate effects that are expected to be stronger relative to the market noise level.
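A quick simulation illustrates the "throw everything in" problem.  The predictor count, sample size, and cutoff below are arbitrary choices for illustration; every predictor here is pure noise, yet roughly 5% of them should still look "significant" at the .05 level:

```python
import math
import random

random.seed(0)

N_PREDICTORS = 200  # candidate predictors, all pure noise
N_OBS = 50          # observations per series

def t_stat(x, y):
    """t-statistic for the Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))

outcome = [random.gauss(0, 1) for _ in range(N_OBS)]
significant = 0
for _ in range(N_PREDICTORS):
    predictor = [random.gauss(0, 1) for _ in range(N_OBS)]
    if abs(t_stat(predictor, outcome)) > 2.01:  # ~.05 two-sided cutoff, 48 df
        significant += 1

print(significant, "of", N_PREDICTORS, "noise predictors look 'significant'")
```

Since nothing here is real, every one of those "significant" results is a false positive, which is exactly the situation you create when the true effects you're hunting for are rare or weak relative to the noise.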


* If you liked this article, I'd appreciate your leaving a quick comment.  Your vote of approval is nice to hear and always appreciated.