Texas Sharpshooter Fallacy
- Patterns in Data
- Clustering Illusion
- Swedish Power Lines
The Texas Sharpshooter & Swedish Power Lines
Welcome back. Hopefully you had a chance to read the previous piece on the Inherent Conflict with Betting Content. Today I want to share the story of a Texas Sharpshooter who may be able to assist us with our betting decisions, and explain why Swedish Power Lines are such a useful illustration of the pitfalls of data mining.
The Texas Sharpshooter walks outside of his home and stands near his barn. He then starts shooting at the side of the barn and many bullet holes appear where the bullets pierced the wood. After shooting over 100 bullets, he puts his gun into his holster and goes to grab some paint. Observing the position of all the bullet holes, the Sharpshooter then paints a bullseye around a small cluster of 4 bullet holes and ignores the remaining bullet holes that have sprayed wildly across all corners of the barn wall. The man proceeds to show his wife the incredibly packed cluster of bullet holes inside the bullseye and explains how he is a fine shooter.
Is he really a Sharpshooter?
This fallacy describes how certain interpretations of data can lead to false conclusions and how we observe patterns where none exist. Focusing only on the small cluster of bullet holes can lead to the erroneous conclusion that we have a Sharpshooter on our hands. The bullseye was painted after the shots were fired, so the tight cluster tells us nothing about his aim. Anyone analyzing the entire barn wall (the entire data set) could tell you that this person has limited ability; however, if you only analyze the cluster, you may decide that he is indeed a Sharpshooter.
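To make this concrete, here is a minimal sketch of a shooter with no aim at all. Everything in it is an assumption for illustration: the barn wall is treated as a 1x1 square, the 100 holes land uniformly at random, and tightest_cluster_radius is a hypothetical helper, not anything from a library. It simply looks for the best spot to paint a bullseye around 4 holes after the fact.

```python
import math
import random


def tightest_cluster_radius(holes, cluster_size=4):
    """Radius of the smallest hole-centred circle containing `cluster_size` holes."""
    best = math.inf
    for i, (x, y) in enumerate(holes):
        # Distances from this hole to every other hole, nearest first.
        distances = sorted(
            math.hypot(x - other_x, y - other_y)
            for j, (other_x, other_y) in enumerate(holes)
            if j != i
        )
        # The hole itself plus its (cluster_size - 1) nearest neighbours.
        best = min(best, distances[cluster_size - 2])
    return best


rng = random.Random(1)  # fixed seed so the run is repeatable (assumed value)
holes = [(rng.random(), rng.random()) for _ in range(100)]  # purely random shots

radius = tightest_cluster_radius(holes)
bullseye_share = math.pi * radius ** 2  # approximate bullseye area as a share of the wall

print(f"Tightest cluster of 4 fits in a circle of radius {radius:.3f}")
print(f"That bullseye covers about {bullseye_share:.1%} of the wall")
```

Even with purely random shots, there is always some small patch of wall where 4 holes sit close together. The cluster only looks impressive if you forget the other 96.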
Are there Profitable Patterns?
As the Texas Sharpshooter showed us, the Clustering Illusion and the misinterpretation of trends and patterns can cause problems for bettors, especially when they form the basis of our decisions. A quick search on Twitter turns up any number of tweets showing impressive win records, against-the-spread records or amazing streaks, with less than impressive rationale (if any exists at all) underlying them.
This is not an attack on any person, company or approach. Rather, it is for discussion purposes, and I hope some fruitful discussions arise.
It strikes me that the description above is akin to the cluster of bullet holes and is (or very likely is) a random trend rather than a profitable betting approach to follow moving forward. We don’t need to get into correlation and causation in detail here, but the challenge with approaches such as this is that they amount to data mining (others may call it ‘data dredging’). Forming conclusions only after the data is gathered is problematic, especially with no hypothesis going into the analysis, and can cause people to believe that there is a causal relationship that will be helpful moving forward.
Nuggets like this tweet resonate with the way we are wired. Our first instinct and response is not always to react with, “this does not capture all the relevant data needed to make a sound betting decision” or “of course that is an arbitrary end point and is not helpful”.
We are susceptible to recency bias, over-reliance on small sample sizes and a myriad of other cognitive and unconscious biases that can lead to erroneous betting decisions (if we want to be profitable).
For those that still find a betting angle in such a tweet, I urge you to reconsider. These biases are well-documented, so much so that there is even a word for the tendency to see meaningful patterns in random data: Apophenia.
Swedish Power Lines
In the early 1990s, a study was conducted in Sweden to determine whether people living in close proximity to power lines were more likely to become ill. The problem with the study, which even drew comment from then US President Bill Clinton, was that it was a perfect example of the Texas Sharpshooter fallacy. By looking at over 800 ailments, the study was bound to find at least one with a higher rate than normal (even at a statistically significant level) purely by chance, just like our Texan was bound to have a useful cluster around which he could paint a bullseye.
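The arithmetic behind "bound to find at least one" can be sketched in a few lines. The 5% significance threshold and the assumption that the 800 checks are independent are mine, used only for illustration; they are not details taken from the study itself.

```python
# Assumptions for illustration only: 800 independent checks, each with a
# 5% chance of flagging an ailment that is not actually more common.
n_ailments = 800
false_positive_rate = 0.05

# Average number of ailments flagged by chance alone.
expected_false_alarms = n_ailments * false_positive_rate

# Chance that at least one ailment is flagged by chance alone.
p_at_least_one = 1 - (1 - false_positive_rate) ** n_ailments

print(f"Expected chance findings: {expected_false_alarms:.0f}")
print(f"Chance of at least one: {p_at_least_one:.4f}")
```

Under these assumptions you would expect around 40 ailments to look elevated purely by chance, and finding none at all would be the real surprise.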
Suppose you search a very rich database covering the 52 tennis tournaments played over a calendar year and uncover that 2 of those 52 have shown a favorable outcome for underdogs over the last 3 years (an arbitrary end point, picked because 2 and 4 years showed barely breakeven results). If you intend to bet this ‘edge’, you should consider the possibility that you are mistaken about the causality of those outcomes (i.e. the assumption that those 2 tournaments will continue to produce an edge for underdog bettors). If you are willing to live a little more dangerously (i.e. selective filtering to suit your newly found trend of profitable underdogs), you can of course add another filter and see that in the 2nd and 3rd rounds of those tournaments, the win % for underdogs is even higher.
Before you bet, you should consider whether this data dredging is going to be helpful to your bankroll (and the answer may sit in the English alphabet between M and P). Even if there were some yet-to-be-uncovered factor or angle that was entirely valid, there is a distinct chance that the market would have ‘found’ it and corrected over the course of those 3 years, or will do so moving forward (as markets tend toward efficiency).
Looking backwards, finding out what patterns exist and then fitting them to your narrative is something to be wary of (at a minimum), just as we found in the first tweet when looking at the performance of:
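For anyone who wants to see how easily such an ‘edge’ appears out of nothing, here is a rough simulation sketch. Every number in it is a made-up assumption: 90 underdog bets per tournament over the 3 years, decimal odds of 2.5, a true win probability of 40% (so exactly zero edge), and a 10% ROI as the bar for ‘profitable’. The function names are hypothetical as well.

```python
import random


def tournament_roi(n_bets, win_prob, decimal_odds, rng):
    """ROI from flat 1-unit bets on underdogs that carry no real edge."""
    profit = 0.0
    for _ in range(n_bets):
        if rng.random() < win_prob:
            profit += decimal_odds - 1.0  # winning bet pays odds minus the stake
        else:
            profit -= 1.0  # losing bet costs the stake
    return profit / n_bets


def share_with_fake_edges(n_worlds=300, n_tournaments=52, n_bets=90,
                          win_prob=0.4, decimal_odds=2.5,
                          roi_threshold=0.10, min_hits=2, seed=7):
    """Share of simulated 3-year spans with at least `min_hits` 'profitable' tournaments."""
    rng = random.Random(seed)
    worlds_with_hits = 0
    for _ in range(n_worlds):
        # Count tournaments whose ROI clears the bar through luck alone.
        hits = sum(
            tournament_roi(n_bets, win_prob, decimal_odds, rng) > roi_threshold
            for _ in range(n_tournaments)
        )
        if hits >= min_hits:
            worlds_with_hits += 1
    return worlds_with_hits / n_worlds


share = share_with_fake_edges()
print(f"Spans with 2+ 'profitable' tournaments: {share:.1%}")
```

With these assumed inputs the share should come out close to 100%, even though no tournament has any real edge. Finding 2 ‘profitable’ tournaments out of 52 is what randomness looks like, not evidence of anything.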
Super Bowl Champions
Against the spread
Away from home in week 2
After covering in week 1
(For the record, even with the trend suggesting an inflated line and a letdown spot, the Super Bowl Champion New England Patriots, playing away from home in week 2 after covering against the Pittsburgh Steelers in Foxborough in week 1, covered the spread, winning 43-0. There is no need for a victory lap on this one, though, or we would just be letting outcome bias fool us.)
We cannot opt in or opt out when it comes to the many biases that affect our betting, and there is not necessarily a way to ‘defeat’ these influences on our decision making. We can, however, be acutely aware of their existence and manage their influence (especially negative influence) on how we evaluate information and ultimately make decisions.