In 1943, the U.S. military cataloged bullet holes on all the bombers that made it back from missions. Wings and fuselage were riddled with damage; engines and cockpits had almost none. The intuitive conclusion: reinforce the areas with the most bullet holes. Statistician Abraham Wald gave the opposite advice: “Reinforce the areas with no bullet holes.”
The logic is straightforward: the data only came from planes that survived. Planes hit in the engine never made it back, so they were never in the sample. The military thought it was analyzing “where all planes get hit,” but it was actually analyzing “where planes get hit and still survive” — a systematically filtered sample.
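To see how conditioning on survival flips the picture, here is a minimal simulation in Python. All the numbers are invented and this is not Wald’s actual model; the only point is that hits spread evenly across the airframe look very different once you can only inspect the planes that came back.

```python
import random

# Toy model of the bomber problem (invented numbers, not Wald's method).
# Hits land uniformly across four sections, but a hit on the engine or
# cockpit is far more likely to bring the plane down.
SECTIONS = ["engine", "cockpit", "fuselage", "wings"]
LOSS_PROB = {"engine": 0.8, "cockpit": 0.7, "fuselage": 0.2, "wings": 0.1}

random.seed(0)
all_hits = {s: 0 for s in SECTIONS}       # every hit on every plane
returned_hits = {s: 0 for s in SECTIONS}  # hits visible on planes that made it back

for _ in range(10_000):
    hits = [random.choice(SECTIONS) for _ in range(random.randint(1, 5))]
    survived = all(random.random() > LOSS_PROB[h] for h in hits)
    for h in hits:
        all_hits[h] += 1
        if survived:
            returned_hits[h] += 1

print("section      all planes   returning planes")
for s in SECTIONS:
    print(f"{s:<10} {all_hits[s]:>12} {returned_hits[s]:>18}")
# Engine and cockpit hits are common in reality but nearly absent among the
# planes available for inspection: the sample is filtered by survival.
```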
This is the essence of sampling bias: the sample you analyze does not represent the population you want to understand. The math is not wrong — the data was already skewed before it ever reached your hands.
1. Survivorship Bias: The Dead Don’t Talk
Wald’s insight was essentially a reversal of the question: instead of asking “what does this data tell us,” he asked “what kind of data would never appear here.”
“Bill Gates, Zuckerberg, and Jobs all dropped out of college — dropping out is the shortcut to success.” This is the most commonly cited version of survivorship bias. We see the few names that survived but never see the thousands of entrepreneurs who died in their garages. Failed cases cannot speak for themselves.
In product development, this bias plays out every day: we analyze behavior data from retained users, find that they love a certain “advanced feature,” and decide to double down on it. But we ignore the users who deleted the app in their first week — maybe it was precisely the complexity of that feature that drove them away. Listening only to survivors makes the product increasingly niche.
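Here is what that trap looks like in analytics terms, as a minimal pandas sketch. The column names (used_advanced_feature, churned_within_week) and the numbers are made up; the point is that the adoption rate changes dramatically depending on whether churned users stay in the denominator.

```python
import pandas as pd

# Hypothetical install cohort; columns and values are invented for illustration.
users = pd.DataFrame({
    "used_advanced_feature": [1, 1, 1, 0, 0, 0, 0, 0],
    "churned_within_week":   [0, 0, 0, 0, 1, 1, 1, 1],
})

retained = users[users["churned_within_week"] == 0]

# The survivor-only view most dashboards show.
print("adoption among retained users:", retained["used_advanced_feature"].mean())

# The view that keeps the people who left in the sample.
print("adoption among everyone who installed:", users["used_advanced_feature"].mean())
```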
2. Sample Selection Bias: Your List Determines Your Conclusion
In 1936, before the U.S. presidential election, The Literary Digest mailed out 10 million questionnaires and collected 2.4 million valid responses. The data showed Republican candidate Landon would crush Roosevelt. Meanwhile, a young man named George Gallup, using only 50,000 responses, predicted Roosevelt would win by a landslide.
The result: Roosevelt swept the nation with 60.8% of the vote. The Literary Digest folded shortly after. Gallup became famous overnight.
Why did 2.4 million lose to 50,000? The problem was the mailing list. The Literary Digest drew its sample from telephone directories and automobile registration records. In 1936, at the depth of the Great Depression, people who could afford a phone and a car were mostly wealthy, and the wealthy leaned Republican. The issue was not a low response rate; the 10 million questionnaires never reached poorer voters in the first place.
In the digital age, this bias is even more insidious. If your data collection SDK crashes on low-end Android phones, those users’ behavior simply does not exist in your database. You might conclude that “users have high average spending power,” but that is only because users who cannot afford good devices were excluded by the crash itself.
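One cheap way to catch this kind of hole, sketched with invented numbers: compare the device mix your analytics actually records against the device mix captured at registration, which does not depend on the SDK surviving long enough to report.

```python
from collections import Counter

# Invented data: device tier recorded at registration vs. device tier seen
# in analytics events. If the SDK dies on low-end phones, the second list
# silently under-counts them.
registered = ["low"] * 500 + ["mid"] * 350 + ["high"] * 150
tracked    = ["low"] * 120 + ["mid"] * 340 + ["high"] * 145

def shares(devices):
    counts = Counter(devices)
    total = sum(counts.values())
    return {tier: f"{counts[tier] / total:.0%}" for tier in ("low", "mid", "high")}

print("registered:", shares(registered))
print("tracked:   ", shares(tracked))
# A large gap between the two distributions means the sampling frame, not
# the users, is doing the talking.
```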
3. Coverage Bias: Your Sampling Frame Missed Entire Groups
This is the systematic version of sample selection bias: it is not that your random sampling was poorly done, but that your sampling frame never covered certain groups at all.
Online surveys systematically exclude elderly populations who lack internet access or are not comfortable with digital tools. Twitter sentiment analysis can only capture opinions of “people willing to post publicly” — the silent majority, who may represent the actual majority opinion, will never appear in your data.
When analyzing user behavior, if your event tracking only covers in-app actions, then users who bail on the very first screen because the interface is terrible leave almost no analyzable trace. Your database is full of people who “used at least a few screens,” while those who could not even survive the first page are invisible to you.
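A quick check, again with made-up ID sets: how many installed users ever appear in the event data at all? That single number is the size of the invisible group.

```python
# Invented ID sets: everyone who installed vs. everyone who ever fired a
# tracked in-app event.
installed_ids = set(range(10_000))
ids_with_events = set(range(3_000, 10_000))

invisible = installed_ids - ids_with_events
print(f"{len(invisible) / len(installed_ids):.0%} of installs never show up "
      "in the event data at all")
```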
4. Self-Selection Bias: Only the Strongly Opinionated Speak Up
Open the App Store and you will see a clear “U-shaped distribution” — lots of 5-star reviews and lots of 1-star reviews, with very few 3-star ratings in between. Does this mean users have a love-hate relationship with your product? No. It means only people with strong emotions bother to take the time to rate.
This is self-selection bias. The people who participate in a survey were not randomly chosen — they are “people who voluntarily decided to speak up.” This group is fundamentally different from the silent majority: they might be more loyal, more angry, have more free time, or simply have stronger opinions.
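A toy model of the review funnel reproduces the U shape. The probabilities below are invented; the mechanism is simply that the chance of writing a review grows with distance from “meh.”

```python
import random
from collections import Counter

random.seed(1)

# True satisfaction on a 1-5 scale is centered, but the probability of
# actually leaving a review is highest at the extremes. Numbers are invented.
REVIEW_PROB = {1: 0.60, 2: 0.08, 3: 0.02, 4: 0.08, 5: 0.50}

true_ratings = random.choices([1, 2, 3, 4, 5], weights=[5, 15, 40, 25, 15], k=100_000)
observed = [r for r in true_ratings if random.random() < REVIEW_PROB[r]]

print("true opinions: ", sorted(Counter(true_ratings).items()))
print("store reviews: ", sorted(Counter(observed).items()))
# Reviews pile up at 1 and 5 even though most users' true opinion sits in the middle.
```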
When collecting feature feedback in product forums or on social media, you will always hear from power users or the deeply dissatisfied. The 80% who quietly use the product — neither fanatical nor furious — are missing from your data. Building products based on forum feedback sometimes makes them increasingly complex, scaring away ordinary users.
5. Convenience Sampling Bias: The Easiest People to Find Are Not the People You Need to Understand
During the early stages of product development, the most common practice is to “ask colleagues to try it out.” This is called convenience sampling, because these data points are the easiest to get.
The problem is that colleagues are usually tech enthusiasts: they can tolerate complex UIs, understand obscure technical jargon, know what “swipe to go back” means, and are used to certain design patterns. One team got rave reviews during internal testing, only to see real users complain they “couldn’t find the button” after launch — colleagues and real users live in two different usage universes.
Using a convenience sample to validate demand is like looking in a mirror and asking “Am I handsome?” You will always get a feel-good answer.
6. Time Window Bias: Different Times, Different People
Suppose you run an A/B test on Monday from 10 to 11 AM, and the new homepage shows a 20% conversion lift. You happily roll it out to everyone, and weekend sales tank.
This is not about picking the wrong measurement timing (that would be a measurement bias issue) — it is about selecting the wrong time window’s population. People shopping on Monday morning might be corporate buyers or remote workers; weekend shoppers might be students or office workers. Entirely different behavior patterns. Your A/B test sample represented “people who show up on Monday mornings,” not “all potential users.”
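Before rolling anything out, it is worth slicing the result by time window. A sketch with an invented session log:

```python
import pandas as pd

# Invented per-session experiment log; the point is to break the result down
# by time window instead of trusting a single Monday-morning reading.
sessions = pd.DataFrame({
    "variant":   ["A", "B", "A", "B", "A", "B", "A", "B"],
    "window":    ["weekday_am"] * 4 + ["weekend"] * 4,
    "converted": [0, 1, 0, 1, 1, 0, 1, 0],
})

print(sessions.groupby(["window", "variant"])["converted"].mean())
# If B only wins in the weekday_am rows, the "lift" is a statement about
# people who show up on Monday mornings, not about all potential users.
```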
When monitoring system performance, if you only analyze logs from off-peak hours, your understanding of the system’s true capacity is based on a completely unrepresentative time window.
These six biases share a common antidote: ask “how was my sample formed, and who is missing from it?” The question sounds simple, but when the data volume is large and the confidence intervals look tight, almost no one thinks to raise it.
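In practice, that question often reduces to one small comparison: line up the sample’s composition on a dimension you can observe externally (census data, the install base, the CRM) against the population’s. The figures below are invented.

```python
# Invented shares: population breakdown by age group (external reference)
# vs. the breakdown of the collected sample.
population = {"18-29": 0.25, "30-44": 0.30, "45-64": 0.30, "65+": 0.15}
sample     = {"18-29": 0.45, "30-44": 0.35, "45-64": 0.18, "65+": 0.02}

for group, pop_share in population.items():
    gap = sample[group] - pop_share
    print(f"{group:>5}: sample {sample[group]:.0%} vs population {pop_share:.0%} ({gap:+.0%})")
# The 13-point hole in the 65+ group is exactly the kind of gap that a larger
# sample will never fill.
```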
Wald’s contribution was not better math. It was a better question.