The law of large numbers and what it is not

A lot has been written about the law of large numbers (LLN); see, for example, [1]. In this text I will try to talk about what the law of large numbers is not: about common misreadings of this law and the pitfalls hidden in its mathematical formulation.



Let's start with what the law of large numbers is. Informally, it is a mathematical theorem stating that "the probability of a large deviation of the sample mean from the mathematical expectation is small" and that "this probability tends to zero as the sample grows." Put even less formally, the theorem says that we can be reasonably sure the mean of our sample is close enough to the "real" mean and thus describes it well. Of course, the traditional statistical "baggage" is assumed: the observations in the sample should describe the same phenomenon, they should be independent, and the idea that there is some "real" distribution with a "real" mean should not cause us significant doubts.
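For reference, here is a standard textbook formulation of the weak law (my notation; the original gives only the informal wording above): for independent, identically distributed observations $X_1, \dots, X_n$ with expectation $\mu$ and sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$,

$$\Pr\big(|\bar{X}_n - \mu| > \varepsilon\big) \to 0 \quad \text{as } n \to \infty, \text{ for every fixed } \varepsilon > 0.$$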



When we formulate the law, we say "sample mean," and anything that can be mathematically written as such an average falls under the law. For example, the proportion of events in a total mass can be written as an average: we just record the presence of an event as "1" and its absence as "0". The average of these indicators is then exactly the frequency of the event, and so the frequency should be close to the theoretical mean. That is why we expect the proportion of heads to be close to ½ when flipping a fair coin.
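To see this numerically, here is a minimal simulation sketch using only the standard library (the function name heads_frequency is mine, not from the article):

```python
import random

def heads_frequency(n_flips, p=0.5, seed=42):
    """Flip a coin with heads probability p, n_flips times.

    Heads is recorded as 1 and tails as 0, so the sample mean of the
    indicators is exactly the observed frequency of heads, which the
    LLN says should be close to p for large n.
    """
    rng = random.Random(seed)
    flips = [1 if rng.random() < p else 0 for _ in range(n_flips)]
    return sum(flips) / n_flips

for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))  # frequencies drift toward 0.5 as n grows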



Now let us consider the pitfalls and misconceptions surrounding this law.



First, the LLN does not always hold. It is just a mathematical theorem with "inputs," its assumptions. If the assumptions are not met, the law is not obliged to hold. For example, this is the case if the observations are dependent, or if there is no certainty that the "real" mean exists and is finite, or if the phenomenon under study changes over time so that we cannot say we are observing the same quantity. In truth, versions of the LLN remain valid to a certain extent even in these cases, for example for weakly correlated observations, or even when the observed quantity changes over time. However, applying them correctly to real-world situations requires a well-trained mathematician.
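A classic way the "mean exists and is finite" assumption can fail is the Cauchy distribution, which has no expectation at all. A sketch of what goes wrong (my example, not from the article):

```python
import math
import random

def cauchy_sample_mean(n, seed=0):
    """Sample mean of n standard Cauchy draws.

    The standard Cauchy distribution has no expectation, so the LLN's
    assumptions fail: the sample mean does not settle down as n grows.
    """
    rng = random.Random(seed)
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
    draws = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
    return sum(draws) / n

for n in (100, 10_000, 1_000_000):
    print(n, cauchy_sample_mean(n))  # keeps jumping around instead of stabilizing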



Second, it seems true that the LLN claims that "the sample mean is close to the true mean." However, such a statement is incomplete: one must add "with high probability," and this probability is always less than 100%.
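Chebyshev's inequality (a standard result, my addition to the text) makes this qualifier quantitative: if the observations also have a finite variance $\sigma^2$, then

$$\Pr\big(|\bar{X}_n - \mu| \ge \varepsilon\big) \le \frac{\sigma^2}{n\varepsilon^2},$$

a bound that shrinks as $n$ grows but never reaches zero for any finite sample.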



Third, it is tempting to formulate the LLN as "the sample mean converges to the real mean as the sample grows without bound." However, this is not true: the sample mean does not converge at all in the usual sense, since it is random and remains random for any sample size. For example, even if you flip a fair coin a million times, there is still a chance that the proportion of heads will be far from ½, or even zero. In a sense, there is always a chance of seeing something out of the ordinary. We must admit, however, that our intuition still insists the LLN describes some kind of convergence, and this is actually the case. Only it is not the mean that "converges," but the probability of a deviation of the sample mean from the true value, and it converges to zero. Since this idea is intuitively very convenient ("the chances of seeing something unusual tend to zero"), mathematicians invented a special type of convergence for it: "convergence in probability."
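A quick back-of-the-envelope check for the coin example (my arithmetic, not from the article): the probability that all $10^6$ flips land tails, so that the observed frequency of heads is exactly zero, is

$$2^{-10^6} \approx 10^{-301030},$$

vanishingly small, yet strictly positive for any finite number of flips.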



Fourth, the LLN says nothing about when the sample mean can be considered close enough to the theoretical one. The law only postulates the existence of a certain phenomenon; it says nothing about when it can be relied upon. It turns out that the LLN does not answer the key practical question: "can I use it for my sample of size n?" Other theorems answer such questions, for example the central limit theorem, which gives an idea of how far the sample mean may deviate from its true value.
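For reference, the classical central limit theorem in its standard form (my notation, not quoted in the original): if the observations additionally have a finite variance $\sigma^2$, then

$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \;\xrightarrow{d}\; \mathcal{N}(0, 1),$$

so typical deviations of $\bar{X}_n$ from $\mu$ are of order $\sigma/\sqrt{n}$, which is exactly the scale one needs to judge whether a given $n$ is "large enough."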



In conclusion, the central role of the LLN in statistics and probability theory should be noted. The history of this law began when scientists noticed that the frequencies of some repeating phenomena stabilize and stop changing significantly as the experiment or observation is repeated many times. Strikingly, this "frequency stabilization" was observed for completely unrelated phenomena, from dice rolls to agricultural yields, suggesting the existence of a "law of nature." Interestingly, this law of nature turned out to be a part of mathematics, and not of physics, chemistry, or biology, as is usually the case with laws of nature.



[1] Jeffrey D. Blume & Richard M. Royall, "Illustrating the Law of Large Numbers (and Confidence Intervals)."


