As the question, so the answer: by formalizing a problem, we already predetermine the possible answer

An interesting and instructive article, "A random tram in the middle of an unfamiliar city", proposes the following experiment:

Imagine that someone took a strip of photographic film N cm long and decided to observe how particles arriving from space leave their traces on it. In this experiment, the probability density of a particle hitting the film is described by a uniform distribution on the interval from 0 to N. The experimenter tells you the distance k between the left edge of the film and the point where the first registered particle hit. As before, you are required to give a reasonable estimate of N, which is unknown to you.


To solve this problem, the following assumption was made:

Imagine now that in one experiment the distance from the point of impact of the particle to the left edge of the photographic film was equal to P1, and in another experiment to P2, with P1 < P2. Wouldn't it then be reasonable to give a smaller estimate for the length of the photographic film in the first experiment than in the second?


I wanted to see it in numbers: is this always reasonable, and how reasonable is it?



These notes are not a continuation or a discussion of the article the quotation is taken from. They are an attempt to see how the formulation of the problem itself, and the restrictions, assumptions and conditions adopted at the formalization stage, are reflected in the answer obtained. I will not give formulas and will try not to use special terms; it seems to me that this makes the dependence of the result on the assumptions we do or do not accept easier to see.



To begin with, I will modify, simplify and bring the experiment down to earth.



Fate, or our assistant, has a bag of kegs numbered in order, as in lotto. The assistant (he is easier for me to imagine than fate) secretly draws a keg at random and pours numbered balls into the first chest, as many as the number on the keg. Then he draws a keg at random again and pours the corresponding number of numbered balls into the second chest. There are now two chests in front of us, each with an unknown number of balls. We draw one ball at random from the first chest and one from the second, and make the reasonable assumption that the chest that gave the higher-numbered ball is the one with more balls.
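The game just described can be tried out with a short simulation. This is only a sketch under assumptions that the text has not fixed at this point: the balls in a chest are numbered 1, 2, ... up to the number on the keg, the keg is returned to the bag before the second draw, and the bag holds 30 kegs. The names `run_trial`, `rule_verdict` and `n_kegs` are hypothetical, chosen for illustration.

```python
import random

def run_trial(n_kegs, rng):
    """Play one round; returns (a, b, x, y)."""
    a = rng.randint(1, n_kegs)   # balls in the first chest
    b = rng.randint(1, n_kegs)   # balls in the second chest (assumes the keg was returned)
    x = rng.randint(1, a)        # ball drawn from the first chest
    y = rng.randint(1, b)        # ball drawn from the second chest
    return a, b, x, y

def rule_verdict(a, b, x, y):
    """'right', 'wrong' or 'tie' for the rule 'higher ball, fuller chest'."""
    if x == y:
        return "tie"
    # With equal chests the rule cannot be right, whichever ball is higher.
    return "right" if a != b and (x > y) == (a > b) else "wrong"

rng = random.Random(0)           # fixed seed so that runs are repeatable
tally = {"right": 0, "wrong": 0, "tie": 0}
for _ in range(10_000):
    tally[rule_verdict(*run_trial(30, rng))] += 1
print(tally)
```

The tally deliberately keeps ties separate, because what to do with them is exactly the kind of choice that has to be made when the problem is formalized.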

Let's estimate how reasonable this assumption is.



Let's formalize and refine the problem



1. Since the kegs are in a bag, their number must be limited. Keeping in mind the original article and its number of tram lines, let us limit the number of kegs to 30 for now.



2. What should we do if the balls drawn from the two chests have the same number? We have options:

2.1 Declare the outcome unsuccessful, make no decision, and ask the assistant to fill the chests anew.

2.2 Toss a coin and decide at random which chest has more balls. There are no unsuccessful outcomes in this option.

2.3 Decide that since the numbers are the same, the number of balls in the chests is also the same. There are no unsuccessful outcomes in this option either.



Here I want to note that I do not choose which option is better. My goal is to see how different options will affect the answer.



3. Since the options give a different number of successful outcomes, the question arises: "Out of what number of outcomes should the share of correct answers be counted?" Out of all experiments, or only out of the successful ones? Let's count both.



4. The assistant drew the first keg, looked at the number, and poured the corresponding number of balls into the first chest. Stop! What did he then do with the keg? He has two options: put it back in the bag or not. Or, which amounts to the same thing as not putting it back, the assistant could draw two kegs at once and fill the chests according to their numbers; assistants are lazy, and we do not see what he is doing there. In that case the chests never hold an equal number of balls. This point clearly deviates from the problem in the quotation, where the keg goes back into the bag, but I have other goals, and not returning the keg is a typical situation in life, so we will calculate this option too.



So, we have three options for how to count the outcomes of the experiment in which the numbers of the balls are the same, two options for calculating the proportion of correct answers and two options for filling the chests with balls. A total of 12 variants of the experiment results!
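The count of 12 is simply the product 3 x 2 x 2. A minimal sketch that lists the combinations; the labels are my own shorthand for the options above, not terms from the original article.

```python
from itertools import product

TIE_RULES = ("failure", "coin toss", "declare equal")    # options 2.1, 2.2, 2.3
DENOMINATORS = ("all experiments", "successful only")    # point 3
KEG_HANDLING = ("keg returned", "keg not returned")      # point 4

# Every combination of one choice from each group is one variant.
variants = list(product(TIE_RULES, DENOMINATORS, KEG_HANDLING))
for number, variant in enumerate(variants, start=1):
    print(number, *variant, sep=" | ")
print(len(variants))  # 3 * 2 * 2 = 12
```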



How will the probability of a correct answer depend on the number of kegs in the bag of fate, that is, on the maximum possible number of balls in a chest? Maybe all the options will give the same result? Maybe they will at least show the same trend? It was at this point that I tried to test my intuition by filling out the following table:







Looking ahead, it turned out that my intuition needs training, and more training. I have cleared the table of many of my own considerations.



So as not to tire you with formulas which, although beautiful, are recurrent (and I cannot reduce recurrent formulas to closed form), I will describe the general calculation algorithm:



1. For each number of kegs in the bag, we list all the ways of filling the chests with balls.

Example: if the number of kegs is 4, we get 16 ways of filling the two chests, by number of balls: 1 and 1, 1 and 2, 1 and 3, 1 and 4, 2 and 1, 2 and 2 ... 4 and 4.



2. For each way of filling the chests, we count the number of correct answers under each of the three ways of treating equal ball numbers.

Example: for the filling 2 and 3 (2 balls in the first chest, 3 in the second), we get the following table.







3. For the chosen number of kegs, we add up the correct answers over all ways of filling the chests.

4. We calculate the share of correct answers in the two ways (relative to the total number of experiments and relative to the number of successful ones).

5. We repeat steps 3 and 4 for the option in which the keg is not returned to the bag, that is, when the chests cannot hold an equal number of balls.
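The five steps can be sketched as a direct enumeration. This is my reading of the algorithm, not the author's actual calculation: I assume the balls in a chest are numbered from 1 up to the number on the keg, and that every combination of a filling and a pair of drawn balls is counted once, with equal weight, exactly as the steps add them up. The function names and the string labels are hypothetical.

```python
from fractions import Fraction

def count_filling(a, b):
    """Step 2: go through every pair of drawn balls for chests with a and b balls.

    Returns (right, wrong, ties): how often the rule names the fuller chest,
    names the wrong one, or meets equal ball numbers.
    """
    right = wrong = ties = 0
    for x in range(1, a + 1):
        for y in range(1, b + 1):
            if x == y:
                ties += 1
            elif a != b and (x > y) == (a > b):
                right += 1
            else:
                wrong += 1
    return right, wrong, ties

def share_correct(n_kegs, tie_rule, denominator, replace):
    """Steps 1, 3, 4 and 5: share of correct answers for one of the 12 variants."""
    if tie_rule not in ("failure", "coin", "equal"):
        raise ValueError(tie_rule)
    correct = Fraction(0)
    total = failures = 0
    for a in range(1, n_kegs + 1):          # step 1: all fillings
        for b in range(1, n_kegs + 1):
            if a == b and not replace:
                continue                    # keg not returned: chests always differ
            right, wrong, ties = count_filling(a, b)
            total += right + wrong + ties
            correct += right
            if tie_rule == "failure":
                failures += ties            # option 2.1: no decision is made
            elif tie_rule == "coin":
                correct += Fraction(ties, 2)  # option 2.2: half the tosses are right
            elif a == b:
                correct += ties             # option 2.3: "equal" is true only here
    if denominator == "successful":
        total -= failures
    return correct / total if total else None

print(count_filling(2, 3))
for n in (1, 2, 3, 4, 5, 6, 7, 8, 30):
    print(n, share_correct(n, "failure", "all", replace=True))
```

Under these assumptions, the variant "equal numbers are a failure, share counted from all experiments, keg returned" gives exactly 1/2 for 7 kegs and less than 1/2 for 6. A different weighting, for instance giving every filling the same weight regardless of how many balls it holds, would give different numbers, which is one more assumption hidden in the formalization.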



I did the calculation for 1 to 8 kegs, and for 30 to show the trend. Here are the graphs.



First, for the option in which the keg is returned to the bag:







As the number of kegs in the bag grows, and with it the possible number of balls in the chests, the probability of a correct answer increases and the difference between the options shrinks. Curiously, the probability is not always above 0.5. The yellow curve is also curious: it first declines and only then rises. In general, the behavior in the range from 1 to 7 was not obvious to me.



It turns out that if there are fewer than 8 balls, then for the counting variant "Equal numbers are considered a failure; the share of correct answers is counted from all experiments", a random answer gives a better result than following the rule "A higher ball number means the chest contains more balls".



Graphs for the option in which the keg is not returned to the bag, so the chests cannot hold the same number of balls:







There are three graphs, since two of them coincide; the coinciding pair is marked in red.



For the four options, the probability of a correct answer falls and tends, apparently, to 0.5 (?). In other words, in these options, for a large number of balls in the chests, you need not carry out the experiment at all and can simply toss a coin: the result is the same. This, in fact, is why I decided to calculate the various options: I was expecting some surprises. I have no rigorous proof that the probability tends to exactly 0.5. This is again my intuition, and it often fails.
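The guess about the limit can be probed numerically for much larger bags. The sketch below is not the calculation behind the graphs; it assumes that balls are numbered from 1 up to the number on the keg and that every pair of drawn balls in every filling is counted once. For chests with a < b balls this gives a*b pairs, of which a are ties and a*b - a*(a+1)/2 are cases where the rule is right; the mirrored fillings with a > b give the same counts, so only a < b is summed. The function name and labels are hypothetical.

```python
def shares_without_return(n_kegs):
    """Shares of correct answers when the keg is not returned (chests always differ)."""
    if n_kegs < 2:
        raise ValueError("at least two kegs are needed when none is returned")
    total = right = ties = 0
    for a in range(1, n_kegs + 1):
        for b in range(a + 1, n_kegs + 1):
            total += a * b                       # all pairs of drawn balls
            ties += a                            # pairs with equal ball numbers
            right += a * b - a * (a + 1) // 2    # higher ball came from the fuller chest
    return {
        "failure, all experiments": right / total,
        "failure, successful only": right / (total - ties),
        "coin toss": (right + ties / 2) / total,
        "declare equal": right / total,          # chests never equal, so ties are all wrong
    }

for n in (2, 8, 30, 100, 1000):
    print(n, shares_without_return(n))
```

In this counting, "failure, all experiments" and "declare equal" are the same quantity, which matches the two coinciding graphs. Replacing the sums by integrals suggests that, under these assumptions, all four shares approach 2/3 rather than 0.5, so the guess is worth rechecking; if the counting behind the graphs differs from this sketch, the numbers may differ too.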



I want to stress again that these notes are not about choosing the right strategy or evaluating which option is better. The point was to see how different ways of setting the conditions influence the result.



P.S. As I wanted, I managed to do without formulas and to use a special term, "recurrent formula", only once.



P.P.S. If you are too lazy to look it up on Wikipedia: a recurrent formula is when you need to get to house number 30, but must first visit all the preceding houses, numbers 1 to 29.


