The saddest equation in Data Science




Stock up on handkerchiefs! I am about to tell you the whole truth about statistics and data science, and I promise it will bring tears to your eyes.



CONCLUSION = DATA + ASSUMPTIONS. In other words, statistics does not give you the truth.



Common myths



The following misconceptions are often heard:



  • “If I find the right equations, I can find out what no one knows right now.”
  • “If I add maths to my data, I can reduce the uncertainty.”
  • “Statistics can turn data into truth!”


It all sounds like fairy tales, doesn't it? Because that's what they are.



Bitter truth



There is no magic in the world that can help you create something out of nothing. Forget about it. Statistics is about something else. Take my word for it as a statistician. (As a bonus, this article will save you a ton of time you might otherwise spend chasing this pipe dream.)



Unfortunately, many charlatans will try to convince you otherwise. They will use the standard technique: “You don’t know the equations I just threw at you, so acknowledge my superiority and do as I say!”



Do not fall for what these poseurs say.



About the Author: Cassie Kozyrkov is a South African data and statistics specialist. She founded Decision Intelligence at Google, where she is Principal Researcher.




Do not repeat the fate of Icarus



Think of statistical inference (“statistics” for short) as a leap from what we know (our sample data) to what we do not know (our population parameter).



In statistics, what you know is not what you would like to know.
Maybe you want the facts about tomorrow, but you only have yesterday to draw conclusions from. (So annoying that we can’t remember the future, right?) Maybe you want to know what all your potential users think about your product, but you can only interview a hundred of them. That is where uncertainty comes in!
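
The hundred-user situation can be sketched in a few lines of Python. This is only an illustration: it assumes a hypothetical population of 100,000 potential users in which exactly 30% like the product, a figure invented here and one you would never know in practice. Each simulated survey interviews 100 users chosen at random, and the share it reports changes from survey to survey.

```python
import random

def survey_share(population, sample_size, rng):
    """Share of 'likes' among sample_size users drawn without replacement."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical population: 1 = likes the product, 0 = does not.
# The 30% true share is an assumption made up for this illustration.
population = [1] * 30_000 + [0] * 70_000

rng = random.Random(2020)  # fixed seed so the run is repeatable
shares = [survey_share(population, 100, rng) for _ in range(5)]
print(shares)  # five surveys of the same population
```

Only a survey of the entire population would recover the true share with certainty. Every smaller sample leaves a gap that has to be bridged some other way.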



This is not magic, this is assumptions



How can you even jump from what you know to what you don't? You need a bridge across this chasm, and the name of that bridge is assumptions. Let me remind you of the most painful equation in data science:



DATA + ASSUMPTIONS = PREDICTION.
(You can swap the word “prediction” for “conclusion” or “forecast” if you prefer. They all amount to the same thing: a statement about something you do not know for sure.)



What is an assumption?



If we knew all the facts (and were sure that they were undeniable facts), we would not need assumptions (or statistics). Assumptions are the ugly pieces you use to build a bridge between what you know and what you would like to know. They are the hacks you fall back on when you need the numbers to add up but do not have enough data.



Assumptions are the ugly patches that you apply in places where there is no information.
How can I put it bluntly? An assumption is not a fact. It is something you make up because you do not have enough information. If you like to intimidate people with your super-precise intervals, remember that it is rash to call anything built on assumptions the truth. It is better to think of statistics as a decision-making tool. The tool is not perfect, but in certain situations it is still better than nothing.



Statistics is your attempt to do whatever you can in a world of uncertainty.
Assumptions are assumptions, wherever you go. No wave of a magic wand will turn them into facts.



Making assumptions is part of making decisions



Show me any decision made without assumptions. I can easily list many implicit assumptions that you make in real life without even thinking about them.



Examples:

  • When you read a newspaper, do you assume that all the facts have been verified?
  • When you made plans for 2020, did you assume that there would be no global pandemic?
  • If you analyzed data, did you assume that it was recorded without errors?
  • Did you assume that your random number generator produces random results? (Usually it does not produce truly random ones.)
  • When you decide to buy something on the Internet, do you assume that you will be charged the correct amount?
  • What about your last snack? Did you assume it was not poisoned?
  • When you took medication, did you *know* about its long-term effects, or did you assume?
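
The random number generator question above is easy to check for yourself. Here is a small sketch using Python's standard `random` module, which is a deterministic pseudorandom generator. The helper name `draws` and the seed value 42 are arbitrary choices made for this illustration. Two generators started from the same seed produce exactly the same “random” numbers.

```python
import random

def draws(seed, n=5):
    """Return n pseudorandom numbers from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

first = draws(42)
second = draws(42)

# Same seed, same sequence: the output is fully determined by the seed.
print(first == second)  # True
```

Treating such numbers as random is itself an assumption. It is usually a harmless one, but it is still an assumption.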





Whether you like it or not, assumptions are always part of decision making. Inference from real-world data should come with a long list of written-down assumptions, in which the data scientist comes clean about every corner that had to be cut.



Even if you decide to dispense with statistics, you are probably using assumptions to decide how to proceed. For your own safety, you must be aware of what assumptions your decisions are based on.



How the "magic" of statistics works



There are many tools in statistics that allow you to formulate assumptions and combine them with evidence. That is how intelligent decisions are born. (See my 8-minute introduction to statistics here.)



It is ludicrous to expect an analysis that includes uncertainty and probability to be a source of truth with a capital “T”.
Yes, that's how statistical magic works. You choose which assumptions you are willing to live with, then combine them with the data. On the basis of this unholy union, you make intelligent decisions. That is all statistics is.






That is why an analysis that includes uncertainty and probability can never be a source of truth with a capital “T”. There is no secret dark magic that does this for you.



For the same reason, two people can come to completely different conclusions based on the same data! It is enough for them to make different assumptions. Statistics gives you a tool for making more informed decisions, but there is no single rule for using it. It is a personal decision-making tool.
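
To make this concrete, here is a hedged sketch in which two hypothetical analysts look at the same data (7 of 10 interviewed users like a product) but start from different assumptions, written as Beta priors in a standard Beta-Binomial update. The priors, the 0.5 threshold, and the “launch” decision are all invented for illustration. This is one of many ways to encode assumptions, not the only one.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior.

    With binomial data the posterior is Beta(prior_a + successes,
    prior_b + failures), whose mean is a / (a + b).
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    return a / (a + b)

successes, trials = 7, 10  # the same data for both analysts

# Analyst A assumes nothing in particular: a flat Beta(1, 1) prior.
mean_a = posterior_mean(1, 1, successes, trials)
# Analyst B assumes products like this rarely succeed: Beta(2, 18).
mean_b = posterior_mean(2, 18, successes, trials)

for name, mean in [("A", mean_a), ("B", mean_b)]:
    decision = "launch" if mean > 0.5 else "do not launch"
    print(f"Analyst {name}: estimated share {mean:.2f} -> {decision}")
```

Analyst A ends up with an estimate of 8/12 and launches. Analyst B ends up with 9/30 and does not. Neither made an arithmetic mistake. They disagree only in what they assumed before seeing the data.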



How good your research is depends on how good your assumptions are.



What about science?



What happens when a scientist uses statistics to draw conclusions? They simply form an opinion and decide to share it with the whole world. There is nothing wrong with that: like it or not, scientists have to draw conclusions from time to time, because that is their job. I suppose that sometimes these conclusions are worth heeding.



Like it or not, scientists periodically have to draw conclusions based on statistics. It is their job.
I enjoy listening to advice from people who have more information and experience than I do, but I never allow myself to confuse opinions with facts. There are scientists who are well versed in probability and work with it skillfully. However, I have also met scientists who have made more statistical errors than you could fit into a lifetime. Opinions cannot (and should not) convince people who are not willing to make the same assumptions. Such opinions come from a combination of evidence and unverified assumptions, so they cannot be treated as authoritative.



Summary



Think of statistics as a science that can help you make decisions when you are unsure of something. It is a framework that helps you make informed decisions despite a lack of information. There is no single right way to use statistics.



No, it doesn't give you the facts you want. It gives you what you need to cope with a lack of facts. The point of statistics is to help you do the best you can in a world of uncertainty.



You only need to make assumptions.



Translation: Diana Sheremyeva


