Key principles for creating useful and informative graphs
Data visualization is an essential step in any data science project. This is where you present and report your results in a graphical form that is intuitive and easy to understand.
Data visualization takes a lot of work: a great deal of cleaning and analysis goes into turning messy data into clear graphs and charts. And even once the data is prepared, you still have to follow certain principles to create useful, informative graphics.
In writing this article, I took inspiration from Edward Tufte's book Beautiful Evidence, which sets out six principles for making data graphics useful. These principles are what separate useful charts from unhelpful ones.
This article is also heavily inspired by Roger D. Peng's Exploratory Data Analysis in R. It is available for free on Bookdown and you can read it to learn more about EDA.
Let's take a closer look at these principles.
An example of data visualization on Our World in Data
1. Show comparison (control and experimental groups)
Comparison is the foundation of good scientific research. Evidence for a hypothesis is always relative to something else. Take an example: you say, "Dark chocolate improves concentration and learning ability." The important question to ask of this statement is "compared to what?" Without a comparison, the statement is meaningless.
One way to show comparison is with control and treatment groups. People in the treatment group eat chocolate; people in the control group do not. This way, you can compare the effect of chocolate on concentration and learning ability based on test results or by measuring brain activity.
When creating graphs to present your research, you can plot the control and treatment groups side by side using a box plot (box-and-whisker plot). This way, readers get a clear idea of the effect of the treatment.
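The control-versus-treatment comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the scores are synthetic numbers invented for the example, the function names are hypothetical, and matplotlib is just one possible plotting library (the books this article draws on use R).

```python
import statistics

# Synthetic test scores, invented purely for illustration (not real study data).
control = [62, 65, 67, 70, 71, 73, 74, 76, 78, 81]
treatment = [68, 70, 73, 75, 77, 78, 80, 82, 85, 88]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def plot_comparison(control, treatment):
    """Draw both groups on one shared axis so they can be compared directly."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for drawing

    fig, ax = plt.subplots()
    ax.boxplot([control, treatment])
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["Control (no chocolate)", "Treatment (dark chocolate)"])
    ax.set_ylabel("Test score (points)")
    ax.set_title("Test scores by group (synthetic data)")
    return fig
```

Calling plot_comparison(control, treatment) returns a figure with the two boxes next to each other on the same scale, which is what lets the reader answer "compared to what?" at a glance. Note that matplotlib draws whiskers at 1.5 times the interquartile range by default, so they match the minimum and maximum only when there are no outliers.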
2. Causality and explanation
The next step is to show causality: an explanation of the mechanism behind the answer to your question. If you have shown that an effect appears in the experimental group but not in the control group, you must use the evidence to formulate a hypothesis about why this is so.
Returning to the previous example, let's say that the subjects in the experimental group got higher scores on the test, which suggests that dark chocolate improves concentration. An important question follows: why exactly is this the case?
This question is important because it helps raise other questions that can either refute or support your hypothesis throughout the study.
To show a causal relationship or mechanism, you can measure the brain activity of the control and treatment groups and graph the results side by side with the test scores. With a graph of test scores next to a graph of brain activity, you can start to explain why subjects who ate chocolate scored higher, that is, how dark chocolate might improve cognitive function.
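One way to lay out the outcome and the proposed mechanism next to each other is a figure with one panel per measure. The sketch below assumes Python with matplotlib; all values are synthetic and invented for illustration, the units for brain activity are arbitrary, and the function names are hypothetical.

```python
import statistics

# Synthetic values invented for illustration only; units are arbitrary.
measurements = {
    "Test score (points)": {
        "Control": [62, 65, 70, 73, 76],
        "Treatment": [70, 75, 78, 82, 85],
    },
    "Brain activity (arbitrary units)": {
        "Control": [1.0, 1.1, 1.2, 1.3, 1.4],
        "Treatment": [1.3, 1.4, 1.5, 1.6, 1.7],
    },
}


def group_means(measurements):
    """Mean of each group for each measure, e.g. to quote next to the figure."""
    return {
        measure: {group: statistics.mean(values) for group, values in groups.items()}
        for measure, groups in measurements.items()
    }


def plot_side_by_side(measurements):
    """One panel per measure, with the same group order in every panel."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for drawing

    fig, axes = plt.subplots(1, len(measurements), figsize=(9, 4), squeeze=False)
    for ax, (measure, groups) in zip(axes[0], measurements.items()):
        ax.boxplot(list(groups.values()))
        ax.set_xticks(range(1, len(groups) + 1))
        ax.set_xticklabels(list(groups.keys()))
        ax.set_ylabel(measure)
    fig.suptitle("Outcome and proposed mechanism, side by side (synthetic data)")
    fig.tight_layout()
    return fig
```

Keeping the group order identical across panels lets the reader move from one measure to the other without re-reading the axis. A figure like this shows evidence consistent with a mechanism; on its own it does not prove one.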
3. Data with many variables (more than two variables)
The real world is complex, and the relationship between two events is rarely simple. In research, you typically have many attributes, or variables, that you can measure, and they interact with each other in different ways. Some of them may be confounders, while others may be important attributes that explain the relationship between events.
As you already know, correlation does not imply causation. It is therefore not a good idea to limit your analysis to only two variables: this can lead to erroneous conclusions. Show as many relevant variables as you reasonably can in your charts. This can help you uncover confounding in your data.
Take Simpson's paradox, a phenomenon in probability and statistics in which a trend that appears in several separate groups of data disappears or reverses when the groups are combined. To illustrate:
- Two variables (x, y): negative relationship.
- Three variables (x, y, z): positive relationship between x and y within each level of z (z is a confounding variable).
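The reversal in the two bullets above can be reproduced with a few numbers. The sketch below uses a tiny synthetic dataset invented for illustration, in which a hypothetical grouping variable z takes the values "A" and "B"; the helper names are assumptions as well, and matplotlib is only one possible way to draw the result.

```python
import math

# Synthetic data invented for illustration: z splits the points into two groups.
groups = {
    "A": {"x": [1, 2, 3, 4], "y": [11, 12, 13, 14]},
    "B": {"x": [6, 7, 8, 9], "y": [3, 4, 5, 6]},
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two variables: pool every point and ignore z.
pooled_x = [v for g in groups.values() for v in g["x"]]
pooled_y = [v for g in groups.values() for v in g["y"]]
pooled_r = pearson(pooled_x, pooled_y)

# Three variables: compute the same statistic separately for each level of z.
within_r = {z: pearson(g["x"], g["y"]) for z, g in groups.items()}

print(f"pooled r = {pooled_r:.2f}")
for z, r in within_r.items():
    print(f"group {z}: r = {r:.2f}")


def plot_by_group(groups):
    """Scatter x against y, one color per level of z, so the reversal is visible."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for drawing

    fig, ax = plt.subplots()
    for z, g in groups.items():
        ax.scatter(g["x"], g["y"], label=f"z = {z}")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Same points, grouped by z (synthetic data)")
    ax.legend()
    return fig
```

In this dataset the pooled correlation is negative, while the correlation within each group is positive. Coloring the points by z in a single scatter plot makes the third variable visible, which is exactly what a two-variable chart would hide.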
4. Don't let tools drive analysis
A good storyteller knows how to hold people's attention while getting the story across. The storyteller is not limited to the story itself, but can tell it in a unique way, combining different perspectives and using varied imagery to bring the story to life.
Likewise, a good data visualizer is not limited to the visualization tools at hand. The person visualizing the data should be able to move beyond a single form of expression (such as lines or circles) and combine several modes of presentation.
For example, instead of producing reports that contain only text, combine images, charts, words, and numbers: together they enrich the information. When text and graphics are integrated, readers can see many different pieces of evidence in one place. So remember that you are telling a story. Don't let tools limit your thinking. Let the analysis drive the choice of tools, and create evidence-rich graphics.
5. Document your charts with appropriate labels, scales and data sources
When you first look at a chart, you see the title and then the labels that give the chart its context. Without them, the graph tells you nothing. Good reports and graphs are properly documented, with appropriate scales and labels on every graph. The data sources used to create the graphs are just as important. It is also good practice to save the code that was used to generate the data and graphs: this makes the results reproducible and adds credibility to your charts. Moreover, with the code saved, you can edit a graph later if necessary.
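As a rough sketch of what "documented" can mean in code, the example below refuses to draw a chart until it has a title, axis labels, and a data source, and then saves the figure from a script so it can be regenerated. The ChartDocs class, the function names, and the default file name chart.png are all hypothetical; matplotlib is assumed as the plotting library.

```python
from dataclasses import dataclass


@dataclass
class ChartDocs:
    """Hypothetical container for the text that documents a chart."""
    title: str = ""
    x_label: str = ""  # include units, e.g. "Time (weeks)"
    y_label: str = ""  # include units, e.g. "Test score (points)"
    source: str = ""   # where the data came from


def missing_docs(docs):
    """Return the names of documentation fields that were left empty."""
    return [name for name, value in vars(docs).items() if not value.strip()]


def draw_documented_chart(x, y, docs, path="chart.png"):
    """Plot y against x, attach every label, and save the figure to disk."""
    missing = missing_docs(docs)
    if missing:
        raise ValueError(f"chart is missing documentation: {missing}")

    import matplotlib.pyplot as plt  # imported lazily: only needed for drawing

    fig, ax = plt.subplots()
    ax.plot(x, y, marker="o")
    ax.set_title(docs.title)
    ax.set_xlabel(docs.x_label)
    ax.set_ylabel(docs.y_label)
    # Put the data source in small type in the bottom-left corner of the figure.
    fig.text(0.01, 0.01, f"Source: {docs.source}", fontsize=8, ha="left", va="bottom")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    return fig
```

Keeping a script like this under version control next to the data is one simple way to make a chart reproducible: anyone can rerun it, and a later edit to a label or scale is a one-line change.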
6. Content comes first
Ultimately, whatever principles you apply, without content that is high quality, relevant, and complete, your graphics will be useless or misleading. In other words, "garbage in, garbage out." Before reporting any result, make sure it is interesting and important. No matter how beautiful your graphics are, no one wants useless results. Interesting questions can come from personal experience or from something you came across on the Internet. In any case, always ask questions: this is how an idea becomes reality.
Conclusion
Data visualization is an incredible skill. You can take data and turn it into graphics and plots that tell people a story. In an era when data is growing exponentially, it is increasingly important to be able to tell a story with data. Now is a great time to learn how. To summarize the principles:
- Show comparison.
- Show the reasons.
- Show multidimensional data.
- Combine as much evidence as possible.
- Describe and document your graphs.
- Make sure your story is interesting.
What I want you to take away from this article is this: always remember to start with a good question, use the right approach, and only present the information necessary to answer your good question.
I will leave you with this quote from the American mathematician John Tukey, who ushered in a new era of statistics:
The simple graph has brought more information to the data analyst's mind than any other device.
For a deeper understanding of these principles, I recommend the book "Exploratory Data Analysis in R" by Roger D. Peng.
Resources and links
If you want to learn more about data visualization, check out these great free books:
- Claus O. Wilke. Fundamentals of Data Visualization.
- Hadley Wickham et al. ggplot2: Elegant Graphics for Data Analysis.
- Winston Chang. R Graphics Cookbook.
Thanks for reading!