
Recreating Spurious Correlations
Using Python, SciPy & Matplotlib
PROJECT DESCRIPTION
As you surely know, correlation is not causation. To drive the point home, Tyler Vigen has created a delightful website and accompanying book showcasing spurious correlations. In this study, we try to replicate two of these findings.
Per Capita Cheese Consumption (US) - correlates with - number of deaths by accidental suffocation and strangulation in bed
In the first case, we show that, in the U.S., per capita cheese consumption is correlated with the number of deaths by accidental suffocation and strangulation in bed. After obtaining, loading and cleaning the two datasets, we check both series for normality and use Pearson's correlation coefficient (r) to measure the correlation between them. We use Matplotlib to visualise the results.
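As a rough sketch of the steps just described, the snippet below checks each series for normality and then computes Pearson's r with SciPy. The choice of the Shapiro-Wilk test, the 0.05 threshold, the function name `normality_and_pearson` and the synthetic series are all assumptions made for illustration; they are not the project's actual code or the real USDA and CDC figures.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder series (assumption): NOT the real USDA/CDC figures.
rng = np.random.default_rng(0)
years = np.arange(2000, 2010)
cheese_lbs = 29.8 + 0.35 * (years - 2000) + rng.normal(0, 0.15, years.size)
bed_deaths = 330 + 50 * (years - 2000) + rng.normal(0, 20, years.size)


def normality_and_pearson(x, y, alpha=0.05):
    """Run Shapiro-Wilk on each series, then Pearson's r on the pair.

    Hypothetical helper: the normality test and alpha are assumptions.
    """
    _, p_x = stats.shapiro(x)      # H0: x is drawn from a normal distribution
    _, p_y = stats.shapiro(y)      # H0: y is drawn from a normal distribution
    r, p_r = stats.pearsonr(x, y)  # linear correlation and its two-sided p-value
    return {
        "x_looks_normal": bool(p_x > alpha),
        "y_looks_normal": bool(p_y > alpha),
        "r": float(r),
        "p_value": float(p_r),
    }


result = normality_and_pearson(cheese_lbs, bed_deaths)
print(result)
```

Pearson's r measures linear association and its significance test assumes roughly normal data, which is why the normality check comes first.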
Number of doctorates awarded in Biological and Biomedical Sciences - correlates with - the per capita consumption of mozzarella cheese
In the second case, we show that, in the U.S., the number of doctorates awarded in Biological and Biomedical Sciences is correlated with the per capita consumption of mozzarella cheese. After obtaining, loading and cleaning the two datasets, we check both series for normality and use Spearman's rank correlation coefficient to measure the correlation between them. We use Matplotlib to visualise the results.
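The second analysis can be sketched in the same way: Spearman's rank correlation via SciPy, followed by a two-axis Matplotlib chart in the spirit of the original spurious-correlation plots. The series below are deliberately artificial (one cubic, one logarithmic) and the helper `plot_two_series`, its colours and its labels are hypothetical; none of this is the project's actual code or the real NSF and USDA data.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Artificial placeholder series (assumption): NOT the real NSF/USDA figures.
# Both rise monotonically but non-linearly, so their ranks agree perfectly.
years = np.arange(2000, 2010)
step = np.arange(1, years.size + 1)
doctorates = 5000 + step**3
mozzarella_lbs = 9.0 + np.log(step)

# Spearman's coefficient is Pearson's r computed on the ranks of the data.
rho, p_value = stats.spearmanr(doctorates, mozzarella_lbs)
print(f"Spearman correlation: {rho:.3f} (p = {p_value:.3g})")


def plot_two_series(years, left, right, left_label, right_label):
    """Plot two series against the same years on separate y-axes."""
    fig, ax_left = plt.subplots(figsize=(8, 4))
    ax_left.plot(years, left, color="tab:red", marker="o")
    ax_left.set_xlabel("Year")
    ax_left.set_ylabel(left_label, color="tab:red")

    ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis
    ax_right.plot(years, right, color="tab:blue", marker="s")
    ax_right.set_ylabel(right_label, color="tab:blue")

    fig.tight_layout()
    return fig


fig = plot_two_series(
    years, doctorates, mozzarella_lbs,
    "Doctorates awarded", "Mozzarella consumption (lbs per capita)",
)
# Call plt.show() or fig.savefig("chart.png") to view the chart.
```

Because Spearman's coefficient works on ranks, it captures any monotonic relationship and does not rely on the data being normally distributed, which makes it a reasonable fallback when a normality check fails.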
THEMES
Python, SciPy, Matplotlib
DATASET INFO
- For the per capita cheese consumption we used the data provided by the U.S. Department of Agriculture here.
- For the deaths by accidental suffocation and strangulation we used the data from the U.S. Centers for Disease Control and Prevention here. Note that it took some searching to work out what to enter in the query form to retrieve the deaths by accidental suffocation and strangulation in bed, but this was half the fun.
- For the number of doctorates awarded, we used the data provided by the National Science Foundation from the Survey of Earned Doctorates.
- For the mozzarella consumption we used the data provided by the U.S. Department of Agriculture here.
The datasets can also be found here.
CODE
The Python code (with Markdown) is available here.
Sotiris Baratsas © 2022. All rights reserved.