• Recreating Spurious Correlations

    Using Python, SciPy & Matplotlib

  • PROJECT DESCRIPTION

    As you surely know, correlation is not causation. To drive the point home, Tyler Vigen has created a delightful website and accompanying book showcasing spurious correlations. In this study, we try to replicate two of these findings.

    Per Capita Cheese Consumption (US) - correlates with - number of deaths by accidental suffocation and strangulation in bed

    In the first case, we show that, in the U.S., per capita cheese consumption is correlated with the number of deaths by accidental suffocation and strangulation in bed. After obtaining, loading and cleaning the two datasets, we check both measures for normality and use Pearson's correlation coefficient (r) to quantify the correlation between them. We use Matplotlib to visualise the results.
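    The normality check and Pearson correlation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `pearson_with_normality_check`, the choice of the Shapiro-Wilk test, the 0.05 threshold and the sample numbers are all assumptions made for the example, and the values are placeholders rather than the real cheese or mortality data.

```python
import numpy as np
from scipy import stats


def pearson_with_normality_check(x, y, alpha=0.05):
    """Check both samples for normality, then compute Pearson's r.

    Hypothetical helper: Shapiro-Wilk and alpha=0.05 are assumptions,
    the source only says that normality was checked.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Shapiro-Wilk null hypothesis: the sample comes from a normal distribution.
    # A p-value above alpha means we fail to reject normality.
    _, p_x = stats.shapiro(x)
    _, p_y = stats.shapiro(y)

    # Pearson's r measures the strength of the linear relationship.
    r, p_value = stats.pearsonr(x, y)

    return {
        "x_looks_normal": bool(p_x > alpha),
        "y_looks_normal": bool(p_y > alpha),
        "r": float(r),
        "p_value": float(p_value),
    }


# Placeholder numbers for illustration only (NOT the project's data).
measure_a = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.0]
measure_b = [2.0, 4.1, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

result = pearson_with_normality_check(measure_a, measure_b)
print(result)
```

    If either sample fails the normality check, a rank-based coefficient such as Spearman's is the usual fallback, since it does not assume normally distributed data.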

  • Number of doctorates awarded in Biological and Biomedical Sciences - correlates with - the per capita consumption of mozzarella cheese

    In the second case, we show that, in the U.S., the number of doctorates awarded in Biological and Biomedical Sciences is correlated with the per capita consumption of mozzarella cheese. After obtaining, loading and cleaning the two datasets, we check both measures for normality and use Spearman's rank correlation coefficient (rho) to quantify the correlation between them. We use Matplotlib to visualise the results.
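    A rough sketch of the Spearman calculation and a dual-axis Matplotlib chart might look like the following. The function name `spearman_and_plot`, the axis labels, the colours and the sample values are hypothetical choices for illustration and are not taken from the project's code or data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from scipy import stats


def spearman_and_plot(years, a, b, label_a, label_b):
    """Compute Spearman's rho and draw both series on twin y-axes.

    Hypothetical helper for illustration only.
    """
    # Spearman's rho is Pearson's r computed on the ranks of the data,
    # so it captures monotonic (not only linear) relationships.
    rho, p_value = stats.spearmanr(a, b)

    fig, ax_a = plt.subplots()
    ax_a.plot(years, a, color="tab:blue", marker="o")
    ax_a.set_xlabel("Year")
    ax_a.set_ylabel(label_a, color="tab:blue")

    # Second y-axis sharing the same x-axis, because the two measures
    # have different units and scales.
    ax_b = ax_a.twinx()
    ax_b.plot(years, b, color="tab:red", marker="s")
    ax_b.set_ylabel(label_b, color="tab:red")

    ax_a.set_title(f"Spearman's rho = {rho:.3f}")
    return rho, p_value, fig


# Placeholder numbers for illustration only (NOT the project's data).
years = [1, 2, 3, 4, 5, 6]
series_a = [10, 12, 11, 15, 18, 21]
series_b = [1.0, 1.5, 1.2, 2.5, 2.6, 4.0]

rho, p_value, fig = spearman_and_plot(
    years, series_a, series_b, "Measure A", "Measure B"
)
fig.savefig("spurious_correlation.png")
```

    The twin y-axis layout lets two series with unrelated units share one chart. It also makes almost any pair of trending series look related, which is part of the joke.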

  • THEMES

    Python, SciPy, Matplotlib

    DATASET INFO

    • For the per capita cheese consumption we used the data provided by the U.S. Department of Agriculture here.
    • For the deaths by accidental suffocation and strangulation we used the data from the U.S. Centers for Disease Control and Prevention here. Note that it took some searching to work out what to enter in the query form to retrieve the data on deaths by accidental suffocation and strangulation in bed, but this was half the fun.
    • For the number of doctorates awarded, we used the data provided by the National Science Foundation from the Survey of Earned Doctorates.
    • For the mozzarella consumption we used the data provided by the U.S. Department of Agriculture here.

    The datasets can also be found here.

    CODE

    Code in Python (with Markdown) is available here.