• Exploring White House visitor anomalies using PySpark

  • PROJECT DESCRIPTION

    In a paper that appeared in 2017, Jeffrey R. Brown and Jiekun Huang examined whether corporate executives’ meetings with key policymakers are associated with positive abnormal stock returns, and whether such meetings are associated with preferential government treatment of the executives’ firms. Their results suggest that political access does benefit corporations.

    The authors could carry out this analysis because the White House released its visitor records. Unfortunately, the Trump administration stopped releasing visitor information. The records from the Obama administration are still available online, however, and you will use them in this project.

    Specifically, you will use Spark to work through the data and answer the questions below. You can access the data here. Use the records from all available years for every question:

    1. Who are the top 20 visitors?

    2. Who are the top 20 visitees?

    3. Who are the top 20 visitor-visitee pairs?

    4. What were the top 20 busiest days?

    5. What were the top 20 busiest month-year combinations?

    6. What was the order of popularity of the days of the week for visits?

    7. What was the order of popularity of months for visits?

    THEMES

    Apache Spark, Python, PySpark

    CODE

    Code in Python is available upon request.