•  

    Building a Data Warehouse for property sales in Brooklyn

    using Microsoft SQL Server - SSAS - Tableau

  • DATASET INFO

    Publisher: ​New York City, Department of Finance publishes annual rolling property sales data. Our dataset comes from the ​Annualised Sales dataset, which displays properties sold in New York City by year, beginning in 2003.

    Dataset


    Dataset Description: ​The dataset has 390.883 rows and 111 columns. The dataset provides information regarding properties sold in Brooklyn for 15 consecutive years, from 2003 to 2017.


    Additional Explanation: ​Many variables of the dataset are Tax-Related or focused on bureaucratic processes and coverage issues (e.g. schools, police precincts, fire brigades). Also, some of the variables need further exploration, which we did through the PLUTO dictionary (found ​here)​, which provides explanation for most the variables found in the dataset.

    THEMES

    Microsoft SQL Server, R, Multidimensional DBs, Tableau, Data Visualisation

    PROJECT DESCRIPTION

    The project is focused on the creation of a Data Warehouse application, for the analysis of property sales in Brooklyn, one of the five boroughs of New York CIty. The project is split into 5 main phases:


    Phase 1: ​Finding the dataset, understanding its structure and what are the meaningful business questions, this dataset could answer.
    Phase 2: ​Extract-Transform-Load processes for the data warehouse, using R Studio.
    Phase 3: ​Building of the Data Warehouse using Microsoft SQL Server.
    Phase 4: ​Building the Multidimensional Cube using Microsoft Analysis Services and Visual Studio.
    Phase 5: ​OLAP Report and Data Visualisation (using Tableau).

     

    BUSINESS PERSPECTIVE

     

    The two partners, authors of this report, Baratsas Sotiris and Spanos Nikos, are two aspiring Data Scientists recently hired by a well-known real estate agency, Brooklyn Estate Inc., based in New York City. The company is operating for more than 20 years. The company has always relied on their deep area expertise, Brooklyn-based connections and traditional data analysis solutions, like Excel reports by their real estate agents, to compete in the dynamic NYC property market.


    However, as of recently, the company wishes to deploy more sophisticated tools for the monitoring, interpretation and analysis of the property market, in order to get an edge over their competitors and ensure their company’s survival in the future. They also want to empower their real estate agents, whether they are experienced or junior-level, to make data-driven decisions, based on real market insights, and be established as experts in their field.
    The above mentioned Data Scientists, are entrusted with the task of creating a Data Warehouse application that fulfils these objectives.


    The main business questions, that the company wishes to be able to answer through the new system are:

    • On which areas to focus their marketing efforts and manpower?
    • What kind of properties would be the best use of our time & budget?
    • How can we take advantage of market fluctuations quickly?
    • How can we make informed decisions about the correct pricing of different properties, based on characteristics, such as location, size, age and lot type?

    CODE

    Code is available upon request

  • SHORT PRESENTATION