
Using linear models to predict US house prices
Multiple Linear Regression in R
DATASET INFO
The data for this project are a random sample of 63 cases from the files of a large real estate agency in the USA, covering house sales from February 15 to April 30, 1993. The data were collected from many cities (and the corresponding local real estate agencies) and are used as a basis for the whole company.
THEMES
R, Multiple Linear Regression, LASSO
PROJECT DESCRIPTION
In this study, we attempted to formulate a Multiple Linear Regression model to predict US house prices.
Steps involved:
- Perform descriptive analysis and visualisation for each variable to get an initial insight into what the data look like.
- Conduct pairwise comparisons between the variables in the dataset to investigate if there are any associations implied by the dataset.
- Construct a model for the expected selling price as a function of the remaining features. Check whether this linear model fits the data well.
- Find the best model for predicting selling prices by selecting features with stepwise methods: Forward, Backward and Stepwise procedures, using AIC or BIC to choose which variables appear most important for predicting selling prices.
- Get the summary of our final model and interpret the coefficients. Comment on the significance of each coefficient and write down the mathematical formulation of the model. Consider whether the intercept should be excluded from the model.
- Check the assumptions of our final model. Are they satisfied? If not, how does the violated assumption affect inference, and what could be done about it?
- Conduct LASSO as a variable selection technique and compare the variables it retains with those selected by the stepwise methods.
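
The steps above can be sketched in R roughly as follows. This is a minimal illustration and not the project's actual code: the real dataset and its column names are not shown on this page, so the data are simulated and the names PRICE, SQFT, AGE, FEATS and TAX are hypothetical. The LASSO part assumes the glmnet package is installed and is skipped otherwise.

```r
# Simulated stand-in for the 63-case sample (hypothetical columns, not the real data)
set.seed(1)
n <- 63
houses <- data.frame(
  SQFT  = round(rnorm(n, mean = 1600, sd = 400)),
  AGE   = sample(1:50, n, replace = TRUE),
  FEATS = sample(0:8, n, replace = TRUE)
)
houses$TAX   <- round(0.5 * houses$SQFT + rnorm(n, sd = 100))
houses$PRICE <- 100 + 0.6 * houses$SQFT + 0.5 * houses$TAX + rnorm(n, sd = 100)

# Descriptive statistics and pairwise associations
print(summary(houses))
print(round(cor(houses), 2))

# Full model (all features) and null model (intercept only)
full <- lm(PRICE ~ ., data = houses)
null <- lm(PRICE ~ 1, data = houses)

# Stepwise selection: k = 2 gives AIC (the default), k = log(n) gives BIC
scope    <- list(lower = formula(null), upper = formula(full))
fwd_aic  <- step(null, scope = scope, direction = "forward", trace = 0)
back_bic <- step(full, direction = "backward", k = log(n), trace = 0)
step_aic <- step(full, direction = "both", trace = 0)

# Coefficient table of one candidate final model (includes the intercept's p-value)
final <- back_bic
print(summary(final)$coefficients)

# One assumption check: Shapiro-Wilk test for normality of the residuals
print(shapiro.test(residuals(final)))

# LASSO via cross-validated glmnet (alpha = 1), only if the package is available
if (requireNamespace("glmnet", quietly = TRUE)) {
  x <- model.matrix(PRICE ~ ., data = houses)[, -1]  # drop the intercept column
  cv_fit <- glmnet::cv.glmnet(x, houses$PRICE, alpha = 1)
  # Rows with a non-zero coefficient are the variables LASSO keeps
  print(coef(cv_fit, s = "lambda.1se"))
}
```

Comparing the terms kept in `fwd_aic`, `back_bic` and `step_aic` with the non-zero LASSO coefficients mirrors the comparison described in the last step. Residual plots such as `plot(final)` would normally accompany the normality test when checking the model assumptions.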
CODE
Code in R is available here
SHORT REPORT
Sotiris Baratsas © 2022. All rights reserved.