
House Prices: Advanced Regression Techniques
Kaggle Competition: Machine Learning, Stacking, Ensemble
DATASET INFO
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home, based on the values of the explanatory variables.
Practice Skills
- Creative feature engineering
- Advanced regression techniques like random forest and gradient boosting
The dataset is available here.
THEMES
Python, Machine Learning, Advanced regression techniques, Lasso, Decision Tree, Random Forest, Gradient Boosting, XGBoost, LightGBM, Keras, TensorFlow, Neural Networks, Ensemble, Vertical Stacking
PROJECT DESCRIPTION
In this project, we take part in a Kaggle competition on advanced regression techniques. Using a training dataset of 1460 houses, each described by 79 explanatory features, we predict the sale prices of the 1459 houses in the test dataset. We achieved our best results with vertical stacking of XGBoost, LightGBM, Elastic Net and Lasso.
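The stacking idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the notebook's actual code: it uses scikit-learn's `StackingRegressor`, swaps in `GradientBoostingRegressor` as a stand-in for XGBoost and LightGBM so that the example needs only scikit-learn, runs on synthetic data from `make_regression` rather than the Ames dataset, and picks all hyperparameters arbitrarily.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (assumption: stands in for the prepared housing features).
X, y = make_regression(n_samples=300, n_features=20, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Base learners. The gradient-boosting model is a stand-in for XGBoost/LightGBM;
# the linear models are scaled first because their penalties depend on feature scale.
base_models = [
    ("gbr", GradientBoostingRegressor(n_estimators=100, random_state=0)),
    ("enet", make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))),
    ("lasso", make_pipeline(StandardScaler(), Lasso(alpha=0.1))),
]

# The meta-model is trained on out-of-fold predictions of the base learners (cv=5),
# so it never sees predictions made on data a base learner was fitted on.
stack = StackingRegressor(estimators=base_models,
                          final_estimator=LinearRegression(), cv=5)
stack.fit(X_train, y_train)

preds = stack.predict(X_val)
rmse = float(np.sqrt(mean_squared_error(y_val, preds)))
print(f"Validation RMSE of the stacked model: {rmse:.2f}")
```

Fitting the meta-model on out-of-fold predictions is the main design point here: it lets the final estimator learn how much to trust each base learner without leaking training targets.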
The project included:
- A comprehensive exploration of the data
- Univariate and pairwise comparisons
- Data cleaning (treating missing values, handling high-leverage points and incorrect data types)
- Transformations (quadratic, cubic and square-root transformations; Box-Cox transformation to treat skewed features; one-hot encoding to create dummy variables for the levels of each categorical feature)
- Model formulation (Linear Regression, Lasso, Elastic Net, Decision Tree, Random Forest, AdaBoost, XGBoost, LightGBM, Neural Networks with Keras)
- Stacking (Ensemble, Vertical Stacking)
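Two of the transformation steps listed above, Box-Cox for skewed features and one-hot encoding, might look something like the sketch below. It is a minimal illustration rather than the notebook's code: the toy rows, the skewness cut-off of 0.75 and the fixed lambda of 0.15 are all assumptions chosen for the example, and `boxcox1p` (Box-Cox applied to 1 + x) is used so that zero values would not cause problems.

```python
import pandas as pd
from scipy.special import boxcox1p
from scipy.stats import skew

# Toy data frame (assumption: made-up rows that only mimic housing-style columns).
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 215245],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198, 2036],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr",
                     "Crawfor", "NoRidge", "Timber"],
})

SKEW_THRESHOLD = 0.75  # assumed cut-off for calling a feature "skewed"
BOXCOX_LAMBDA = 0.15   # assumed fixed lambda; it could also be estimated per feature


def find_skewed_columns(frame, threshold=SKEW_THRESHOLD):
    """Return the numeric columns whose absolute sample skewness exceeds threshold."""
    numeric = frame.select_dtypes(include="number")
    skewness = numeric.apply(lambda col: skew(col))
    return list(skewness[skewness.abs() > threshold].index)


def transform_features(frame, categorical_cols, lam=BOXCOX_LAMBDA):
    """Box-Cox transform the skewed numeric columns, then one-hot encode categoricals."""
    out = frame.copy()
    for col in find_skewed_columns(out):
        # boxcox1p computes ((1 + x)**lam - 1) / lam for lam != 0.
        out[col] = boxcox1p(out[col].to_numpy(dtype=float), lam)
    # One dummy column per categorical level, e.g. Neighborhood_CollgCr.
    return pd.get_dummies(out, columns=categorical_cols)


skewed_cols = find_skewed_columns(df)
encoded = transform_features(df, categorical_cols=["Neighborhood"])
print(skewed_cols)
print(encoded.columns.tolist())
```

In this toy data only the column with the single very large value is flagged as skewed, which is the typical situation such a transformation is meant to tame before fitting linear models.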
CODE
A comprehensive Jupyter Notebook is available here.
Sotiris Baratsas © 2022. All rights reserved.