
Welcome to part two of the project series “Apartment Pricing: Advanced Regression Techniques.”
I highly recommend reading part one of the series here, if you have not already. All the work shown below can be found in my GitHub repository here. You can also view the interactive notebook here.
Introduction
In part one of this project series, we discussed our business understanding related to the project and then moved forward to the data extraction phase. After extracting approximately 2000 properties, we started preparing the data for our data visualization and exploratory analysis phase, and ultimately for our machine learning model development.
Now that the data has been prepared to suit our needs, let's begin.
Property Price
The project aims to predict an apartment's price, so we have to make sure that the price feature in our dataset is sane. For this purpose, let's look at the price distribution:

The chart above is far from a normal distribution. It has a very long, thin tail on the right-hand side, which indicates positive skewness, but the tail holds very little volume. The following depicts this behavior:

We can see that the tail starts at a price point of around 5 million. I split the dataset into properties priced below 5 million and properties priced above it. The results were as follows:

The above shows 1816 properties priced below 5 million, while only 89 properties fall between 5 and 35 million. The skewness in the data is caused by these extremely high-valued properties, which are very few in number in our dataset, as represented by the thin tail.
We can consider these 89 properties outliers: they are few in number, yet they heavily influence summary statistics such as the mean price. They also do not represent the price trend in our dataset. Therefore, I decided to remove these 89 properties.
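A split like the one described above can be sketched with two boolean masks in pandas. The DataFrame below is toy data for illustration only, and the column name `price` is an assumption rather than the actual name used in the notebook.

```python
import pandas as pd

# Toy prices for illustration only; "price" is an assumed column name.
df = pd.DataFrame(
    {"price": [850_000, 1_200_000, 2_400_000, 4_900_000, 7_500_000, 32_000_000]}
)

THRESHOLD = 5_000_000  # the cut-off discussed above

# Boolean masks split the frame into the main body and the thin right tail.
below = df[df["price"] < THRESHOLD]
above = df[df["price"] >= THRESHOLD]

print(f"below threshold: {len(below)}, at or above: {len(above)}")
```

Counting both halves before dropping anything is a cheap sanity check that no rows are lost or double-counted.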

After removing these 89 properties, we can see that our distribution is in a much better state, and the updated skewness and kurtosis values confirm this. Ideally, the skewness and (excess) kurtosis values should be as close to 0 as possible, which indicates a roughly symmetric, near-normal distribution.
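As a minimal sketch of this before-and-after check, the snippet below trims a toy price series at the 5 million cut-off and compares skewness and kurtosis. The numbers are made up for illustration and are not the project's data. Note that pandas' `kurt()` reports excess kurtosis, so a normal distribution scores close to 0.

```python
import pandas as pd

# Made-up prices with a long right tail; not the project's data.
prices = pd.Series(
    [0.8e6, 0.9e6, 1.0e6, 1.1e6, 1.2e6, 1.5e6, 2.0e6, 12.0e6, 30.0e6],
    name="price",
)

THRESHOLD = 5_000_000

# Keep only the properties below the cut-off.
trimmed = prices[prices < THRESHOLD]

# Sample skewness and excess kurtosis, before and after trimming.
skew_before, skew_after = prices.skew(), trimmed.skew()
kurt_before, kurt_after = prices.kurt(), trimmed.kurt()

print(f"skewness: {skew_before:.2f} -> {skew_after:.2f}")
print(f"kurtosis: {kurt_before:.2f} -> {kurt_after:.2f}")
```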
Pricey Neighborhoods
Let’s take a look at the map to observe property clusters and their pricing density across Dubai.

Let's see the same information as a bar chart:

It is easy to see from the above that Palm Jumeirah, Dubai Marina, and City Walk hold the top three positions for the priciest properties.
But there is another figure which generally represents the valuation of real estate in a neighborhood, i.e., the price per sqft.

In the bar chart above, you can see that Downtown Dubai, Palm Jumeirah, and Jumeirah have the highest price per sqft, making them the top three neighborhoods in terms of property value.
But beyond the neighborhood, there are other factors that affect the price of a property. Let's move on to those.
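A ranking like this can be sketched by deriving a price-per-sqft column and averaging it per neighborhood. The neighborhood labels, the numbers, and the column names (`neighborhood`, `price`, `size_in_sqft`) below are all placeholders chosen for illustration.

```python
import pandas as pd

# Placeholder listings; names and values are illustrative assumptions.
df = pd.DataFrame(
    {
        "neighborhood": ["Area A", "Area A", "Area B", "Area B", "Area C"],
        "price": [1_000_000, 1_500_000, 3_000_000, 2_400_000, 800_000],
        "size_in_sqft": [1000, 1200, 1500, 1600, 1000],
    }
)

# Price per square foot for each individual listing.
df["price_per_sqft"] = df["price"] / df["size_in_sqft"]

# Average per neighborhood, highest first.
ranking = (
    df.groupby("neighborhood")["price_per_sqft"]
    .mean()
    .sort_values(ascending=False)
)

print(ranking.head(3))
```

This averages the per-listing ratios. Dividing total price by total area per neighborhood is an equally reasonable choice and can give a slightly different ordering.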
Covered Area
In the scatter plot of price vs. area below, we can see that the larger the covered area of an apartment, the pricier it becomes.
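One rough way to quantify this relationship is to fit a straight line through the points and inspect its slope. The sketch below uses NumPy's `polyfit` on invented values. It is not the fit used in the project, only an illustration of the idea.

```python
import numpy as np

# Invented area/price pairs for illustration only.
area_sqft = np.array([500, 750, 1000, 1500, 2000], dtype=float)
price = np.array([600_000, 900_000, 1_300_000, 2_100_000, 3_000_000], dtype=float)

# Degree-1 least-squares fit: price is approximately slope * area + intercept.
slope, intercept = np.polyfit(area_sqft, price, deg=1)

print(f"each extra sqft adds roughly {slope:,.0f} to the price")
```

A positive slope matches what the scatter plot suggests: price rises with covered area.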

We have recently seen a trend in Dubai's real estate market where one-bedroom apartments can be bought for under a million. That's because property developers are adjusting the covered area of apartments to make them affordable for customers. You can see above a concentration of properties under 1000 sqft priced under a million AED.
Bed & Bath
Let's see how the number of bedrooms and bathrooms correlates with price, covered area, and with each other.


Does the number of bedrooms impact price?
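A simple way to probe this question is to compare the median price for each bedroom count. The sketch below uses invented listings, and `no_of_bedrooms` and `price` are assumed column names.

```python
import pandas as pd

# Invented listings; column names are assumptions.
df = pd.DataFrame(
    {
        "no_of_bedrooms": [1, 1, 2, 2, 3, 3],
        "price": [700_000, 900_000, 1_400_000, 1_600_000, 2_500_000, 3_100_000],
    }
)

# Median price per bedroom count, ordered by number of bedrooms.
median_by_beds = df.groupby("no_of_bedrooms")["price"].median().sort_index()

print(median_by_beds)
```

The median is used here instead of the mean so that a handful of very expensive listings does not dominate any one group.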

Overall Correlation
Let’s see the overall correlation of all features we have for a property and see what’s related and what’s not.

We can see sparse clusters of lighter shades. The lighter a box is, the more strongly the two features are correlated. For example, there is a light cluster in the top-left corner, which contains price, covered area, price per sqft, number of bedrooms, and number of bathrooms. These attributes are usually correlated to some extent and determine the price or worth of a property.
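The matrix behind a heatmap like this can be computed with pandas' `DataFrame.corr()`, which returns pairwise Pearson correlations by default. The sketch below runs on a small invented frame. The column names are assumptions, and the real dataset has more features than shown here.

```python
import pandas as pd

# Small invented frame; the real dataset has more features.
df = pd.DataFrame(
    {
        "price": [600_000, 900_000, 1_300_000, 2_100_000, 3_000_000],
        "size_in_sqft": [500, 750, 1000, 1500, 2000],
        "no_of_bedrooms": [1, 1, 2, 3, 4],
        "no_of_bathrooms": [1, 2, 2, 3, 4],
    }
)
df["price_per_sqft"] = df["price"] / df["size_in_sqft"]

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Features ranked by how strongly they correlate with price.
print(corr["price"].sort_values(ascending=False))
```

Sorting a single column of the matrix, as in the last line, is often easier to read than the full heatmap when only one target feature matters.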
Continue
In the final part, we will cover model development, training, and how to predict the prices.
You can follow the links below for further reading or to catch up with the earlier work in this project series:
Part 3: Apartment Pricing: Model Development, Training, and Predictions
Part 1: Apartment Pricing: Advanced Regression Techniques