House Sales
This dataset contains house sale prices for King County, USA, between May 2014 and May 2015. It is well suited for practicing regression techniques.
Column descriptions
- id: A unique identifier
- year: Year of sale
- month: Month of sale
- day: Day of sale
- zipcode: Zipcode
- latitude: Latitude
- longitude: Longitude
- sqft_lot: Lot area in square feet
- sqft_living: Interior living space in square feet
- sqft_above: Interior living space above ground in square feet
- sqft_basement: Interior living space below ground in square feet
- floors: Number of floors
- bedrooms: Number of bedrooms
- bathrooms: Number of bathrooms. Fractional values indicate a bathroom that lacks some components (toilet, sink, shower, or bathtub).
- waterfront: Whether the building overlooks a waterfront (0 = no, 1 = yes)
- view: Rating of the view (1 to 5, higher is better)
- condition: Rating of the condition of the house (1 to 5, higher is better)
- grade: Rating of building construction and design (1 to 13, higher is better)
- year_built: Year the house was built
- year_renovated: Year the house was last renovated. A value of 0 indicates that it was never renovated.
- sqft_lot_15nn: Lot area of the 15 nearest neighbors in square feet
- sqft_living_15nn: Interior living space of the 15 nearest neighbors in square feet
- price: Price the house sold for in USD
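Several of these descriptions translate directly into preprocessing steps. The following is a minimal sketch in plain Python that uses two rows copied from the sample table. The helper names (`sale_date`, `renovation_year`, `living_area_is_consistent`) are hypothetical, and the check that `sqft_living` equals `sqft_above + sqft_basement` is an assumption suggested by the sample rows, not a documented guarantee.

```python
from datetime import date
from typing import Optional

# Two rows copied from the sample table (only the columns used here).
rows = [
    {"id": 0, "year": 2014, "month": 5, "day": 2, "sqft_living": 2200,
     "sqft_above": 2200, "sqft_basement": 0, "year_renovated": 0},
    {"id": 1, "year": 2014, "month": 5, "day": 2, "sqft_living": 2090,
     "sqft_above": 1360, "sqft_basement": 730, "year_renovated": 0},
]


def sale_date(row: dict) -> date:
    """Combine the separate year/month/day columns into one date."""
    return date(row["year"], row["month"], row["day"])


def renovation_year(row: dict) -> Optional[int]:
    """Map the sentinel value 0 ("never renovated") to None."""
    return row["year_renovated"] or None


def living_area_is_consistent(row: dict) -> bool:
    """Assumed invariant: living space = above-ground + basement space."""
    return row["sqft_living"] == row["sqft_above"] + row["sqft_basement"]


for row in rows:
    print(row["id"], sale_date(row).isoformat(),
          renovation_year(row), living_area_is_consistent(row))
```

Converting the sentinel 0 to `None` keeps a model from treating "never renovated" as a renovation in year 0.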
Sample
shape: (10, 23)
| id | year | month | day | zipcode | latitude | longitude | sqft_lot | sqft_living | sqft_above | sqft_basement | floors | bedrooms | bathrooms | waterfront | view | condition | grade | year_built | year_renovated | sqft_lot_15nn | sqft_living_15nn | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i64 | i64 | i64 | i64 | i64 | f64 | f64 | i64 | i64 | i64 | i64 | f64 | i64 | f64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 |
| 0 | 2014 | 5 | 2 | 98001 | 47.3406 | -122.269 | 9397 | 2200 | 2200 | 0 | 2.0 | 4 | 2.5 | 0 | 1 | 3 | 8 | 1987 | 0 | 9176 | 2310 | 285000 |
| 1 | 2014 | 5 | 2 | 98003 | 47.3537 | -122.303 | 10834 | 2090 | 1360 | 730 | 1.0 | 3 | 2.5 | 0 | 1 | 4 | 8 | 1987 | 0 | 8595 | 1750 | 285000 |
| 2 | 2014 | 5 | 2 | 98006 | 47.5443 | -122.177 | 8119 | 2160 | 1080 | 1080 | 1.0 | 4 | 2.25 | 0 | 1 | 3 | 8 | 1966 | 0 | 9000 | 1850 | 440000 |
| 3 | 2014 | 5 | 2 | 98006 | 47.5746 | -122.135 | 8800 | 1450 | 1450 | 0 | 1.0 | 4 | 1.0 | 0 | 1 | 4 | 7 | 1954 | 0 | 8942 | 1260 | 435000 |
| 4 | 2014 | 5 | 2 | 98006 | 47.5725 | -122.133 | 10000 | 1920 | 1070 | 850 | 1.0 | 4 | 1.5 | 0 | 1 | 4 | 7 | 1954 | 0 | 10836 | 1450 | 430000 |
| 5 | 2014 | 5 | 2 | 98007 | 47.6022 | -122.134 | 6700 | 1570 | 1570 | 0 | 1.0 | 3 | 1.5 | 0 | 1 | 4 | 7 | 1956 | 0 | 7300 | 1570 | 419000 |
| 6 | 2014 | 5 | 2 | 98008 | 47.6188 | -122.114 | 8030 | 2000 | 1000 | 1000 | 1.0 | 3 | 2.25 | 0 | 1 | 4 | 8 | 1963 | 0 | 8250 | 2070 | 420000 |
| 7 | 2014 | 5 | 2 | 98011 | 47.7698 | -122.222 | 9655 | 2210 | 1460 | 750 | 1.0 | 5 | 2.5 | 0 | 1 | 3 | 8 | 1976 | 0 | 8633 | 2080 | 470000 |
| 8 | 2014 | 5 | 2 | 98011 | 47.7419 | -122.205 | 12261 | 2730 | 2730 | 0 | 2.0 | 4 | 2.5 | 0 | 1 | 3 | 9 | 1991 | 0 | 10872 | 2730 | 612500 |
| 9 | 2014 | 5 | 2 | 98014 | 47.6517 | -121.906 | 23103 | 1800 | 1800 | 0 | 1.0 | 3 | 1.75 | 0 | 1 | 3 | 7 | 1968 | 0 | 18163 | 1410 | 284000 |
Schema
{
'id': Int64,
'year': Int64,
'month': Int64,
'day': Int64,
'zipcode': Int64,
'latitude': Float64,
'longitude': Float64,
'sqft_lot': Int64,
'sqft_living': Int64,
'sqft_above': Int64,
'sqft_basement': Int64,
'floors': Float64,
'bedrooms': Int64,
'bathrooms': Float64,
'waterfront': Int64,
'view': Int64,
'condition': Int64,
'grade': Int64,
'year_built': Int64,
'year_renovated': Int64,
'sqft_lot_15nn': Int64,
'sqft_living_15nn': Int64,
'price': Int64
}
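If the data is read from untyped text such as CSV, the schema above can be applied by hand. The sketch below uses only the standard library and parses the first row of the sample table. Mapping `Int64` to `int` and `Float64` to `float`, the name `SCHEMA`, and the assumption that the CSV columns appear in the schema's order are all illustrative choices rather than part of the dataset's tooling.

```python
import csv
import io

# Column name -> Python type, mirroring the schema (Int64 -> int, Float64 -> float).
SCHEMA = {
    "id": int, "year": int, "month": int, "day": int, "zipcode": int,
    "latitude": float, "longitude": float,
    "sqft_lot": int, "sqft_living": int, "sqft_above": int, "sqft_basement": int,
    "floors": float, "bedrooms": int, "bathrooms": float,
    "waterfront": int, "view": int, "condition": int, "grade": int,
    "year_built": int, "year_renovated": int,
    "sqft_lot_15nn": int, "sqft_living_15nn": int, "price": int,
}

# Header derived from the schema, followed by the first sample row.
csv_text = ",".join(SCHEMA) + "\n" + (
    "0,2014,5,2,98001,47.3406,-122.269,9397,2200,2200,0,"
    "2.0,4,2.5,0,1,3,8,1987,0,9176,2310,285000\n"
)

# Convert every field with the type declared for its column.
typed_rows = [
    {name: SCHEMA[name](value) for name, value in record.items()}
    for record in csv.DictReader(io.StringIO(csv_text))
]

print(typed_rows[0]["price"], typed_rows[0]["floors"])
```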
Statistics
shape: (9, 24)
| metric | id | year | month | day | zipcode | latitude | longitude | sqft_lot | sqft_living | sqft_above | sqft_basement | floors | bedrooms | bathrooms | waterfront | view | condition | grade | year_built | year_renovated | sqft_lot_15nn | sqft_living_15nn | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| "min" | 0.0 | 2014.0 | 1.0 | 1.0 | 98001.0 | 47.1559 | -122.519 | 520.0 | 290.0 | 290.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1900.0 | 0.0 | 651.0 | 399.0 | 75000.0 |
| "max" | 21612.0 | 2015.0 | 12.0 | 31.0 | 98199.0 | 47.7776 | -121.315 | 1.651359e6 | 13540.0 | 9410.0 | 4820.0 | 3.5 | 33.0 | 8.0 | 1.0 | 5.0 | 5.0 | 13.0 | 2015.0 | 2015.0 | 871200.0 | 6210.0 | 7.7e6 |
| "mean" | 10806.0 | 2014.322954 | 6.574423 | 15.688197 | 98077.939805 | 47.560053 | -122.213896 | 15106.967566 | 2079.899736 | 1788.390691 | 291.509045 | 1.494309 | 3.370842 | 2.114757 | 0.007542 | 1.234303 | 3.40943 | 7.656873 | 1971.005136 | 84.402258 | 12768.455652 | 1986.552492 | 540088.141767 |
| "median" | 10806.0 | 2014.0 | 6.0 | 16.0 | 98065.0 | 47.5718 | -122.23 | 7618.0 | 1910.0 | 1560.0 | 0.0 | 1.5 | 3.0 | 2.25 | 0.0 | 1.0 | 3.0 | 7.0 | 1975.0 | 0.0 | 7620.0 | 1840.0 | 450000.0 |
| "standard deviation" | 6239.28002 | 0.467616 | 3.115308 | 8.635063 | 53.505026 | 0.138564 | 0.140828 | 41420.511515 | 918.440897 | 828.090978 | 442.575043 | 0.539989 | 0.930062 | 0.770163 | 0.086517 | 0.766318 | 0.650743 | 1.175459 | 29.373411 | 401.67924 | 27304.179631 | 685.391304 | 367127.196483 |
| "distinct value count" | 21613.0 | 2.0 | 12.0 | 31.0 | 70.0 | 5034.0 | 752.0 | 9782.0 | 1038.0 | 946.0 | 306.0 | 6.0 | 13.0 | 30.0 | 2.0 | 5.0 | 5.0 | 12.0 | 116.0 | 70.0 | 8689.0 | 777.0 | 4028.0 |
| "idness" | 1.0 | 0.000093 | 0.000555 | 0.001434 | 0.003239 | 0.232915 | 0.034794 | 0.452598 | 0.048027 | 0.04377 | 0.014158 | 0.000278 | 0.000601 | 0.001388 | 0.000093 | 0.000231 | 0.000231 | 0.000555 | 0.005367 | 0.003239 | 0.402027 | 0.035951 | 0.186369 |
| "missing value ratio" | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| "stability" | 0.000046 | 0.677046 | 0.111692 | 0.041919 | 0.027854 | 0.000787 | 0.005367 | 0.016564 | 0.006385 | 0.009809 | 0.60732 | 0.494147 | 0.454541 | 0.248924 | 0.992458 | 0.901726 | 0.649193 | 0.415537 | 0.025864 | 0.957711 | 0.019757 | 0.009115 | 0.007958 |
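This page does not define the "idness" and "stability" rows. The sketch below assumes definitions inferred from the table: idness as the number of distinct values divided by the number of rows (70 distinct zipcodes over 21613 rows gives about 0.003239, matching the table), and stability as the share of the most common value among non-missing values (for `waterfront`, 1 − 0.007542 = 0.992458). The function names are hypothetical, and both functions assume a non-empty column.

```python
from collections import Counter


def idness(values: list) -> float:
    """Assumed definition: distinct value count / number of rows."""
    return len(set(values)) / len(values)


def stability(values: list) -> float:
    """Assumed definition: share of the most common non-missing value."""
    non_missing = [v for v in values if v is not None]
    most_common_count = Counter(non_missing).most_common(1)[0][1]
    return most_common_count / len(non_missing)


# The zipcode column of the ten sample rows.
zipcodes = [98001, 98003, 98006, 98006, 98006,
            98007, 98008, 98011, 98011, 98014]

print(idness(zipcodes))     # 7 distinct values / 10 rows
print(stability(zipcodes))  # 98006 appears 3 times out of 10
```

Under these definitions, a column with idness near 1 (such as `id`) behaves like an identifier, and a column with stability near 1 (such as `waterfront`) is dominated by a single value. Both are usually weak predictors on their own.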
Correlation heatmap
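The values behind a correlation heatmap can be computed as a matrix of pairwise correlation coefficients. The sketch below assumes the Pearson coefficient (the page does not state which measure the heatmap uses) and computes it in plain Python for three columns of the ten sample rows. The function name `pearson` and the `matrix` layout are illustrative, and ten rows are far too few to represent the full dataset.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Three columns copied from the ten sample rows.
columns = {
    "sqft_living": [2200, 2090, 2160, 1450, 1920, 1570, 2000, 2210, 2730, 1800],
    "bedrooms": [4, 3, 4, 4, 4, 3, 3, 5, 4, 3],
    "price": [285000, 285000, 440000, 435000, 430000,
              419000, 420000, 470000, 612500, 284000],
}

# Nested dict: matrix[a][b] is the correlation between columns a and b.
names = list(columns)
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

for a in names:
    print(a, [round(matrix[a][b], 2) for b in names])
```

Constant columns have zero variance and would raise a division error here, so they need to be excluded or handled separately.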
Attribution
This dataset is a modified version of the "House Sales in King County, USA" dataset by Kaggle user harlfoxem. The original dataset is licensed under CC0: Public Domain.
Column descriptions are based on this Kaggle discussion.