DMP: Predicting Housing Prices Using Real Estate Data in European Cities
Creators
Description
This dataset belongs to a machine learning project in real estate analytics and urban economics. Its purpose is to support the prediction of housing prices in European cities with regression models.
The dataset was obtained from the European Union Open Data Portal and contains structured data related to real estate properties. It includes features such as property size (in square meters), number of rooms, location, and year of construction.
The dataset was not created from scratch but reused from an open data source. It was preprocessed by handling missing values, cleaning inconsistent entries, and transforming categorical variables (such as location) into numerical representations suitable for machine learning models.
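The preprocessing steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual preprocessing code. The column names (`size_sqm`, `rooms`, `location`, `year_built`, `price`), the median imputation, and one-hot encoding through `pd.get_dummies` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw listings table and one-hot encode the location column.

    Column names (size_sqm, rooms, location, year_built, price) are
    hypothetical placeholders, not the dataset's documented schema.
    """
    df = raw.copy()

    # Normalise inconsistent spellings of the categorical location field,
    # e.g. " berlin" and "Berlin" become the same category.
    df["location"] = df["location"].str.strip().str.title()

    # Coerce numeric columns; unparseable entries such as "n/a" become NaN.
    numeric = ["size_sqm", "rooms", "year_built", "price"]
    df[numeric] = df[numeric].apply(pd.to_numeric, errors="coerce")

    # Treat physically impossible sizes as missing values.
    df.loc[df["size_sqm"] <= 0, "size_sqm"] = np.nan

    # Drop rows without a target price, then impute remaining numeric gaps
    # with the column median.
    df = df.dropna(subset=["price"])
    for col in ["size_sqm", "rooms", "year_built"]:
        df[col] = df[col].fillna(df[col].median())

    # One-hot encode location so regression models receive numeric inputs.
    df = pd.get_dummies(df, columns=["location"], prefix="loc", dtype=int)
    return df.reset_index(drop=True)
```

In a real pipeline, the medians and the set of location categories should be learned from the training split only and then applied to the validation and test data, so that no information leaks from the evaluation sets into training.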
The dataset is stored as CSV files, split into separate files for training, validation, and testing. The project folder contains separate directories for raw data, processed data, and model outputs, and files follow consistent, descriptive names (e.g., train.csv, test.csv).
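One way to produce the training, validation, and test files and the folder layout described above is sketched below. The directory names (`data/raw`, `data/processed`, `outputs`), the file name `validation.csv`, the 70/15/15 split, and the random seed are illustrative assumptions rather than the project's documented settings.

```python
from pathlib import Path

import pandas as pd


def write_splits(df, root, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle df, split it into train/validation/test, and write CSV files.

    The directory names, split fractions, and the file name validation.csv
    are hypothetical choices made for this sketch.
    """
    root = Path(root)
    # Create the raw / processed / outputs directory layout.
    for sub in ("data/raw", "data/processed", "outputs"):
        (root / sub).mkdir(parents=True, exist_ok=True)

    # Shuffle once with a fixed seed so the split is reproducible.
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n = len(shuffled)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))

    # Non-overlapping slices: test first, then validation, remainder is train.
    splits = {
        "test.csv": shuffled.iloc[:n_test],
        "validation.csv": shuffled.iloc[n_test:n_test + n_val],
        "train.csv": shuffled.iloc[n_test + n_val:],
    }
    paths = {}
    for name, part in splits.items():
        path = root / "data" / "processed" / name
        part.to_csv(path, index=False)
        paths[name] = path
    return paths
```

Fixing the random seed means that rerunning the script reproduces exactly the same three files, which keeps model comparisons across runs consistent.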
The dataset can be opened and processed with standard tools such as Python (with libraries like pandas, NumPy, and scikit-learn) or R. No proprietary software is required.
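As an example of processing the files with scikit-learn, the sketch below fits a linear regression on the training file and reports the mean absolute error on the validation file. It assumes the files have already been preprocessed into all-numeric columns and that the target column is named `price`. Both the model choice and the column name are hypothetical, since the project's specific regression models are not described here.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error


def fit_and_evaluate(train_csv, valid_csv, target="price"):
    """Fit a linear regression on train_csv and report MAE on valid_csv.

    Assumes both files are preprocessed (all-numeric columns) and share the
    same columns; the default target name "price" is hypothetical.
    """
    train = pd.read_csv(train_csv)
    valid = pd.read_csv(valid_csv)

    # Every column except the target is used as a feature.
    features = [c for c in train.columns if c != target]
    model = LinearRegression()
    model.fit(train[features], train[target])

    # Selecting the same feature list keeps column order aligned with training.
    preds = model.predict(valid[features])
    return model, mean_absolute_error(valid[target], preds)
```

`pd.read_csv` accepts either a file path or a file-like object, so the same function works on the CSV files on disk and on small in-memory samples.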
Additional resources include the source dataset from the EU Open Data Portal and the project code used for preprocessing and model training. Documentation is provided as a README file that explains variable definitions, preprocessing steps, and usage instructions.
There are no major restrictions on reuse, as the dataset is publicly available. However, users should note that the dataset does not fully capture external factors such as economic conditions and market trends, which can affect housing prices and therefore the prediction results.