DMP: Air Pollution Emissions Classification using European Open Data
Creators
Description
Dataset Description: Air Pollution Emissions Classification Dataset
Context and methodology
This dataset is used in a machine learning project analysing air pollution emissions in Europe. The data comes from the European Open Data Portal (EAA20 - Air Pollution Emissions dataset).
The purpose is to train a classification model that predicts pollution levels. Because the dataset has no labels, a binary target variable (high/low pollution) is derived by thresholding the emission values at their median.
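The median-threshold labelling can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names `emissions` and `high_pollution` are hypothetical, and treating values strictly above the median as "high" is an assumption.

```python
import pandas as pd


def add_binary_target(df, value_col="emissions", target_col="high_pollution"):
    """Return a copy of df with a 0/1 target column.

    Rows whose value is strictly above the column median are labelled 1 (high),
    all others 0 (low). Column names are hypothetical placeholders.
    """
    threshold = df[value_col].median()
    out = df.copy()
    out[target_col] = (out[value_col] > threshold).astype(int)
    return out


# Tiny made-up example: the median of these four values is 25.0.
example = pd.DataFrame({"emissions": [10.0, 20.0, 30.0, 40.0]})
labelled = add_binary_target(example)
print(labelled)
```

A median split yields roughly balanced classes, which simplifies evaluation, but it does not by itself say whether "high" is high in any regulatory sense.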
The dataset is downloaded in CSV format and processed in Python, including data cleaning and feature engineering.
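A cleaning and feature-engineering step of this kind might look like the sketch below. The column names, the specific cleaning rules (dropping duplicates and non-numeric values), and the log-transformed feature are all assumptions chosen for illustration; the real pipeline may differ. In practice the input frame would come from `pd.read_csv` on the downloaded file.

```python
import numpy as np
import pandas as pd


def clean_emissions(df, value_col="emissions"):
    """Illustrative cleaning: drop duplicate rows, coerce values to numbers,
    drop rows without a usable value, then add a log-scaled feature."""
    out = df.drop_duplicates().copy()
    # Non-numeric entries such as "n/a" become NaN and are removed.
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce")
    out = out.dropna(subset=[value_col]).reset_index(drop=True)
    # Hypothetical engineered feature; assumes non-negative emission values.
    out["log_" + value_col] = np.log1p(out[value_col])
    return out


# Made-up rows standing in for the downloaded CSV.
raw = pd.DataFrame(
    {
        "country": ["AT", "AT", "DE", "FR"],
        "emissions": ["12.5", "12.5", "n/a", "7"],
    }
)
cleaned = clean_emissions(raw)
print(cleaned)
```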
Technical details
The dataset is organised as follows:
- data/raw/ – original dataset
- data/processed/ – cleaned data
- src/ – code
- results/ – outputs
- docs/ – documentation
File names follow a consistent convention for each file type (raw data, processed data, model files, and plots).
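As a sketch, the folder layout listed above could be created with `pathlib`. The helper name `create_layout` is hypothetical, and the example writes into a temporary directory rather than a real project root.

```python
import tempfile
from pathlib import Path

# Folder layout as listed in this section.
LAYOUT = ["data/raw", "data/processed", "src", "results", "docs"]


def create_layout(root):
    """Create the project folders under root and return their paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = create_layout(tmp)
    all_exist = all(p.is_dir() for p in created)

print(all_exist)
```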
The dataset can be used with Python (pandas, numpy, scikit-learn). Documentation and code are provided.
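The following sketch shows one way the listed libraries could fit together to train a binary classifier. It uses synthetic data in place of the processed dataset; the feature names, the choice of logistic regression, and the 75/25 split are assumptions, not the project's documented method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed data (feature names are hypothetical).
rng = np.random.default_rng(0)
features = pd.DataFrame(
    {
        "feature_a": rng.normal(size=200),
        "feature_b": rng.normal(size=200),
    }
)
# Synthetic high/low target built with a median split, mirroring the labelling idea.
score = features["feature_a"] + 0.5 * features["feature_b"]
target = (score > score.median()).astype(int)

# Hold out a quarter of the rows for evaluation, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0, stratify=target
)

model = LogisticRegression()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, the column used to derive the high/low label should be excluded from the model's features; otherwise the classifier can simply recover the threshold.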
Further details
- The dataset is publicly available and reusable under its licence.
- No personal or sensitive data are included.
- The dataset is managed following FAIR principles to ensure that it is findable, accessible, interoperable, and reusable.