Published April 14, 2026 | Version v1
Data Management Plan | Open access

DMP: Air Pollution Emissions Classification using European Open Data

  • TU Wien

Description

Dataset Description: Air Pollution Emissions Classification Dataset

Context and methodology

This dataset is used in a machine learning project analysing air pollution emissions in Europe. The data comes from the European Open Data Portal (EAA20 - Air Pollution Emissions dataset).

The purpose is to train a classification model that predicts whether pollution levels are high or low. Because the source data contain no class labels, a binary target variable (high/low pollution) is derived by thresholding the emission values at their median.
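A minimal sketch of median-threshold labelling in pandas is shown below. The column names `emissions` and `high_pollution` are hypothetical, and assigning values exactly equal to the median to the low class is an assumption, since the source does not say how ties are handled.

```python
import pandas as pd


def add_median_label(df: pd.DataFrame, value_col: str = "emissions",
                     label_col: str = "high_pollution") -> pd.DataFrame:
    """Return a copy of df with a binary label: 1 = high, 0 = low pollution.

    Assumption: values strictly above the median are 'high'; values equal to
    or below it are 'low'. Column names are hypothetical.
    """
    threshold = df[value_col].median()
    out = df.copy()
    out[label_col] = (out[value_col] > threshold).astype(int)
    return out


if __name__ == "__main__":
    sample = pd.DataFrame({"country": ["AT", "DE", "FR", "IT"],
                           "emissions": [10.0, 40.0, 25.0, 5.0]})
    # Median is 17.5, so DE and FR are labelled 1, AT and IT 0
    print(add_median_label(sample))
```

Thresholding at the median yields two classes of roughly equal size (exactly equal when no values coincide with the median), which avoids a heavily imbalanced target.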

The dataset is downloaded in CSV format and processed in Python; processing includes data cleaning and feature engineering.
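The cleaning and feature-engineering steps are not described in detail, so the following is only one plausible form, assuming hypothetical columns `country`, `pollutant`, and `emissions`. It removes duplicate rows, coerces emission values to numbers, drops rows without a usable value, and one-hot encodes the categorical columns.

```python
import io

import pandas as pd


def load_and_clean(csv_source, value_col: str = "emissions") -> pd.DataFrame:
    """Read the raw CSV and apply basic cleaning (hypothetical column names)."""
    df = pd.read_csv(csv_source)
    df = df.drop_duplicates()
    # Non-numeric entries (e.g. placeholder strings) become NaN
    df[value_col] = pd.to_numeric(df[value_col], errors="coerce")
    # Rows without a usable emission value cannot be labelled, so drop them
    return df.dropna(subset=[value_col]).reset_index(drop=True)


def engineer_features(df: pd.DataFrame, categorical_cols) -> pd.DataFrame:
    """One-hot encode categorical columns so scikit-learn can consume them."""
    return pd.get_dummies(df, columns=list(categorical_cols), dtype=int)


if __name__ == "__main__":
    # Small in-memory CSV standing in for a file in data/raw/
    raw_csv = io.StringIO(
        "country,pollutant,emissions\n"
        "AT,NOx,10\n"
        "AT,NOx,10\n"
        "DE,PM10,unknown\n"
        "FR,NOx,25\n"
    )
    clean = load_and_clean(raw_csv)
    print(engineer_features(clean, ["country", "pollutant"]))
```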

Technical details

  • The project is organised into the following directories:

    • data/raw/ – original dataset
    • data/processed/ – cleaned data
    • src/ – code
    • results/ – outputs
    • docs/ – documentation

    Files follow consistent naming conventions for each file type (raw data, processed data, model files, and plots).

    The data can be used with Python (pandas, numpy, scikit-learn). Code is provided in src/ and documentation in docs/.
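Once processed, the data can be passed to scikit-learn. The sketch below uses a random forest purely as a stand-in, since the source does not name the model, together with the hypothetical target column `high_pollution`; the demo runs on synthetic data, not on the project's files.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_classifier(df: pd.DataFrame, target_col: str = "high_pollution", seed: int = 42):
    """Fit a classifier on all non-target columns; return (model, test accuracy).

    Assumptions: every feature column is already numeric (e.g. one-hot encoded),
    and RandomForestClassifier is a placeholder for the project's actual model.
    """
    X = df.drop(columns=[target_col])
    y = df[target_col]
    # Stratify so both high/low classes appear in the train and test splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    # Synthetic stand-in for a table from data/processed/ (not real project data)
    rng = np.random.default_rng(0)
    demo = pd.DataFrame({"feature_a": rng.normal(size=200),
                         "feature_b": rng.normal(size=200)})
    demo["high_pollution"] = (demo["feature_a"] > demo["feature_a"].median()).astype(int)
    _, acc = train_classifier(demo)
    print(f"test accuracy: {acc:.2f}")
```

If the label is derived from an emission column, that column should be excluded from the features in a real run; otherwise the model only has to rediscover the median threshold.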

Further details

  • The dataset is publicly available and reusable under its licence.
  • No personal or sensitive data are included.
  • The dataset is managed following FAIR principles to ensure that it is findable, accessible, interoperable, and reusable.

Files

12436439-DMP.pdf (59.5 KiB), md5:061ed99781f3b4d886a45aa5574476e0
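To check that a downloaded copy of 12436439-DMP.pdf matches this record, its MD5 digest can be compared with the listed checksum. A minimal sketch using only Python's standard library, assuming the file has been saved to the current directory:

```python
import hashlib
from pathlib import Path

# Checksum listed for 12436439-DMP.pdf in this record
EXPECTED_MD5 = "061ed99781f3b4d886a45aa5574476e0"


def md5sum(path, chunk_size: int = 65536) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5sum(path) == expected.strip().lower()


if __name__ == "__main__":
    pdf = Path("12436439-DMP.pdf")  # assumed local download location
    if pdf.exists():
        print("checksum OK" if verify_download(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found")
```

Reading in chunks keeps memory use constant regardless of file size, although a 59.5 KiB file could also be hashed in one read.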