Published May 30, 2026 | Version 2.0
Data Management Plan | Open

DMP-FINAL: Predicting Road Accident Severity in Great Britain

Description

DMP: UK Collision Severity Prediction

Predicting the severity of road traffic collisions in Great Britain using the UK Department for Transport's STATS19 open data (2020–2024).

Project context

Developed as part of the FAIR Data Science course (DaSt 2026) at TU Wien.

Abstract

The UK government publishes road safety data covering every accident reported by the police since 1979. We load the most recent years (2020–2024) into a normalised database on TU Wien's DBRepo and train a classifier that predicts how severe a collision was (Fatal, Serious, or Slight) from features such as weather, road type, and time of day.

The whole pipeline reads from a single REST API source and uses no local CSV files. We compare three classifiers on a validation set, select the best performer (Gradient Boosting), and evaluate it on a held-out test set. All outputs (code, model, predictions, figures, and metadata) are openly licensed and described with standard FAIR metadata formats: RO-Crate, CodeMeta, FAIR4ML, Croissant, and a Model Card.
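The selection protocol described in the abstract can be illustrated with a minimal, standard-library sketch. Everything in it is an assumption made for illustration: the toy rows, the feature names `speed_limit` and `dark`, the split fractions, the accuracy metric, and the three fixed rules that stand in for trained classifiers. It is not the project's code, which is not reproduced in this record.

```python
import random

def split_indices(n, seed=42, val_frac=0.15, test_frac=0.15):
    """Shuffle row indices once and cut them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

def accuracy(model, rows, labels, indices):
    """Share of rows in `indices` whose predicted label matches the true one."""
    hits = sum(model(rows[i]) == labels[i] for i in indices)
    return hits / len(indices)

def select_and_evaluate(candidates, rows, labels, seed=42):
    """Pick the best candidate on validation, then score it once on test."""
    train, val, test = split_indices(len(rows), seed)
    # Real classifiers would be fitted on `train` here; the toy candidates
    # below are fixed rules, so the training split goes unused.
    val_scores = {name: accuracy(m, rows, labels, val)
                  for name, m in candidates.items()}
    best = max(val_scores, key=val_scores.get)
    # The test split is touched only after the choice has been made.
    test_score = accuracy(candidates[best], rows, labels, test)
    return best, val_scores, test_score

# Hypothetical toy data: severity depends on speed limit and darkness.
rows = [{"speed_limit": [20, 30, 40, 60, 70][i % 5], "dark": i % 3 == 0}
        for i in range(200)]

def true_rule(row):
    if row["speed_limit"] >= 70 and row["dark"]:
        return "Fatal"
    return "Serious" if row["speed_limit"] >= 60 else "Slight"

labels = [true_rule(r) for r in rows]

# Three stand-in "models" of increasing quality (all hypothetical).
candidates = {
    "always_slight": lambda r: "Slight",
    "speed_only": lambda r: "Serious" if r["speed_limit"] >= 60 else "Slight",
    "speed_and_light": true_rule,
}

best, val_scores, test_score = select_and_evaluate(candidates, rows, labels)
print(best, val_scores, test_score)
```

The point of the structure is that the test split plays no part in choosing the model, so the final score is not biased by the selection step.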

Data source

Department for Transport, UK Government — Road Safety Data: https://www.gov.uk/government/statistical-data-sets/road-safety-open-data

Mirrored as a 3NF database on TU Wien DBRepo: https://test.dbrepo.tuwien.ac.at/database/82c19b39-246c-4409-b25c-8baf3a158a70 (DOI: 10.82556/c8r3-bf26)
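To make the idea of a normalised (3NF) mirror concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, identifiers, and demo rows are all hypothetical and chosen only for illustration; the actual schema is the one published on DBRepo. The sketch shows coded attributes moved into lookup tables and joined back when readable labels are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: each coded attribute lives in its own lookup table,
# and the fact table stores only foreign keys to them.
conn.executescript("""
CREATE TABLE severity (
    severity_id INTEGER PRIMARY KEY,
    label       TEXT NOT NULL UNIQUE
);
CREATE TABLE road_type (
    road_type_id INTEGER PRIMARY KEY,
    label        TEXT NOT NULL UNIQUE
);
CREATE TABLE collision (
    collision_id   TEXT PRIMARY KEY,
    collision_date TEXT NOT NULL,
    severity_id    INTEGER NOT NULL REFERENCES severity(severity_id),
    road_type_id   INTEGER NOT NULL REFERENCES road_type(road_type_id)
);
""")

# Illustrative ids and labels, not taken from the real code lists.
conn.executemany("INSERT INTO severity VALUES (?, ?)",
                 [(1, "Fatal"), (2, "Serious"), (3, "Slight")])
conn.executemany("INSERT INTO road_type VALUES (?, ?)",
                 [(1, "Roundabout"), (2, "Single carriageway")])
conn.executemany("INSERT INTO collision VALUES (?, ?, ?, ?)",
                 [("demo-001", "2024-01-05", 3, 2),
                  ("demo-002", "2024-01-06", 2, 1)])

# Join the lookups back in to obtain human-readable rows for modelling.
rows = conn.execute("""
SELECT c.collision_id, s.label, r.label
FROM collision AS c
JOIN severity  AS s USING (severity_id)
JOIN road_type AS r USING (road_type_id)
ORDER BY c.collision_id
""").fetchall()

for row in rows:
    print(row)
```

Storing each label once in a lookup table avoids repeating free-text values across collision rows and lets a label be corrected in a single place.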

Licences

This project involves three categories of artefact, each with a separate licence.

Input Data

The input dataset is the STATS19 Road Safety Open Dataset published by the UK Department for Transport and available at https://www.gov.uk/government/statistical-data-sets/road-safety-open-data.

It is licensed under the Open Government Licence v3.0 (OGL v3.0): https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

Obligations: Attribution is required. Derived works must acknowledge the source with the statement:

Contains public sector information licensed under the Open Government Licence v3.0.

OGL v3.0 is compatible with Creative Commons Attribution 4.0 (CC BY 4.0) and does not impose ShareAlike restrictions, so the output data can be released under CC BY 4.0.

Software / Code

All source code in this repository is licensed under the MIT Licence. See LICENSE for the full text.

MIT was chosen because it is a permissive open-source licence that is fully compatible with OGL v3.0 and imposes no restrictions on reuse, modification, or distribution. It is one of the most widely adopted licences for research software.

Output Data

All output artefacts produced by this experiment — including trained model files, preprocessed datasets, evaluation figures (confusion matrices, performance charts, feature importance plots), and predictions — are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence: https://creativecommons.org/licenses/by/4.0/

This licence permits unrestricted reuse, redistribution, and adaptation for any purpose, including commercially, provided appropriate credit is given. CC BY 4.0 is compatible with OGL v3.0 and consistent with the FWF Open Access policy.

Files

DMP-Final_Predicting-Road-Accident-Severity-in-Great-Britain.pdf

Files (700.4 KiB)

Additional details

Dates

Created
2026-05-29