DMP-FINAL: Predicting Road Accident Severity in Great Britain
Description
DMP: UK Collision Severity Prediction
Predicting the severity of road traffic collisions in Great Britain using the Department for Transport's STATS19 open data (2020–2024).
Project context
Developed as part of the FAIR Data Science course (DaSt 2026) at TU Wien.
Abstract
The UK government publishes open road safety data covering every accident reported by the police since 1979. We load the 2020–2024 records into a normalised (3NF) database on TU Wien's DBRepo and train a classifier to predict collision severity (Fatal, Serious, or Slight) from features such as weather, road type, and time of day.
The whole pipeline reads from a single REST API source and uses no local CSV files. We compare three classifiers on a validation set, select the best one (Gradient Boosting), and evaluate it on a held-out test set. All artefacts (code, model, predictions, figures, and metadata) are openly licensed and documented with FAIR metadata: RO-Crate, CodeMeta, FAIR4ML, Croissant, and a Model Card.
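The selection protocol described above (compare candidates on validation data, then score only the winner on the test set) can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: the macro-averaged F1 metric, the `speed_limit` feature, and the two rule-based candidates are hypothetical stand-ins for the trained classifiers, which are not reproduced here.

```python
LABELS = ("Fatal", "Serious", "Slight")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def select_and_evaluate(candidates, validation, test):
    """Pick the best candidate on validation data, then score it once on test data.

    candidates: dict mapping a name to a predict function (features -> label)
    validation, test: lists of (features, label) pairs
    """
    def score(predict, rows):
        y_true = [label for _, label in rows]
        y_pred = [predict(features) for features, _ in rows]
        return macro_f1(y_true, y_pred)

    val_scores = {name: score(fn, validation) for name, fn in candidates.items()}
    best = max(val_scores, key=val_scores.get)
    # The test set is touched only here, after the choice has been made.
    return best, val_scores, score(candidates[best], test)


def speed_rule(features):
    """Hypothetical rule standing in for a trained model."""
    if features["speed_limit"] >= 70:
        return "Fatal"
    if features["speed_limit"] >= 50:
        return "Serious"
    return "Slight"


candidates = {
    "always_slight": lambda features: "Slight",  # majority-class baseline
    "speed_rule": speed_rule,
}
validation = [
    ({"speed_limit": 70}, "Fatal"),
    ({"speed_limit": 30}, "Slight"),
    ({"speed_limit": 60}, "Serious"),
    ({"speed_limit": 30}, "Slight"),
]
test = [
    ({"speed_limit": 70}, "Fatal"),
    ({"speed_limit": 30}, "Slight"),
    ({"speed_limit": 50}, "Serious"),
]

best, val_scores, test_score = select_and_evaluate(candidates, validation, test)
```

Keeping the test rows out of the comparison step is what makes the final score an honest estimate: the model choice never sees them.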
Data source
Department for Transport, UK Government — Road Safety Data: https://www.gov.uk/government/statistical-data-sets/road-safety-open-data
Mirrored as a 3NF database on TU Wien DBRepo: https://test.dbrepo.tuwien.ac.at/database/82c19b39-246c-4409-b25c-8baf3a158a70 (DOI: 10.82556/c8r3-bf26)
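To illustrate what a 3NF layout means for this kind of data, the sketch below builds a tiny normalised schema in an in-memory SQLite database. The table names, column names, and sample row are hypothetical and chosen only for illustration; the actual DBRepo schema is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce references between tables

# Hypothetical schema: categorical attributes live in lookup tables,
# and each collision row stores only keys into them.
conn.executescript("""
CREATE TABLE severity (severity_id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE weather  (weather_id  INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE collision (
    collision_id TEXT PRIMARY KEY,
    occurred_at  TEXT NOT NULL,
    severity_id  INTEGER NOT NULL REFERENCES severity(severity_id),
    weather_id   INTEGER NOT NULL REFERENCES weather(weather_id)
);
""")

conn.executemany(
    "INSERT INTO severity VALUES (?, ?)",
    [(1, "Fatal"), (2, "Serious"), (3, "Slight")],
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [(1, "Fine"), (2, "Raining")],
)
# Illustrative record, not real data.
conn.execute(
    "INSERT INTO collision VALUES (?, ?, ?, ?)",
    ("example-001", "2024-01-15T08:30", 3, 2),
)

# Joining the lookup tables recovers the human-readable labels.
row = conn.execute("""
    SELECT c.collision_id, s.label, w.label
    FROM collision AS c
    JOIN severity AS s USING (severity_id)
    JOIN weather  AS w USING (weather_id)
""").fetchone()
```

Each label is stored once in its lookup table, so a correction to a label touches one row, and the foreign keys reject collision records that point at a code that does not exist.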
Licences
This project involves three categories of artefact, each with a separate licence.
Input Data
The input dataset is the STATS19 Road Safety Open Dataset published by the UK Department for Transport and available at https://www.gov.uk/government/statistical-data-sets/road-safety-open-data.
It is licensed under the Open Government Licence v3.0 (OGL v3.0): https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Obligations: Attribution is required. Derived works must acknowledge the source with the statement:
Contains public sector information licensed under the Open Government Licence v3.0.
OGL v3.0 is compatible with Creative Commons Attribution 4.0 (CC BY 4.0) and imposes no ShareAlike restrictions, so outputs derived from this data can be released under CC BY 4.0.
Software / Code
All source code in this repository is licensed under the MIT Licence. See LICENSE for the full text.
MIT was chosen because it is a permissive open-source licence that is compatible with OGL v3.0 and permits reuse, modification, and distribution, provided the copyright and licence notice is retained. It is one of the most widely adopted licences for research software.
Output Data
All output artefacts produced by this experiment — including trained model files, preprocessed datasets, evaluation figures (confusion matrices, performance charts, feature importance plots), and predictions — are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence: https://creativecommons.org/licenses/by/4.0/
This licence permits reuse, redistribution, and adaptation for any purpose, including commercial use, provided appropriate credit is given. CC BY 4.0 is compatible with OGL v3.0 and consistent with the FWF Open Access policy.
Files
DMP-Final_Predicting-Road-Accident-Severity-in-Great-Britain.pdf
Additional details
Dates
- Created: 2026-05-29
