Published May 27, 2026 | Version 1.0.0
Dataset Open

Crash Severity Prediction for Emergency Dispatch

Description

This deposit contains the final output artifacts and diagnostic figures from the "Crash Severity Prediction for Emergency Dispatch" experiment. Developed for the Data Stewardship 2026 (Part 3) course, the project models road collision severity (Slight, Serious, or Fatal) using a Random Forest classifier. The model was trained and evaluated on the 2023 UK Road Safety Open Data (STATS19).

These outputs document the model's performance and the exploratory data analysis (EDA) conducted before training, so that the machine learning pipeline can be evaluated and its results reproduced.

Dataset Contents

The deposit is organized into two main categories:

1. Evaluation Reports (reports/). Tabular data detailing the model's predictive performance:

  • report-classification-metrics-2023-v1.csv: Comprehensive evaluation metrics (Precision, Recall, F1-Score) for each severity class.

  • report-predictions-test-2023-v1.csv: The raw model predictions mapped against the actual test set values.

2. Visual Diagnostics & EDA (figures/). Graphical representations of the data distribution and model behavior:

  • Model Diagnostics: A confusion matrix (fig-confusion-matrix-2023-v1.png) showing correct predictions and misclassifications for each severity class, alongside a feature importance chart (fig-feature-importance-2023-v1.png).

  • Target Distribution: A visualization of the class imbalance in the dataset (fig-severity-distribution-2023-v1.png).

  • Feature Histograms: Distribution plots for all scene conditions used as features in the model, including day of the week, light conditions, number of vehicles, road surface conditions, road type, speed limit, vehicle type, and weather conditions.
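As a rough illustration of how the per-class metrics in the classification report relate to the raw predictions file, the sketch below recomputes Precision, Recall, and F1-Score from paired actual and predicted labels. It uses toy labels rather than the deposited CSV, because the column names of report-predictions-test-2023-v1.csv are not stated here; the function name per_class_metrics is hypothetical.

```python
from collections import Counter


def per_class_metrics(actual, predicted, labels):
    """Return {label: (precision, recall, f1)} from paired label sequences."""
    pairs = Counter(zip(actual, predicted))
    metrics = {}
    for label in labels:
        tp = pairs[(label, label)]
        # Predicted as this class, but actually another class.
        fp = sum(n for (a, p), n in pairs.items() if p == label and a != label)
        # Actually this class, but predicted as another class.
        fn = sum(n for (a, p), n in pairs.items() if a == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy labels for illustration only; these are not results from the deposit.
actual = ["Slight", "Slight", "Slight", "Serious", "Serious", "Fatal"]
predicted = ["Slight", "Slight", "Serious", "Serious", "Slight", "Slight"]

metrics = per_class_metrics(actual, predicted, ["Slight", "Serious", "Fatal"])
for label, (precision, recall, f1) in metrics.items():
    print(f"{label:8s} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In the toy data the single Fatal case is never predicted, so its recall is zero. That is the kind of effect class imbalance can produce, and it is why per-class metrics are more informative than overall accuracy for this task.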

 

Methodology

The data was processed and modeled in Python (scikit-learn, pandas). The raw data, distributed as three relational tables (collision, casualty, vehicle), was flattened into a single feature matrix and then split into training, validation, and test sets.
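The flattening and splitting steps described above might look roughly like the following sketch. Everything specific in it is an assumption made for illustration: the column names (collision_id, severity, and so on) are not the actual STATS19 schema, the per-collision aggregation is one of several reasonable choices, and the 60/20/20 stratified split is a placeholder because the ratios used in the experiment are not stated.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-ins for the three relational tables (illustrative columns only).
collision = pd.DataFrame({
    "collision_id": range(1, 21),
    "speed_limit": [30, 60] * 10,
    "severity": ["Slight"] * 12 + ["Serious"] * 8,
})
vehicle = pd.DataFrame({
    "collision_id": [i for i in range(1, 21) for _ in range(2)],
    "vehicle_type": ["car", "van"] * 20,
})
casualty = pd.DataFrame({
    "collision_id": range(1, 21),
    "casualty_age": [25 + i for i in range(20)],
})

# Collapse the one-to-many tables to one row per collision (assumed aggregation).
vehicle_agg = vehicle.groupby("collision_id").agg(
    number_of_vehicles=("vehicle_type", "count"),
    first_vehicle_type=("vehicle_type", "first"),
).reset_index()
casualty_agg = casualty.groupby("collision_id").agg(
    number_of_casualties=("casualty_age", "count"),
).reset_index()

# Join everything onto the collision table to get one flat feature matrix.
flat = (
    collision
    .merge(vehicle_agg, on="collision_id", how="left")
    .merge(casualty_agg, on="collision_id", how="left")
)

X = flat.drop(columns=["collision_id", "severity"])
y = flat["severity"]

# Assumed 60/20/20 split, stratified so each subset keeps the class proportions.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

print(flat.shape, len(X_train), len(X_val), len(X_test))
```

Aggregating before merging keeps one row per collision; merging the raw vehicle and casualty tables directly would duplicate collisions and could leak near-identical rows across the splits.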

Files

Files (729.2 KiB)

Checksum                                Size
md5:9fd097b99a7d28cccf1468290d5fc8ed    37.3 KiB
md5:9168637f47668ec3889bb4f500663052    38.7 KiB
md5:3e4d3d8fd3b33c23f77a21e6e2cb3099    38.5 KiB
md5:ebacab6594339b02031165e96e78214d    37.8 KiB
md5:94827e6d576cb7594228295daae1753e    44.2 KiB
md5:22e4e9e07f80f0f43035580d2e5ecb2f    41.6 KiB
md5:5917f4ce9e7a340e7e997db0e1d347e8    37.9 KiB
md5:4941ab97abd2547abdda836ee21dc42b    36.0 KiB
md5:1c0e5abdd6bf4958aac65e82627229b9    37.3 KiB
md5:930a9d6571063f1cdf767909f69c8042    42.6 KiB
md5:e125d764ed775d7ce77683d4b81a7f74    31.5 KiB
md5:345ec711eeaceb3750c7a48c2821ffda    493 Bytes
md5:9e98310122bae3caf022e0aa80819703    305.2 KiB
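
A downloaded file can be checked against the MD5 checksums listed above with a few lines of standard-library Python. This is a minimal sketch: the helper names md5_of_file and matches_listing are hypothetical, and the demonstration runs on a throwaway file rather than a file from this deposit.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listing entry of the form 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Self-contained demonstration on a temporary file (not part of the deposit).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as fh:
        fh.write(b"hello")
    demo_listing = "md5:" + hashlib.md5(b"hello").hexdigest()
    demo_ok = matches_listing(demo_path, demo_listing)
    demo_bad = matches_listing(demo_path, "md5:" + "0" * 32)

print(demo_ok, demo_bad)
```

For a real check, pass the local path of a downloaded file together with its checksum entry copied from the listing, for example matches_listing(local_path, "md5:9fd097b99a7d28cccf1468290d5fc8ed"), where local_path is wherever the corresponding file was saved.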

Additional details

Dates

Submitted
2026-05-28