NBA MVP Likelihood Investigator (Dataset and Source code)

Reti, Peter

doi:10.70124/vmf7g-nv342

Published November 28, 2025 | Version v1

Software Open

Reti, Peter (Contact person)¹

1. TU Wien

NBA MVP Likelihood Investigator Project Demo

This file is a zipped GitHub repository of an NBA MVP Likelihood Investigator demo project for the Research Data Management course. It contains source code in the form of Jupyter notebooks, configuration files, and the result of the preprocessing step: an exemplary processed dataset, P1.

Context and methodology

The project serves three main purposes:

To structure an analysis on how player- and team-level statistics relate to MVP outcomes for the 2025–2026 NBA season.
To provide a reproducible pipeline for going from raw NBA statistics to cleaned, analysis-ready data.
To offer an exemplary processed dataset (P1 – Cleaned NBA Player Game Statistics 2025–2026) that demonstrates good practice in structuring and documenting research data.

The dataset and code in this archive were created as follows:

Raw player statistics for the 2025–2026 NBA season were obtained from an external NBA statistics provider (R1 – NBA Player Statistics 2025–2026). These raw data are not redistributed in the archive; instead, instructions for obtaining them are provided.
The raw data are loaded and processed in a Jupyter notebook (`src/groundwork/preprocessing.ipynb`) using Python and standard data-science libraries listed in the uv.lock file.
Preprocessing consists of selecting the first 3 rows and exporting the resulting table (P1 – Cleaned NBA Player Game Statistics 2025–2026) to `data/processed/` as an illustrative example.
An additional notebook in `src/models/` illustrates what the structure for building simple MVP likelihood models could look like.
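
The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it uses only Python's standard `csv` module so that it runs without the project's environment, whereas the notebook relies on the libraries pinned in uv.lock. The function name `export_first_rows` and its parameters are assumptions made for this example.

```python
import csv
from itertools import islice
from pathlib import Path


def export_first_rows(raw_csv: Path, out_csv: Path, n_rows: int = 3) -> int:
    """Copy the header and the first n_rows data rows of raw_csv to out_csv.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    # Create the output folder (e.g. data/processed/) if it does not exist yet.
    out_csv.parent.mkdir(parents=True, exist_ok=True)
    with raw_csv.open(newline="", encoding="utf-8") as src, \
            out_csv.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to export
        writer.writerow(header)
        written = 0
        # islice stops after n_rows rows, or earlier if the file is shorter.
        for row in islice(reader, n_rows):
            writer.writerow(row)
            written += 1
        return written
```

Calling `export_first_rows(Path("data/raw/<raw file>.csv"), Path("data/processed/Processed_Stat.csv"))` would then write the header plus three rows; the raw file name is a placeholder, since the raw data must be obtained separately.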

Datasets

R1 – NBA Historical Statistics 1974-2025 (raw, external dataset)

Type: Reused external dataset (not redistributed in the project)
Content: Statistics for NBA games from the 1974-75 season up to the present.

The whole dataset is available on Kaggle.

P1 - Exemplary NBA Player Game Statistics (2025-2026)

Type: Derived dataset created in this project (e.g. data/processed/Processed_Stat.csv)
Content: Cleaned subset of the original NBA player statistics dataset.

Each row still represents one player in one game, with:

Clean identifiers & context
- personId (player ID), firstName, lastName
- gameId, gameDateTimeEst
- playerteamCity, playerteamName, opponentteamCity, opponentteamName
- Flags such as win (team won) and home (home game)
Core performance metrics
- numMinutes, points, assists, reboundsOffensive, reboundsDefensive, reboundsTotal
- Shooting stats: fieldGoalsAttempted, fieldGoalsMade, fieldGoalsPercentage,
  threePointersAttempted, threePointersMade, threePointersPercentage,
  freeThrowsAttempted, freeThrowsMade, freeThrowsPercentage
- Other stats: steals, blocks, turnovers, foulsPersonal, plusMinusPoints

P1 is produced by the preprocessing notebook (src/groundwork/preprocessing.ipynb) and is intended as an analysis-ready example dataset for exploration and MVP-modelling experiments.
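
As a quick sanity check of the structure listed above, a processed file can be tested for the documented columns. The sketch below is an assumption-laden illustration: `EXPECTED_COLUMNS` is transcribed from the list in this description (which names its flags only "such as" win and home, so the real file may contain more columns), and `missing_columns` is a hypothetical helper, not part of the repository.

```python
import csv
from pathlib import Path

# Column names transcribed from the P1 description; may not be exhaustive.
EXPECTED_COLUMNS = [
    "personId", "firstName", "lastName",
    "gameId", "gameDateTimeEst",
    "playerteamCity", "playerteamName", "opponentteamCity", "opponentteamName",
    "win", "home",
    "numMinutes", "points", "assists",
    "reboundsOffensive", "reboundsDefensive", "reboundsTotal",
    "fieldGoalsAttempted", "fieldGoalsMade", "fieldGoalsPercentage",
    "threePointersAttempted", "threePointersMade", "threePointersPercentage",
    "freeThrowsAttempted", "freeThrowsMade", "freeThrowsPercentage",
    "steals", "blocks", "turnovers", "foulsPersonal", "plusMinusPoints",
]


def missing_columns(csv_path: Path) -> list[str]:
    """Return the documented columns that are absent from the CSV header."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])  # empty file -> empty header
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

An empty result from `missing_columns(Path("data/processed/Processed_Stat.csv"))` would indicate that every documented column is present; extra columns in the file are ignored by this check.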

Technical details

To familiarize yourself with the technical details, download the zip file and read the included README, or go directly to the GitHub repository.

If you would like to know more about the datasets than is covered in the Datasets section above, two additional documentation files can be found in the data/raw and data/processed folders.

Further details

This upload is intended to fulfill the requirements of the course; developing the project fully would take additional time.

Licenses

CC BY 4.0 - The dataset produced by this project, P1 (Processed_Stat.csv), is under this license.

MIT License - The source code in this repository is made available under the MIT License.

Files

NBA-MVP-Likelihood-Investigator-main.zip

Files (241.5 KiB)

Name	Size
NBA-MVP-Likelihood-Investigator-main.zip (md5:75da4c3ffa000cbccbb21115061f2aed)	241.5 KiB
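
After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_published_md5` are assumptions made for this example, and the local file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path

# Checksum as published in the file listing of this record.
EXPECTED_MD5 = "75da4c3ffa000cbccbb21115061f2aed"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Read until f.read returns b"" so large files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Compare a downloaded file's digest with the published checksum."""
    return md5_of_file(path) == expected.lower()
```

`matches_published_md5(Path("NBA-MVP-Likelihood-Investigator-main.zip"))` should return True for an intact download. MD5 is suitable here only for detecting accidental corruption, not for security purposes.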

Additional details

URL: https://github.com/retipeti/NBA-MVP-Likelihood-Investigator

Is cited by: Data Management Plan: 10.5281/zenodo.17714832 (DOI)
