NBA MVP Likelihood Investigator (Dataset and Source code)
Creators
Description
NBA MVP Likelihood Investigator Project Demo
This file is a zipped GitHub repository of an NBA MVP Likelihood Investigator demo project for the Research Data Management Course. It contains source code in the form of Jupyter notebooks, configuration files, and the result of the preprocessing step: an exemplary processed dataset, P1.
Context and methodology
The project serves three main purposes:
- To structure an analysis on how player- and team-level statistics relate to MVP outcomes for the 2025–2026 NBA season.
- To provide a reproducible pipeline for going from raw NBA statistics to cleaned, analysis-ready data.
- To offer an exemplary processed dataset (P1 – Cleaned NBA Player Game Statistics 2025–2026) that demonstrates good practice in structuring and documenting research data.
The dataset and code in this archive were created as follows:
- Raw player statistics for the 2025–2026 NBA season were obtained from an external NBA statistics provider (R1 – NBA Player Statistics 2025–2026). These raw data are not redistributed in the archive; instead, instructions for obtaining them are provided.
- The raw data are loaded and processed in a Jupyter notebook (`src/groundwork/preprocessing.ipynb`) using Python and standard data-science libraries listed in the uv.lock file.
- Preprocessing consists of selecting the first 3 rows and exporting them as a table (P1 – Cleaned NBA Player Game Statistics 2025–2026) into `data/processed/`, as an illustrative example.
- An additional notebook can be found in `src/models/` to illustrate what the structure for building simple MVP likelihood models could look like.
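The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the code from `src/groundwork/preprocessing.ipynb`: it assumes the raw data is a CSV file with a header row, and it uses only the standard-library `csv` module so that it runs without the project environment (the notebook itself may well use other libraries). The function name `export_first_rows` is hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path


def export_first_rows(raw_csv: Path, out_csv: Path, n_rows: int = 3) -> int:
    """Copy the header and the first n_rows data rows of raw_csv to out_csv.

    Returns the number of data rows written (excluding the header).
    """
    # Create the output folder (e.g. data/processed/) if it does not exist yet.
    out_csv.parent.mkdir(parents=True, exist_ok=True)

    with raw_csv.open(newline="", encoding="utf-8") as src, \
            out_csv.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:
            # Empty input file: nothing to export.
            return 0
        writer.writerow(header)

        written = 0
        # islice stops after n_rows rows, so the full raw file is never loaded.
        for row in islice(reader, n_rows):
            writer.writerow(row)
            written += 1
    return written
```

A call could look like `export_first_rows(Path("data/raw/raw_stats.csv"), Path("data/processed/Processed_Stat.csv"))`, where `raw_stats.csv` is a placeholder for whatever file name the raw download has.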
Datasets
R1 – NBA Historical Statistics 1974-2025 (raw, external dataset)
Type: Reused external dataset (not redistributed in the project)
Content: All kinds of statistics for NBA games since the 1974-75 season up until today.
The whole dataset is available on Kaggle.
P1 - Exemplary NBA Player Game Statistics (2025-2026)
Type: Derived dataset created in this project (e.g. data/processed/Processed_Stat.csv)
Content: Cleaned subset of the original NBA player statistics dataset.
Each row still represents one player in one game, with:
Clean identifiers & context
- personId (player ID), firstName, lastName
- gameId, gameDateTimeEst
- playerteamCity, playerteamName, opponentteamCity, opponentteamName
- Flags such as win (team won) and home (home game)
Core performance metrics
- numMinutes, points, assists, reboundsOffensive, reboundsDefensive, reboundsTotal
- Shooting stats: fieldGoalsAttempted, fieldGoalsMade, fieldGoalsPercentage, threePointersAttempted, threePointersMade, threePointersPercentage, freeThrowsAttempted, freeThrowsMade, freeThrowsPercentage
- Other stats: steals, blocks, turnovers, foulsPersonal, plusMinusPoints
P1 is produced by the preprocessing notebook (src/groundwork/preprocessing.ipynb) and is intended as an analysis-ready example dataset for exploration and MVP-modelling experiments.
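As a sketch of how P1 could be explored, the snippet below computes each player's average points per game from the one-row-per-player-per-game layout. It assumes P1 is a CSV file with a header row containing the `personId` and `points` columns listed above. The function name `points_per_game` is hypothetical, and the average is only an illustrative feature, not the project's MVP model.

```python
import csv
from collections import defaultdict
from pathlib import Path


def points_per_game(processed_csv: Path) -> dict[str, float]:
    """Return the average points per game for each personId in the file."""
    totals: dict[str, float] = defaultdict(float)
    games: dict[str, int] = defaultdict(int)

    with processed_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Skip rows with a missing points value instead of counting them as 0.
            if not row["points"]:
                continue
            totals[row["personId"]] += float(row["points"])
            games[row["personId"]] += 1

    # Every key in totals has at least one game, so the division is safe.
    return {pid: totals[pid] / games[pid] for pid in totals}
```

With the file name given in the dataset description, a call could look like `points_per_game(Path("data/processed/Processed_Stat.csv"))`.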
Technical details
To familiarize yourself with the technical details, please download the zip file and take a look at the included README file, or go straight to the repository: GitHub Repository.
If you would like to know more about the datasets than is mentioned in the Datasets section above, two additional documentation files can be found in the data/raw and data/processed folders.
Further details
- This upload is intended to fulfill the requirements of the course, so the project is not fully developed; completing it would take additional time.
Licenses
CC BY 4.0 - The dataset produced by this project, namely P1 (Processed_Stat.csv), is released under this license.
MIT License - The source code in this repository is made available under the MIT License.
Files
NBA-MVP-Likelihood-Investigator-main.zip
Additional details
Identifiers
Related works
- Is cited by
- Data Management Plan: 10.5281/zenodo.17714832 (DOI)