Published November 30, 2025 | Version v1
Dataset Open

Topologically Accurate Segmentation and Brain MRI Generation

  • 1. TU Wien

Description

Topologically accurate brain cancer MRI generation

Context and methodology

This project investigates whether a two-stage neural network pipeline can generate topologically accurate magnetic resonance images (MRIs) of brain tumors. The research question is: can a machine learning pipeline produce synthetic MRIs that preserve correct brain anatomy and tumor structure? This matters for data augmentation in medical imaging, especially for brain cancer, where real data is scarce and expensive to collect.

The project builds on a study examining the role of topological constraints in guiding machine learning models toward anatomically accurate segmentations. Diffusion models were applied to brain MRI data to generate segmentation masks and synthetic images. The pipeline consists of two stages: first, a diffusion model trained with a topological loss generates segmentation masks that preserve brain anatomy; second, a ControlNet model generates MRI images constrained by these masks.

The experiment uses the BRATS Task 01 dataset (glioma MRIs with segmentations) and the IXI dataset (healthy brain MRIs). The workflow is as follows:

1. Train a ControlNet on real MRIs and ground truth segmentations to map masks to realistic MRIs.
2. Train a diffusion model with a topological loss to generate segmentation masks with correct topology.
3. Use these masks with the ControlNet to generate synthetic MRIs.

The resulting synthetic images can be used to augment training datasets for segmentation models. Full understanding of the pipeline requires knowledge of diffusion models and concepts from topological data analysis, including persistent homology.
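
To give a concrete sense of what "correct topology" can mean for a segmentation mask, the sketch below counts connected foreground components (the Betti number b0) in a small 2D binary mask. It is only an illustration under stated assumptions: the function name `count_components` and the toy masks are hypothetical, 4-connectivity is an arbitrary choice, and this is not the topological loss used in the project. Persistent homology goes further by tracking how such features appear and disappear across thresholds.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground components (Betti number b0) in a 2D binary mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New unvisited foreground pixel: flood-fill its whole component.
                components += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


# Hypothetical toy masks: one solid region, and the same region split in two.
connected = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
fragmented = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

print(count_components(connected), count_components(fragmented))
```

A mask that is pixel-wise close to the ground truth can still differ in component count, which is the kind of error a topology-aware training objective is meant to penalize.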

 

Technical details

Repository Structure


- **raw_data/**  
  Original datasets separated into `brats2021` and `ixi`. Each dataset contains subject-specific folders with MRI images and metadata files. BRATS contains T1, T2, T1ce, FLAIR, and segmentation images, while IXI contains T1 and T2 images. Metadata JSON files include acquisition parameters and provenance information.

- **preprocessed_data/**  
  Preprocessed T1 and T2 images from BRATS and IXI, standardized in voxel size and intensity. Each subject has a folder with `.nii.gz` images and metadata `.json` files describing preprocessing steps.

- **synthetic_data/**  
  Images generated by the pipeline, including synthetic segmentation masks and synthetic MRIs. Filenames indicate the pipeline stage and synthetic subject ID, with accompanying metadata JSON files specifying model type, training epoch, and pipeline parameters.

- **models/**  
  PyTorch model checkpoints for the diffusion and ControlNet networks (`.pth`) with corresponding JSON files documenting parameters and epoch.

- **code/**  
  Python scripts (`.py`) and Jupyter notebooks (`.ipynb`) for preprocessing, training, generating synthetic images, and testing the models, as well as examples.  
  Key scripts include:  
  - `generate_synthetic.py`: generate synthetic MRI images from segmentation masks.  
  - `generate_segmentation.py`: generate synthetic segmentation masks.  
  - `test_diffusion.py` and `test_controlnet.py`: test individual networks.  

- **README.md** – this file, describing the repository.  
- **requirements.txt** – Python dependencies.

Metadata and Documentation


- Filenames follow the convention: `sub-<id>_<modality>.nii.gz` for real and preprocessed data, `sub-synth<id>_<modality>.nii.gz` for synthetic data.  
- Metadata JSON files describe acquisition parameters, preprocessing steps, model parameters, and provenance information.  
- Model checkpoints include metadata with training parameters and epochs.  
- The repository is versioned: each Git commit corresponds to a snapshot of the data and models stored on the TU Wien test repository. Large files are stored only on the TU Wien repository.
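
The filename convention above can be parsed mechanically. The sketch below is a minimal illustration, not code from the repository: the regular expression, the function name `parse_name`, and the example IDs are assumptions, and it presumes that subject IDs and modality labels are purely alphanumeric.

```python
import re

# Convention from the list above: sub-<id>_<modality>.nii.gz, with the id
# prefixed by "synth" for synthetic data.
# Assumption: ids and modality labels contain only letters and digits.
FILENAME_RE = re.compile(
    r"^sub-(?P<synth>synth)?(?P<id>[A-Za-z0-9]+)_(?P<modality>[A-Za-z0-9]+)\.nii\.gz$"
)


def parse_name(filename):
    """Return subject id, modality, and synthetic flag, or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return {
        "subject_id": match.group("id"),
        "modality": match.group("modality"),
        "synthetic": match.group("synth") is not None,
    }


# Hypothetical example names; the actual ids in the archive may look different.
for name in ("sub-001_T1.nii.gz", "sub-synth001_T2.nii.gz", "notes.txt"):
    print(name, parse_name(name))
```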

Usage


1. Install dependencies from `requirements.txt`.
2. Run preprocessing, training, or generation scripts from the `code/` directory.  
3. Follow metadata in JSON files to interpret preprocessing steps, model parameters, and synthetic image generation.  
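
As a rough sketch of step 3, the snippet below walks a data directory and pairs each `.nii.gz` image with a JSON metadata file. It assumes, without confirmation from the repository, that each metadata file sits next to its image and shares its base name (for example `sub-001_T1.json` next to `sub-001_T1.nii.gz`); the function name `iter_images_with_metadata` is hypothetical, and the keys inside the JSON files are not specified here.

```python
import json
from pathlib import Path


def iter_images_with_metadata(root):
    """Yield (image_path, metadata) for every .nii.gz file under root.

    metadata is the parsed JSON sidecar, or None when no sidecar is found.
    Assumption: the sidecar shares the image's base name with a .json suffix.
    """
    for image in sorted(Path(root).rglob("*.nii.gz")):
        base = image.name[: -len(".nii.gz")]
        sidecar = image.with_name(base + ".json")
        metadata = json.loads(sidecar.read_text()) if sidecar.is_file() else None
        yield image, metadata


# Example usage, assuming the archive was extracted into the current directory:
#
#     for image, metadata in iter_images_with_metadata("preprocessed_data"):
#         print(image.name, "has metadata" if metadata else "no metadata found")
```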

Licensing


- Raw and preprocessed datasets are used according to their respective licenses (BRATS 2021, IXI).  
- Code is provided under the MIT License.  
- Generated data is provided under the CC BY 4.0 license.


Files

Files (32.5 KiB)

Name: topo_seg.zip
Size: 32.5 KiB
md5:f6d91ff223bf639154fdcc8c65cf6f22
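
After downloading, the archive can be checked against the md5 checksum listed for `topo_seg.zip`. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of` and `verify` and the local path in the usage comment are assumptions.

```python
import hashlib

# Checksum listed for topo_seg.zip in this record.
EXPECTED_MD5 = "f6d91ff223bf639154fdcc8c65cf6f22"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True when the file's md5 digest matches the expected value."""
    return md5_of(path) == expected


# Example usage, assuming the archive was saved in the current directory:
#
#     print("checksum ok" if verify("topo_seg.zip") else "checksum mismatch")
```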

Additional details

Dates

Available
2025-11-30