Published November 30, 2025 | Version v1.0
Dataset Open

Wine Dataset for Machine Learning Classification Experiment

  • 1. TU Wien

Contributors

  • 1. UCI / University of Genoa
  • 2. University of Genoa

Description

Dataset Description

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. 

Context and methodology

This dataset was created for research in machine learning classification and chemometrics, specifically to test and compare different supervised learning algorithms.

The dataset provides chemical analysis measurements of 178 wines from three cultivars grown in the same Italian region. It is used to evaluate how well classification algorithms (Multi-Layer Perceptron, Decision Tree, Gaussian Process Classifier) predict the wine class. It is suited to testing models on a well-posed classification problem and to reproducibility studies.

Data were obtained by laboratory chemical analysis of wines, measuring 13 constituents such as Alcohol, Malic acid, Ash, Magnesium, Total phenols, Flavanoids, and Proline. The original dataset was donated by Riccardo Leardi (University of Genoa) and hosted in the UCI Machine Learning Repository.

Technical details

The dataset is a single file (wine.data) with 178 rows (instances) and 14 columns (13 features + 1 target class).

Column names follow a consistent naming convention corresponding to chemical constituents (e.g., Alcohol, Malic_Acid, Ash, …, Proline, class).

It can be opened programmatically using Python (pandas, numpy), R, or similar tools.
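As a minimal sketch of opening the file with pandas: the code below assumes wine.data is comma-separated with no header row (as in the UCI distribution) and that the target is the first column. The column names in COLUMNS and the helper load_wine are illustrative assumptions, and the two sample rows only mimic the file layout so that the sketch runs without the file.

```python
import io

import pandas as pd

# Assumed column names: class label first, then the 13 constituents.
# Adjust these if the deposited file uses different names or a header row.
COLUMNS = [
    "class", "Alcohol", "Malic_Acid", "Ash", "Alcalinity_of_Ash", "Magnesium",
    "Total_Phenols", "Flavanoids", "Nonflavanoid_Phenols", "Proanthocyanins",
    "Color_Intensity", "Hue", "OD280_OD315", "Proline",
]


def load_wine(source):
    """Read a header-less, comma-separated wine.data file or text buffer."""
    return pd.read_csv(source, header=None, names=COLUMNS)


# Illustrative rows in the wine.data layout, used only to keep this runnable.
sample = io.StringIO(
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
    "1,13.2,1.78,2.14,11.2,100,2.65,2.76,.26,1.28,4.38,1.05,3.4,1050\n"
)
df = load_wine(sample)

# Split into the feature matrix and the target vector.
X, y = df.drop(columns="class"), df["class"]
print(X.shape, y.tolist())
```

With the real file, `load_wine("wine.data")` should yield a frame of 178 rows and 14 columns if the assumptions above hold.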

The ucimlrepo Python package can fetch the dataset programmatically.
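A hedged sketch of the ucimlrepo route follows. It assumes the package is installed, that network access is available, and that the Wine dataset has repository id 109 in the UCI Machine Learning Repository; the wrapper name fetch_wine is hypothetical.

```python
WINE_UCI_ID = 109  # assumed UCI repository id of the Wine dataset


def fetch_wine():
    """Download the Wine dataset via ucimlrepo (needs network access)."""
    # Imported lazily because ucimlrepo is an optional dependency.
    from ucimlrepo import fetch_ucirepo

    wine = fetch_ucirepo(id=WINE_UCI_ID)
    X = wine.data.features  # DataFrame of the 13 constituents
    y = wine.data.targets   # DataFrame holding the class label
    return X, y
```

Calling `X, y = fetch_wine()` returns the features and the target as separate pandas objects, so the class column does not need to be split off by hand.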

The dataset is ready for machine learning analysis after standard preprocessing (scaling recommended for some classifiers).
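The preprocessing and evaluation step could look roughly like the sketch below. To stay self-contained it uses scikit-learn's bundled copy of the UCI Wine data (sklearn.datasets.load_wine, whose labels run 0–2 rather than 1–3) instead of the deposited file. The choice of 5-fold cross-validation, the hyperparameters, and the decision to scale only the MLP and Gaussian Process inputs are assumptions for illustration, not the record's documented protocol.

```python
from sklearn.datasets import load_wine
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Bundled copy of the Wine data: 178 samples, 13 features.
X, y = load_wine(return_X_y=True)

# Scaling sits inside the pipeline so it is fitted on each training fold only.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "mlp": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)
    ),
    "gaussian_process": make_pipeline(
        StandardScaler(), GaussianProcessClassifier(random_state=0)
    ),
}

# Mean accuracy over 5 stratified folds for each classifier.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The decision tree is left unscaled because tree splits do not depend on feature scale, whereas the MLP and the Gaussian Process kernel are sensitive to the large range of features such as Proline.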

Further details

  • No missing values in the dataset.

  • First column (class, Column 0) is the target variable (values 1–3).

  • All features are numeric (continuous or integer).

  • Recommended for testing classification algorithms and reproducibility studies.

  • Licensed under CC-BY 4.0, so reuse and redistribution are allowed with proper attribution.
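The properties listed above can be checked after loading. The following is a minimal sketch; the function name check_wine_frame and the two small example frames are hypothetical and serve only to show the checks.

```python
import pandas as pd


def check_wine_frame(df, target="class"):
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    # No missing values anywhere in the frame.
    if df.isna().any().any():
        problems.append("missing values present")
    # Target values restricted to the three classes 1, 2 and 3.
    if not set(df[target].unique()) <= {1, 2, 3}:
        problems.append("target outside 1-3")
    # Every column must have a numeric dtype.
    non_numeric = [
        c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])
    ]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems


# Made-up frames for illustration only, not real measurements.
good = pd.DataFrame(
    {"class": [1, 2, 3], "Alcohol": [14.2, 12.4, 13.1], "Proline": [1065, 520, 630]}
)
bad = pd.DataFrame({"class": [1, 4], "Alcohol": [14.2, None]})

print(check_wine_frame(good))
print(check_wine_frame(bad))
```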

Files

Files (10.5 KiB)

Name | Size
md5:3e584720e6718d28509f86f05b7885a1 | 10.5 KiB
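The listed MD5 checksum can be used to confirm that a download is intact. A minimal sketch using only the standard library is shown below; the helper names md5_of_file and verify_download are hypothetical, and the local file path is whatever the file was saved as.

```python
import hashlib

# Checksum listed for the file in this record.
EXPECTED_MD5 = "3e584720e6718d28509f86f05b7885a1"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

For example, `verify_download("wine.data")` would return True only if a local file saved under that (assumed) name is byte-identical to the deposited one.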

Additional details

Identifiers

Related works

Is derived from
Dataset: 10.24432/C5PC7J (DOI)

Dates

Submitted
2025-11-30
Date of deposit to test repository

References