Published April 28, 2025 | Version 1.0
Dataset | Open access

Sentiment Prediction Outputs for Twitter Dataset

  • 1. TU Wien

Contributors

Data manager:

Description

Context and Methodology:

This dataset was created as part of a sentiment analysis project using enriched Twitter data. The objective was to train and test a machine learning model to automatically classify the sentiment of tweets (e.g., Positive, Negative, Neutral).
The data was generated from tweets scored with a custom sentiment scorer. A machine learning pipeline was then applied, consisting of text preprocessing, feature extraction with CountVectorizer, and prediction with a HistGradientBoostingClassifier.

Technical Details:

The dataset includes five main files:

  • test_predictions_full.csv – Predicted sentiment labels for the test set.

  • sentiment_model.joblib – Trained sentiment classifier (HistGradientBoostingClassifier).

  • count_vectorizer.joblib – Fitted CountVectorizer that converts tweet text into token-count features.

  • model_performance.txt – Evaluation metrics and performance report of the trained model.

  • confusion_matrix.png – Visualization of the model’s confusion matrix.
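model_performance.txt and confusion_matrix.png summarize how well the classifier performed on the test set. The published figures are not restated here. The following is only a sketch of how comparable metrics are commonly computed with scikit-learn, using hypothetical true and predicted labels. A PNG like confusion_matrix.png could be rendered from the same labels with `ConfusionMatrixDisplay.from_predictions` and matplotlib, though how the published image was produced is not documented.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

labels = ["Negative", "Neutral", "Positive"]

# Hypothetical true and predicted labels, for illustration only.
y_true = ["Positive", "Negative", "Neutral", "Positive", "Negative", "Neutral"]
y_pred = ["Positive", "Negative", "Positive", "Positive", "Neutral", "Neutral"]

accuracy = accuracy_score(y_true, y_pred)
# Per-class precision, recall and F1; zero_division=0 avoids warnings for empty classes.
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)
# Rows are true labels, columns are predicted labels, both in `labels` order.
matrix = confusion_matrix(y_true, y_pred, labels=labels)

print(f"Accuracy: {accuracy:.3f}")
print(report)
print(matrix)
```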

Each file is named after its contents.
The .joblib files can be loaded into Python with the joblib library; scikit-learn must also be installed to use the loaded objects.
The .csv and .txt files open in any text editor (the .csv also in spreadsheet software), and the .png file opens in any image viewer.
Detailed evaluation results are documented in model_performance.txt.

Additional Details:

  • The data was constructed to ensure reproducibility.

  • No personal or sensitive information is present.

  • The dataset can be reused by researchers, data scientists, and students working on Natural Language Processing (NLP), machine learning classification, and sentiment analysis tasks.

Files


Files (3.3 MiB)

Checksum (MD5) and size of each file:

  • md5:88a3794a78d6267a1da0f19b517b0b91 – 28.7 KiB

  • md5:e6fd3556e8636d264611ac7eb06b3e95 – 141.2 KiB

  • md5:e2bae9f8b376574fe4399bac7d17e6e2 – 571 bytes

  • md5:b4ce5dd5a5d0efdd785e4deaeac23f92 – 2.1 MiB

  • md5:2b59407fb2ecbfbdd004a14030d9e1c3 – 1.0 MiB
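Each file is listed with an MD5 checksum. The following is a minimal sketch for checking a downloaded file against its listed value, using only Python's standard library. The function names are illustrative, and the self-check at the end uses a throwaway file containing "abc", whose MD5 digest is known.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file with a checksum written as 'md5:<hex>' (prefix optional)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Self-check on a throwaway file whose content is b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_ok = matches_listed_checksum(tmp.name, "md5:900150983cd24fb0d6963f7d28e17f72")
demo_mismatch = matches_listed_checksum(tmp.name, "md5:d41d8cd98f00b204e9800998ecf8427e")
os.remove(tmp.name)
print(demo_ok, demo_mismatch)
```

To check a real download, pass its path together with the `md5:` string listed next to its size. A `False` result indicates a corrupted or incomplete download.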

Additional details

Dates

Accepted: 2025-04-28