Sentiment Prediction Outputs for Twitter Dataset
Contributors
Data manager:
Description
Context and Methodology:
This dataset was created as part of a sentiment analysis project using enriched Twitter data. The objective was to train and test a machine learning model to automatically classify the sentiment of tweets (e.g., Positive, Negative, Neutral).
The data was generated from tweets that had been labeled by a custom sentiment scorer. A machine learning pipeline was then applied, consisting of text preprocessing, feature extraction with scikit-learn's CountVectorizer, and prediction with a HistGradientBoostingClassifier.
Technical Details:
The dataset includes five main files:
test_predictions_full.csv – Predicted sentiment labels for the test set.
sentiment_model.joblib – Trained machine learning model.
count_vectorizer.joblib – Fitted text feature extractor (CountVectorizer).
model_performance.txt – Evaluation metrics and performance report of the trained model.
confusion_matrix.png – Visualization of the model’s confusion matrix.
Each file name describes the file's purpose.
The .joblib files can be loaded into Python using the joblib and scikit-learn libraries.
The .csv, .txt, and .png files can be opened with any standard text editor, spreadsheet software, or image viewer.
Additional performance documentation is included within the model_performance.txt file.
Additional Details:
The data was constructed to ensure reproducibility.
No personal or sensitive information is present.
The dataset can be reused by researchers, data scientists, and students interested in Natural Language Processing (NLP), machine learning classification, and sentiment analysis.
Files
Files (3.3 MiB)
Checksum | Size
---|---
md5:88a3794a78d6267a1da0f19b517b0b91 | 28.7 KiB
md5:e6fd3556e8636d264611ac7eb06b3e95 | 141.2 KiB
md5:e2bae9f8b376574fe4399bac7d17e6e2 | 571 Bytes
md5:b4ce5dd5a5d0efdd785e4deaeac23f92 | 2.1 MiB
md5:2b59407fb2ecbfbdd004a14030d9e1c3 | 1.0 MiB
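The MD5 checksums listed above can be used to check that a download is intact. The sketch below uses only Python's standard library. The function names are hypothetical, and because this listing does not say which checksum belongs to which file, the file path and the expected checksum are both passed in by the caller.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file's MD5 with a listed value such as 'md5:88a3...'."""
    # Drop the optional 'md5:' prefix used in the file table.
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(downloaded_path, listed_value)` returns True when the downloaded file matches the listed checksum, where both arguments are supplied by the user.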
Additional details
Dates
- Accepted
- 2025-04-28