Sentiment Prediction Outputs for Twitter Dataset
Contributors
Data manager:
Description
Context and Methodology:
This dataset was created as part of a sentiment analysis project using enriched Twitter data. The objective was to train and test a machine learning model to automatically classify the sentiment of tweets (e.g., Positive, Negative, Neutral).
The data was generated using tweets that were sentiment-scored with a custom sentiment scorer. A machine learning pipeline was applied, including text preprocessing, feature extraction with CountVectorizer, and prediction with a HistGradientBoostingClassifier.
Technical Details:
The dataset includes five main files:
- test_predictions_full.csv – Predicted sentiment labels for the test set.
- sentiment_model.joblib – Trained machine learning model.
- count_vectorizer.joblib – Text feature extraction model (CountVectorizer).
- model_performance.txt – Evaluation metrics and performance report of the trained model.
- confusion_matrix.png – Visualization of the model’s confusion matrix.
The files follow standard naming conventions based on their purpose.
The .joblib files can be loaded into Python using the joblib and scikit-learn libraries.
The .csv, .txt, and .png files can be opened with any standard text editor, spreadsheet software, or image viewer.
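The predictions file can also be inspected programmatically. The sketch below uses only the Python standard library and makes no assumption about the column names in test_predictions_full.csv: it returns whatever header the file has, plus a few preview rows. The helper `summarize_predictions` and its `label_column` argument are hypothetical. Pass the name of the label column only after checking the header.

```python
import csv
from collections import Counter


def summarize_predictions(csv_path, label_column=None, preview_rows=5):
    """Return the header, the first rows, and optional label counts."""
    preview = []
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        for index, row in enumerate(reader):
            if index < preview_rows:
                preview.append(row)
            if label_column is not None:
                # Raises KeyError if label_column is not in the header.
                counts[row[label_column]] += 1
    return columns, preview, dict(counts)


if __name__ == "__main__":
    columns, preview, _ = summarize_predictions("test_predictions_full.csv")
    print(columns)
    for row in preview:
        print(row)
```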
Additional performance documentation is included within the model_performance.txt file.
Additional Details:
- The data was constructed to ensure reproducibility.
- No personal or sensitive information is present.
- It can be reused by researchers, data scientists, and students interested in Natural Language Processing (NLP), machine learning classification, and sentiment analysis tasks.
Files (3.3 MiB)
MD5 checksum | Size
---|---
88a3794a78d6267a1da0f19b517b0b91 | 28.7 KiB
e6fd3556e8636d264611ac7eb06b3e95 | 141.2 KiB
e2bae9f8b376574fe4399bac7d17e6e2 | 571 Bytes
b4ce5dd5a5d0efdd785e4deaeac23f92 | 2.1 MiB
2b59407fb2ecbfbdd004a14030d9e1c3 | 1.0 MiB
Additional details
Dates
- Accepted: 2025-04-28