Datasets for classification tasks related to social media text
(an incomplete list)
If you use any of the provided datasets, please make sure to cite
the corresponding papers! Links are given without warranty of any
kind.
You can notify me of expired links or other datasets I should add here.
Sentiment
- Germeval Task 2017 - Shared Task on Aspect-based Sentiment in
Social Media Customer Feedback (also includes a task on "relevance"
to a topic, which may be of interest)
Sarcasm
Geolocation
User classification
- TWISTY: a Multilingual Twitter Stylometry Corpus for Gender and Personality Profiling
http://www.clips.ua.ac.be/datasets/twisty-corpus
Accompanying paper:
http://www.clips.ua.ac.be/~walter/papers/2016/vdp16.pdf
- Author Profiling Shared Task
http://pan.webis.de/clef17/pan17-web/author-profiling.html
Further down that page there are other datasets (including those from previous years, 2013-2016).
Hate Speech
Others
- Clickbait Challenge (identify clickbait)
- Dialog act annotation for German Twitter conversations (contact me)