Classification Approaches for Social Media Text

Course organization

Instructor: Tatjana Scheffler
Time: Fridays, 12:00-14:00, summer term 2017
Place: Golm, Haus 14, Raum 009
Modules: AM3b (B.Sc.); AM11, AM12 (M.Sc.)
Moodle: Please register on the class's Moodle page.


active participation
presentation of either a classification algorithm or a task
Grade: Final project -- either a programming project implementing a classification approach on some social media data, or a theoretical/concept paper discussing different approaches to the same problem, giving a literature overview, etc.

Course description

In this class, we study classification and clustering approaches for social media. Social media data typically consists of text plus additional metadata (user information, geo-tags, network structure, etc.), and both can be exploited for classification.
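As a rough illustration of how text and metadata can feed the same classifier, the sketch below turns one post into a flat feature dictionary. The field names (`text`, `user`, `followers`, `geo`, `in_reply_to`) and the example post are hypothetical and chosen only for this example; they do not come from the course materials or from any particular API.

```python
from collections import Counter


def extract_features(post):
    """Map one post (a plain dict with hypothetical fields) to a feature dict."""
    features = Counter()

    # Text features: lowercase bag-of-words counts from whitespace tokens.
    for token in post["text"].lower().split():
        features["word=" + token] += 1

    # Metadata features: user information, geo-tag, and reply structure.
    user = post.get("user", {})
    features["user_followers"] = user.get("followers", 0)
    features["has_geo_tag"] = int(post.get("geo") is not None)
    features["is_reply"] = int(post.get("in_reply_to") is not None)

    return dict(features)


# Made-up example post, for illustration only.
example_post = {
    "text": "Hello world hello",
    "user": {"followers": 42},
    "geo": None,
    "in_reply_to": "12345",
}
example_features = extract_features(example_post)
print(example_features)
```

A list of such dictionaries could, for instance, be converted into a feature matrix with scikit-learn's `DictVectorizer` before training a classifier.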

We will present different classification algorithms including:

We will work with concrete implementations of these algorithms to try them out on Twitter data. The class will include a practical component where we collaboratively build a classifier, probably for detecting hate speech.

We will discuss tasks in social media classification such as:

For practical matters, we will also introduce toolkits such as WEKA or RapidMiner if requested by participants, but we will mainly use Python and scikit-learn.
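To give a first impression of the Python/scikit-learn workflow, here is a minimal sketch of a text classifier: a bag-of-words vectorizer followed by multinomial Naive Bayes. The four training texts and their labels are invented toy data, not one of the course datasets, and the setup is only one of many possible configurations.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data (assumption): four short texts with sentiment labels.
train_texts = [
    "what a great sunny day",
    "i love this great song",
    "this is a terrible awful day",
    "i hate this awful traffic",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier
# (Laplace smoothing with alpha=1.0 is the scikit-learn default).
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

# Classify two unseen texts.
test_texts = ["great song love", "awful hate"]
predictions = classifier.predict(test_texts)
for text, label in zip(test_texts, predictions):
    print(f"{text!r} -> {label}")
```

With real social media data, the same pipeline would be trained on a labelled corpus and evaluated on held-out data instead of hand-picked examples.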


Available datasets for final projects.


Listed below with each topic. For a detailed overview of classification approaches in general, see:


Date Topic Readings Presenters/Notes
21.4. Introduction to Text Classification -- --
28.4. Naive Bayes Jurafsky/Martin SLP3, Chapter 6 assignment 1
5.5. Hate speech (I) Quandt, T., & Festl, R. (2017). Cyberhate. The International Encyclopedia of Media Effects.
Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (pp. 145-153).
8.5. Hate Speech on Twitter CL Colloquium: Zeerak Waseem (Sheffield)
Please note the special time! Mon, 8.5., 12-14, bldg. 14, room 009

Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of NAACL-HLT (pp. 88-93).
Waseem, Z. (2016). Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the 1st Workshop on Natural Language Processing and Computational Social Science (pp. 138-142).
Special event!
12.5. Hate speech (II) Jurafsky & Martin, SLP3, Chapter 7, Logistic Regression
Wulczyn, E., Thain, N., & Dixon, L. (2017). Ex Machina: Personal Attacks Seen at Scale. Proceedings of WWW, Arxiv preprint.
19.5. Decision trees / practical issues Mitchell, Chapter 3, Decision Trees Leitner
26.5. no class
2.6. Sentiment Günther, T. (2013). Sentiment Analysis of Microblogs. M.Sc. thesis, University of Gothenburg.
Szerszen, D., & Palsson, A. (2016). An Analysis of Methods and the Impact of Sentiment Classification in Social Media. Ms. KTH, Sweden.
9.6. Sarcasm (I) Davidov, D., & Tsur, O. (2010). Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon, 107–116.
Liebrecht, C., Kunneman, F., & van den Bosch, A. (2013). The perfect solution for detecting sarcasm in tweets #not. In 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 29–37).
16.6. Sarcasm (II) Abercrombie, G., & Hovy, D. (2016). Putting Sarcasm Detection into Context: The Effects of Class Imbalance and Manual Labelling on Supervised Machine Classification of Twitter Conversations. ACL 2016, 107. Lloyd
23.6. Impartiality Zafar, M. B., Gummadi, K. P., & Danescu-Niculescu-Mizil, C. (2016). Message Impartiality in Social Media Discussions. In ICWSM (pp. 466-475). Sadler
30.6. Geolocation (I) Wing, B., & Baldridge, J. (2014). Hierarchical Discriminative Classification for Text-Based Geolocation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 336–348).
Gontrum/Scheffler (2015). Text-based Geolocation of German Tweets. In: Proceedings of the NLP4CMC 2015 Workshop at GSCL, Duisburg-Essen, Germany.
7.7. Geolocation (II) Jurgens, D., Finethy, T., McCorriston, J., Xu, Y. T., & Ruths, D. (2015). Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice. In AAAI Conference on Weblogs and Social Media.
Miura, Y., Taniguchi, M., Taniguchi, T., & Ohkuma, T. (2016). A simple scalable neural networks based model for geolocation prediction in Twitter. WNUT 2016, 9026924, 235.
14.7. User characteristics (I) Simaki, V., Mporas, I., & Megalooikonomou, V. Age Identification of Twitter Users: Classification Methods and Sociolinguistic Analysis.
Flekova, L., Ungar, L., & Preotiuc-Pietro, D. (2016, August). Exploring stylistic variation with age and income on Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL (pp. 313-319).
Zhang, J., Hu, X., Zhang, Y., & Liu, H. (2016, March). Your Age Is No Secret: Inferring Microbloggers' Ages via Content and Interaction Analysis. In ICWSM (pp. 476-485).
21.7. User characteristics (II) Ljubešić, N., & Fišer, D. (2016). Private or Corporate? Predicting User Types on Twitter. WNUT 2016, 4.
Priante, A., Hiemstra, D., Broek, T., Saeed, A., Ehrenhard, M., & Need, A. (2016). #WhoAmI in 160 Characters? Classifying Social Identities Based on Twitter.
28.7. Project presentations -- all
Last modified: Thu Jun 29 10:03:33 CEST 2017