Instructor: | Tatjana Scheffler |
Time: | Fridays, 12-14:00, summer term 2017 |
Place: | Golm, Haus 14, Raum 009 |
Modules: | AM3b (B.Sc.); AM11, AM12 (M.Sc.) |
Moodle: | Please register on the class's Moodle page. |
readings
active participation
presentation of either a classification algorithm or a task
Grade: Final Project -- Either a programming project implementing a classification approach based on some social media data, or a theoretical/concept paper discussing different approaches to the same problem, giving a literature overview, etc.
In this class, we study classification and clustering approaches for social media. Social media data typically comes with text but also other metadata (user information, geo-tags, network structure, etc.) which can be exploited for classification.
We will present different classification algorithms including:
We will work with concrete implementations of these algorithms to try them out on Twitter data. The class will include a practical component where we collaboratively build a classifier, probably for detecting hate speech.
We will discuss tasks in social media classification such as:
For practical matters, we will mainly use Python and scikit-learn, but will also introduce toolkits such as WEKA or RapidMiner if participants request them.
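To give a rough idea of what working with scikit-learn looks like, here is a minimal sketch of a text classifier. It is not the class's actual setup: the example texts, the label names `abusive` and `neutral`, and the choice of bag-of-words features with Naive Bayes are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, invented for illustration only (not a course dataset).
train_texts = [
    "you are an idiot and nobody likes you",
    "shut up you stupid loser",
    "what a lovely day in Potsdam",
    "thanks for the helpful answer",
    "nobody wants to hear your stupid opinion",
    "see you at the seminar on friday",
]
train_labels = ["abusive", "abusive", "neutral", "neutral", "abusive", "neutral"]

# Bag-of-words features (unigrams and bigrams) feeding a multinomial Naive Bayes model.
classifier = make_pipeline(
    CountVectorizer(lowercase=True, ngram_range=(1, 2)),
    MultinomialNB(alpha=1.0),  # alpha=1.0 corresponds to add-one (Laplace) smoothing
)
classifier.fit(train_texts, train_labels)

# Classify two unseen (again invented) messages and print one label per message.
new_texts = ["you stupid idiot", "thanks, see you on friday"]
predictions = classifier.predict(new_texts)
for text, label in zip(new_texts, predictions):
    print(f"{label}\t{text}")
```

With real Twitter data, the toy lists would be replaced by an annotated corpus, and the classifier would be evaluated on held-out data rather than on two hand-picked examples.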
Available datasets for final projects.
Listed below with each topic. For a detailed overview of classification approaches in general, see:
Date | Topic | Readings | Presenters/Notes |
---|---|---|---|
21.4. | Introduction to Text Classification | -- | -- |
28.4. | Naive Bayes | Jurafsky/Martin SLP3, Chapter 6 | assignment 1 |
5.5. | Hate speech (I) | Quandt, Thorsten, and Ruth Festl. (2017). Cyberhate. The International Encyclopedia of Media Effects. Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (pp. 145-153). | Feinhals Seidler |
8.5. | Hate Speech on Twitter | CL Colloquium: Zeerak Waseem (Sheffield). Please note the special time: Mon, 8.5., 12-14, bldg. 14, room 009. Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of NAACL-HLT (pp. 88-93). Waseem, Z. (2016). Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the 1st Workshop on Natural Language Processing and Computational Social Science (pp. 138-142). | Special event! |
12.5. | Hate speech (II) | Jurafsky & Martin, SLP3, Chapter 7, Logistic Regression. Wulczyn, E., Thain, N., & Dixon, L. (2017). Ex Machina: Personal Attacks Seen at Scale. Proceedings of WWW, Arxiv preprint. | Johannsmeier Garda |
19.5. | Decision trees / practical issues | Mitchell, Chapter 3, Decision Trees | Leitner |
26.5. | no class | ||
2.6. | Sentiment | Günther, T. (2013). Sentiment Analysis of Microblogs. M.Sc. thesis, University of Gothenburg. Szerszen, D., & Palsson, A. (2016). An Analysis of Methods and the Impact of Sentiment Classification in Social Media. Ms. KTH, Sweden. | Gantzlin Stazherova |
9.6. | Sarcasm (I) | Davidov, D., & Tsur, O. (2010). Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon, 107–116. Liebrecht, C., Kunneman, F., & van den Bosch, A. (2013). The perfect solution for detecting sarcasm in tweets #not. In 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 29–37). | Haß Conle |
16.6. | Sarcasm (II) | Abercrombie, G., & Hovy, D. (2016). Putting Sarcasm Detection into Context: The Effects of Class Imbalance and Manual Labelling on Supervised Machine Classification of Twitter Conversations. ACL 2016, 107. | Lloyd |
23.6. | Impartiality | Zafar, M. B., Gummadi, K. P., & Danescu-Niculescu-Mizil, C. (2016). Message Impartiality in Social Media Discussions. In ICWSM (pp. 466-475). | Sadler |
30.6. | Geolocation (I) | Wing, B., & Baldridge, J. (2014). Hierarchical Discriminative Classification for Text-Based Geolocation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 336–348). Gontrum/Scheffler (2015). Text-based Geolocation of German Tweets. In: Proceedings of the NLP4CMC 2015 Workshop at GSCL, Duisburg-Essen, Germany. | Raithel Pornavalai |
7.7. | Geolocation (II) | Jurgens, D., Finethy, T., McCorriston, J., Xu, Y. T., & Ruths, D. (2015). Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice. In AAAI Conference on Weblogs and Social Media. Miura, Y., Taniguchi, M., Taniguchi, T., & Ohkuma, T. (2016). A simple scalable neural networks based model for geolocation prediction in Twitter. WNUT 2016, 9026924, 235. | Feldhus Rakowski |
14.7. | User characteristics (I) | Simaki, V., Mporas, I., & Megalooikonomou, V. Age Identification of Twitter Users: Classification Methods and Sociolinguistic Analysis. Flekova, L., Ungar, L., & Preotiuc-Pietro, D. (2016, August). Exploring stylistic variation with age and income on Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL (pp. 313-319). Zhang, J., Hu, X., Zhang, Y., & Liu, H. (2016, March). Your Age Is No Secret: Inferring Microbloggers' Ages via Content and Interaction Analysis. In ICWSM (pp. 476-485). | Dobler Zawierucha |
21.7. | User characteristics (II) | Ljubešić, N., & Fišer, D. (2016). Private or Corporate? Predicting User Types on Twitter. WNUT 2016, 4. Priante, A., Hiemstra, D., Broek, T., Saeed, A., Ehrenhard, M., & Need, A. (2016). #WhoAmI in 160 Characters? Classifying Social Identities Based on Twitter. | Schulz Uhlemann |
28.7. | Project presentations | -- | all |