Classification Approaches for Social Media Text

Course organization

Instructor: Tatjana Scheffler

Time: Fridays, 12:00-14:00
summer term 2017

Place: Golm, Haus 14, Raum 009

Modules: AM3b (B.Sc.)
AM11, AM12 (M.Sc.)

Moodle: Please register on the class's Moodle page.

Requirements

readings
active participation
presentation of either a classification algorithm or a task
Grade: Final Project -- Either a programming project implementing a classification approach based on some social media data, or a theoretical/concept paper discussing different approaches to the same problem, giving a literature overview, etc.

Course description

In this class, we study classification and clustering approaches for social media. Social media data typically comes with text but also other metadata (user information, geo-tags, network structure, etc.) which can be exploited for classification.

We will present different classification algorithms including:

naive Bayes
support vector machines (SVM)
k-nearest-neighbor
decision trees
neural network approaches

We will work with concrete implementations of these algorithms to try them out on Twitter data. The class will include a practical component where we collaboratively build a classifier, probably for detecting hate speech.
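As a rough illustration of what one of the listed algorithms does internally, here is a minimal from-scratch sketch of a multinomial naive Bayes classifier with add-one smoothing. The toy messages, the labels "spam" and "ham", and the helper names (tokenize, train_naive_bayes, predict) are all hypothetical and chosen only for this example. They are not course material, and the class's actual hate speech classifier may be built quite differently.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data standing in for labelled tweets (assumption, not course data).
TRAIN = [
    ("win a free phone now", "spam"),
    ("free tickets click now", "spam"),
    ("see you at the seminar on friday", "ham"),
    ("great talk at the seminar today", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace; real tweets need a better tokenizer."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Smoothed log likelihood of the word given the class.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "free phone"))
print(predict(model, "seminar on friday"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the add-one smoothing keeps a word that never occurred with a class from forcing that class's probability to zero.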

We will discuss tasks in social media classification such as:

sentiment
spam
impartiality
hate speech
bot or not
on/off topic
user characteristics: gender, age, occupation, social class
... and others

For practical matters, we will also introduce toolkits such as WEKA or RapidMiner if requested by participants, but will mainly use Python and scikit-learn.
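To give a feel for the Python and scikit-learn workflow, the following sketch trains several of the algorithms named above on the same TF-IDF features. The four toy messages, their labels, and the helper fit_and_predict are assumptions made up for this illustration; they are not the course's data or code, and real experiments would need far more data plus a held-out test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (assumption); replace with real labelled tweets.
texts = [
    "win a free phone now",
    "free tickets click now",
    "see you at the seminar on friday",
    "great talk at the seminar today",
]
labels = ["spam", "spam", "ham", "ham"]

# One estimator per algorithm; n_neighbors=1 because the toy set is tiny.
classifiers = {
    "naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "k-nearest-neighbor": KNeighborsClassifier(n_neighbors=1),
    "decision tree": DecisionTreeClassifier(random_state=0),
}


def fit_and_predict(new_texts):
    """Fit each classifier on TF-IDF features and predict labels for new_texts."""
    predictions = {}
    for name, clf in classifiers.items():
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(texts, labels)
        predictions[name] = model.predict(new_texts).tolist()
    return predictions


results = fit_and_predict(["free phone now"])
for name, predicted in results.items():
    print(name, predicted)
```

Because every scikit-learn estimator exposes the same fit and predict methods, swapping one algorithm for another only changes the entry in the dictionary, which makes side-by-side comparisons straightforward.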

Datasets

Available datasets for final projects.

Readings

Listed below with each topic. For a detailed overview of classification approaches in general, see:

Aggarwal, C. C. and Zhai, C. (2012). A survey of text classification algorithms. In Aggarwal, C. C. and Zhai, C. (Eds.), Mining text data, pp. 163–222. Springer.

Schedule

Date	Topic	Readings	Presenters/Notes
21.4.	Introduction to Text Classification	--	--
28.4.	Naive Bayes	Jurafsky/Martin SLP3, Chapter 6	assignment 1
5.5.	Hate speech (I)	Quandt, Thorsten, and Ruth Festl. (2017). Cyberhate. The International Encyclopedia of Media Effects. Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (pp. 145-153).	Feinhals Seidler
8.5.	Hate Speech on Twitter	CL Colloquium: Zeerak Waseem (Sheffield) Please note the special time! Mon, 8.5., 12-14, bldg. 14, room 009 Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of NAACL-HLT (pp. 88-93). Waseem, Z. (2016). Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the 1st Workshop on Natural Language Processing and Computational Social Science (pp. 138-142).	Special event!
12.5.	Hate speech (II)	Jurafsky & Martin, SLP3, Chapter 7, Logistic Regression Wulczyn, E., Thain, N., & Dixon, L. (2017). Ex Machina: Personal Attacks Seen at Scale. Proceedings of WWW, Arxiv preprint.	Johannsmeier Garda
19.5.	Decision trees / practical issues	Mitchell, Chapter 3, Decision Trees	Leitner
26.5.	no class
2.6.	Sentiment	Günther, T. (2013). Sentiment Analysis of Microblogs. M.Sc. thesis, University of Gothenburg. Szerszen, D., & Palsson, A. (2016). An Analysis of Methods and the Impact of Sentiment Classification in Social Media. Ms. KTH, Sweden.	Gantzlin Stazherova
9.6.	Sarcasm (I)	Davidov, D., & Tsur, O. (2010). Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon, 107–116. Liebrecht, C., Kunnemann, F., & van den Bosch, A. (2013). The perfect solution for detecting sarcasm in tweets #not. In 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (pp. 29–37).	Haß Conle
16.6.	Sarcasm (II)	Abercrombie, G., & Hovy, D. (2016). Putting Sarcasm Detection into Context: The Effects of Class Imbalance and Manual Labelling on Supervised Machine Classification of Twitter Conversations. ACL 2016, 107.	Lloyd
23.6.	Impartiality	Zafar, M. B., Gummadi, K. P., & Danescu-Niculescu-Mizil, C. (2016). Message Impartiality in Social Media Discussions. In ICWSM (pp. 466-475).	Sadler
30.6.	Geolocation (I)	Wing, B., & Baldridge, J. (2014). Hierarchical Discriminative Classification for Text-Based Geolocation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 336–348). Gontrum/Scheffler (2015). Text-based Geolocation of German Tweets. In: Proceedings of the NLP4CMC 2015 Workshop at GSCL, Duisburg-Essen, Germany.	Raithel Pornavalai
7.7.	Geolocation (II)	Jurgens, D., Finethy, T., McCorriston, J., Xu, Y. T., & Ruths, D. (2015). Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice. In AAAI Conference on Weblogs and Social Media. Miura, Y., Taniguchi, M., Taniguchi, T., & Ohkuma, T. (2016). A simple scalable neural networks based model for geolocation prediction in Twitter. WNUT 2016, 9026924, 235.	Feldhus Rakowski
14.7.	User characteristics (I)	Simaki, V., Mporas, I., & Megalooikonomou, V. Age Identification of Twitter Users: Classification Methods and Sociolinguistic Analysis. Flekova, L., Ungar, L., & Preotiuc-Pietro, D. (2016, August). Exploring stylistic variation with age and income on Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL (pp. 313-319). Zhang, J., Hu, X., Zhang, Y., & Liu, H. (2016, March). Your Age Is No Secret: Inferring Microbloggers' Ages via Content and Interaction Analysis. In ICWSM (pp. 476-485).	Dobler Zawierucha
21.7.	User characteristics (II)	Ljubešić, N., & Fišer, D. (2016). Private or Corporate? Predicting User Types on Twitter. WNUT 2016, 4. Priante, A., Hiemstra, D., Broek, T., Saeed, A., Ehrenhard, M., & Need, A. (2016). #WhoAmI in 160 Characters? Classifying Social Identities Based on Twitter.	Schulz Uhlemann
28.7.	Project presentations	--	all

Last modified: Thu Jun 29 10:03:33 CEST 2017