Computational Linguistic Analysis of Twitter Data

Organizational details

Instructor: Tatjana Scheffler

Time: Wednesdays, 12:00-14:00
Summer semester 2013

Location: Golm, House 14, Room 215/16 (NEW!)

Modules: AM4

Requirements for course credit

Active participation
Presentation
Individual project and written report: due 31 August 2013

Guidelines for the written report

Guidelines

Course description

Social media such as Twitter offer new data sources for linguistic analysis. Initial work exists on processing Twitter data and on studying it from a linguistic perspective, but it deals almost exclusively with English data: these are more plentiful, easier to obtain, and often easier to analyze as well. Numerous tools, e.g. a dedicated Twitter tagger, already exist for processing English-language social media data. In this seminar we will work out together how to analyze German tweets with computational linguistic methods, using both existing and newly crawled data. Preprocessing scripts are available and can be adapted. Possible topics include sentiment analysis, topic classification, the construction of subcorpora, lexical studies (the temporal or geographic distribution of words), and much more.

Semester schedule

Syllabus

Date Topic Details

10.4. Introduction, motivation Slides 1

17.4. Corpus construction Slides 2

24.4. Preprocessing, technical & practical matters Slides 3

1.5. No class (Labour Day)

8.5. Isa Fodor, Normalization
Han, Cook, Baldwin, 2012: "Automatically Constructing a Normalisation Dictionary for Microblogs"

15.5. Matthias Wegel, Topic detection
Karandikar, 2010: "Clustering short status messages: A topic model based approach"

22.5. Johannes Gontrom, Trend tracking
Benhardus, Kalita, 2012: "Streaming Trend Detection in Twitter"

29.5. Katarina Krüger, Sentiment analysis I
Brown, Frazee, Beaver, Liu, Hoyt, Hancock, 2011: "Evolution of Sentiment in the Libyan Revolution", blog post: http://languagelog.ldc.upenn.edu/nll/?p=3537

5.6. Anna Lukowiak, Sentiment analysis II
Pak, Paroubek, 2010: "Twitter as a Corpus for Sentiment Analysis and Opinion Mining" Proc. of LREC

12.6. Ulf Hillenbrand, Sentiment analysis III
Davidov, Tsur, Rappoport, 2010: "Enhanced Sentiment Learning Using Twitter Hashtags and Smileys"

19.6. Frank Bubitz, Conversation Retrieval
Magnani, Montesi, Rossi, 2012: "Conversation retrieval for microblogging sites"

26.6. Steve Wendler, Location-dependent words
Arakawa, Tagashira, Fukuda, 2012: "Spatial Statistics with Three-tier Breadth First Search for Analyzing Social Geocontents"

3.7. Short pitches of the reports/projects (everyone)

10.7. Summary, final discussion

Literature

Preprocessing, cleaning

Han, Cook, Baldwin, 2012: “Automatically Constructing a Normalisation Dictionary for Microblogs” Proc. of EMNLP www.cs.toronto.edu/~pcook/Hanetal2012.pdf
Petrovic, Osborne, Lavrenko: “The Edinburgh Twitter Corpus” (deprecated)
Tokenizer, Emoticons: https://github.com/brendano/tweetmotif

Topic detection

O’Connor, Krieger, Ahn, 2010: “TweetMotif: Exploratory Search and Topic Summarization for Twitter” https://github.com/brendano/tweetmotif http://anyall.org/oconnor_krieger_ahn.icwsm2010.tweetmotif.pdf http://brenocon.com/blog/2009/05/announcing-tweetmotif-for-summarizing-twitter-topics-with-a-dash-of-nlp/
Kireyev, Palen, Anderson, 2009: “Applications of Topics Models to Analysis of Disaster-Related Twitter Data” www.umiacs.umd.edu/~jbg/nips_tm_workshop/15.pdf
Karandikar, 2010: “Clustering short status messages: A topic model based approach” http://ebiquity.umbc.edu/get/a/publication/518.pdf

Trend detection and tracking

Mathioudakis, Koudas, 2010: “TwitterMonitor: Trend Detection over the Twitter Stream” http://www.inf.utfsm.cl/~mmendoza/descargas/p1155-mathioudakis.pdf
Benhardus, Kalita, 2012: “Streaming Trend Detection in Twitter” http://www.cs.uccs.edu/~jkalita/papers/2012/BenhardusJamesIJWBC2012.pdf
Becker, Naaman, Gravano, 2011: “Beyond Trending Topics: Real-World Event Identification on Twitter” http://academiccommons.columbia.edu/download/fedora_content/download/ac:135416/CONTENT/cucs-012-11.pdf

Sentiment (tonality) analysis

Gauging the target group's opinions (sentiment analysis)

Pak, Paroubek, 2010: “Twitter as a Corpus for Sentiment Analysis and Opinion Mining” Proc. of LREC http://www.lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf
Davidov, Tsur, Rappoport, 2010: “Enhanced Sentiment Learning Using Twitter Hashtags and Smileys” www.aclweb.org/anthology/C10-2028
Barbosa, Feng, 2010: “Robust Sentiment Detection on Twitter from Biased and Noisy Data” www.aclweb.org/anthology/C10-2005
Brown, Frazee, Beaver, Liu, Hoyt, Hancock, 2011: “Evolution of Sentiment in the Libyan Revolution” Blog post: http://languagelog.ldc.upenn.edu/nll/?p=3537 Working paper: https://webspace.utexas.edu/dib97/libya-report-10-30-11.pdf

Sociolinguistics, style, variation

Linguistics of Retweets http://danzarrella.com/retweet-linguistics.html#
Bamman, Eisenstein, Schnoebelen, 2012: “Gender in Twitter: Styles, stances, and social networks” http://arxiv.org/abs/1210.4567
Schnoebelen, 2012: “Do You Smile with Your Nose? Stylistic Variation in Twitter Emoticons” http://repository.upenn.edu/pwpl/vol18/iss2/14/

Profiling

Identifying opinion leaders and multipliers

Weng, Lim, Jiang, He, 2010: “TwitterRank: finding topic-sensitive influential twitterers” http://dl.acm.org/citation.cfm?id=1718520

IR/DR

Zanzotto, Pennacchiotti, Tsioutsiouliklis, 2011: “Linguistic Redundancy in Twitter” Proc. of EMNLP www.aclweb.org/anthology/D11-1061
Magnani, Montesi, Rossi, 2012: “Conversation retrieval for microblogging sites” http://link.springer.com/article/10.1007/s10791-012-9189-9/fulltext.html

Further possible topics

Semantic Role Labelling
Conversation Modelling

Sources for further literature

ICWSM 2012 http://www.icwsm.org/2012/
ICWSM 2011 http://www.icwsm.org/2011/
ACL Anthology http://aclweb.org/anthology-new/ (e.g. LREC, ACL conferences)
Twitter Research Bibliography http://www.danah.org/researchBibs/twitter.php (incomplete for computational linguistics)
TREC Microblog Track http://trec.nist.gov/pubs/trec20/t20.proceedings.html
Language Log on “Twitter Linguistics” http://languagelog.ldc.upenn.edu/nll/?p=3536

Possible project/report topics:

Spam detection in tweets
Improved tweet search (e.g. using synonyms)
Sentiment analysis (e.g. of “2013” before/after New Year's Eve)
Applying one of the topics covered in class to German Twitter data
Identifying location- or time-dependent words
Analysis of a linguistic phenomenon (e.g. weil-V2, verb-second word order after "weil")
Dialogues on Twitter
Twitter data & standard NLP tools (taggers, parsers, etc.): how well do they work together, which specialized tools already exist, and how can they be adapted to German data?
Regular term paper: comparison of approaches, evaluation (with a special focus on German data)
... (your own ideas)
