... a project by Tatjana Scheffler
Goal:
Access Twitter data for linguistic research
Posters/Slides
Using Twitter Data for (Linguistic) Research
How-To: Corpus Construction
Links and Resources
Measuring Social Jetlag in Twitter Data. (with Christopher Kyba) Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM 2016), AAAI, Köln, Germany. 2016.
Dialog Act Recognition for Twitter Conversations. (with Elina Zarisheva) Proceedings of the Workshop on Normalisation and Analysis of Social Media Texts (NormSoMe), Portorož, Slovenia. 2016.
Dialog act annotation for Twitter conversations. SigDial Conference, September 2, 2015, Prague, Czech Republic
Dialogakte in deutschen Twitterkonversationen. (German) Langer Tag der Wissenschaft, May 9, 2015, Universität Potsdam
Conversations on German Twitter. Social Media Workshop, October 24, 2014, FU Berlin
Introduction to Twitter data and its use for linguistic research. Contains an example of the data (a tweet in JSON format). (German) Guest lecture in the seminar “Soziale Bewegungen im Internet”, May 2014, FU Berlin
A German Twitter Snapshot. Corpus construction and analysis. (English) 9th Language Resources and Evaluation Conference (LREC), May 26-31, 2014, Reykjavik, Iceland
Analyse von Diskursen in Social Media. Presentation of the BMBF-subproject. (German) Workshop “Grenzen überschreiten – Digitale Geisteswissenschaft heute und morgen”, 28.2.2014, Berlin
Basic statistics about German Twitter data. (German) Tausend Fragen – Eine Stadt, 8.6.2013, Golm/Potsdam
Erstellung eines deutschen Twitterkorpus. (German) DGfS-CL Postersession, 35. Tagung der Deutschen Gesellschaft für Sprachwissenschaft, 14.3.2013, Potsdam
General comments on using Twitter for linguistic research - coming soon!
Twitter's API terms do not allow the distribution of aggregated tweets (i.e., corpora), but researchers can collect their own data. This package records, in real time, a representative portion of the Twitter data in a specific language.
In particular, for languages other than English, it is possible to collect a near-complete snapshot of the tweets posted over a given period, recorded as they appear and without hitting API rate limits.
Some programming experience is helpful, but running the script should be doable without it if you are able to install the necessary Python packages.
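To give a rough idea of what recording tweets in one language involves, here is a minimal sketch in Python. It is not the script from this package. It assumes that tweets arrive as one JSON object per line and that each tweet carries a `lang` field with Twitter's own language guess and a `text` field. The function names `filter_by_language` and `record`, and the sample tweets, are made up for illustration, and the connection to the Twitter API is left out entirely.

```python
import json
from typing import Iterable, Iterator


def filter_by_language(lines: Iterable[str], lang: str) -> Iterator[dict]:
    """Yield tweets whose `lang` field equals `lang` (assumed field name)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which a stream may contain
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(tweet, dict) and tweet.get("lang") == lang and "text" in tweet:
            yield tweet


def record(lines: Iterable[str], lang: str, out_path: str) -> int:
    """Append matching tweets to `out_path`, one JSON object per line.

    Returns the number of tweets written.
    """
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for tweet in filter_by_language(lines, lang):
            out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            count += 1
    return count


# Made-up sample input standing in for a live stream.
sample = [
    '{"id": 1, "lang": "de", "text": "Guten Morgen!"}',
    '',
    '{"id": 2, "lang": "en", "text": "Good morning!"}',
    'not json',
    '{"id": 3, "lang": "de", "text": "Schönes Wetter heute."}',
]

german = list(filter_by_language(sample, "de"))
print([t["id"] for t in german])  # [1, 3]
```

In practice `lines` would be the line iterator of an open streaming connection rather than a list, and `record(lines, "de", "tweets_de.jsonl")` would keep appending to the output file for as long as the stream stays open. Relying only on the `lang` field is a simplification; a separate language identification step may give cleaner results.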
In order to build your own custom Twitter corpus, in particular of all tweets in a particular language, follow the steps below:
... please email me if you want tools included in this list.