Natural Language Processing

CS4120/6120 Fall 2021




Course Description

A seminar class in which we read, critique, and discuss recent research papers about the methods, applications, and implications of social media analysis. Social media analysis is a multidisciplinary subject that encompasses a broad range of topics. In this class we will focus on: (i) cutting-edge Machine Learning/Deep Learning and Natural Language Processing methods to infer signals from single posts, individual users, and user networks; (ii) applications of social media analysis to real-world analyses in domains such as the social sciences, political science, mental health, and epidemiology; and (iii) the implications, as well as the ethical and moral considerations, of deploying social media analysis systems and Human-centered AI systems more broadly.

Students will take turns presenting a paper each week (2 presentations per class), which we will discuss together critically during class meetings. The presentations should also cover any relevant background material; this is important because in general we will have a strong bias toward very recent work. Critical discussion will follow, and we will conclude class by talking about potential means of extending the work. Students will also be responsible for writing short summaries of the research papers we read each week. Student-led presentations will be complemented with several guest lectures from top researchers working on different facets of social media analysis.

In addition to the presentations and paper reviews, students will propose and execute a mini project culminating in a research paper (draft). The results and findings of the project will be presented in class, structured as a research talk. This class will therefore be very much research-oriented.

⚠️ As per university policy, the class will be in-person only and everyone must wear a mask.

Prerequisites

Students should have an interest in conducting (or learning how to conduct) research. There are no official prerequisites for this class; however, students are expected to have some background in data science, machine learning, and statistical natural language processing. This is mainly to make students' lives easier, since we will not cover background knowledge related to the papers in class.

Assignments

Oral Presentations

Students will rotate responsibility for presenting (one of the) assigned papers and leading the ensuing discussion. The presentation should provide necessary background, the core contribution of the paper, (perceived) strengths and weaknesses, and ideas for improving the work.

Written Critiques

Prior to every class, all students are to write a brief “review” of one of the assigned papers for that class (their choice). These should include a concise summary of the work and enumerate at least two strong points and at least two weak points therein. Furthermore, these critiques should note ways in which the work could potentially be extended or improved (which may be coupled with the weak points). Assessments are to be submitted via Canvas by 11:59pm on the day prior to class (i.e. Mondays and Thursdays). Note that it is assumed that students read all the assigned papers, but they may read only one in depth. The idea is to ensure robust discussion during class meetings.

Project

The class will culminate in projects with accompanying papers (along with the respective code, when possible). The papers will be structured as research papers following a template, available here, and the code should be submitted as a Jupyter notebook (if possible). Students may work together, but in that case a concomitant increase in complexity and a clear delineation of contributions are expected, for fairness' sake.

Students will be able to choose which projects to work on. Projects may involve replicating a state-of-the-art paper we read or, ideally, extending one of these or developing new ideas in the area. Students are to write a short (1-page) project proposal including the motivation, goals, expected results, and a timeline of key milestones. Project proposals will then be reviewed and approved by the instructor. Some class time will be reserved for students to present their project ideas, results, and findings (these will be structured as research talks).

Projects will be evaluated for their relevance, scientific rigour, and originality. The papers will be evaluated as paper draft submissions using the template available here.

Grades
  • 55% Project and write-up
  • 20% In-class presentations of papers
  • 15% Participation
  • 10% Written summaries/critiques of papers (on Canvas)
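
The weights above can be combined into a single course grade as a weighted average. The following is only a minimal sketch: the syllabus does not state how component scores are scaled or combined, so the 0-100 scale, the simple weighted sum, and the names `WEIGHTS` and `final_grade` are all assumptions made for illustration.

```python
# Weights copied from the Grades list above.
# The dictionary keys are hypothetical labels, not official component names.
WEIGHTS = {
    "project": 0.55,
    "presentations": 0.20,
    "participation": 0.15,
    "critiques": 0.10,
}


def final_grade(scores):
    """Return the weighted average of component scores (assumed 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, used only to show the arithmetic.
example = {
    "project": 90,
    "presentations": 80,
    "participation": 100,
    "critiques": 70,
}
print(round(final_grade(example), 2))
```

With these made-up scores the project contributes 49.5 points, presentations 16, participation 15, and critiques 7, so the project alone accounts for more than half of the total.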

Schedule

Tentative weekly schedule. The assigned papers are unlikely to change but the presentation dates might move to accommodate the (busy) schedules of the invited speakers.

Date   Agenda   Speaker
Fri 9/10   🎉 Introduction   Silvio
Hate-speech Detection        
Tue 9/14   🎓 Lecture: Social Media Analysis: methods, applications and implications   Silvio
    📃 Contextualizing Hate Speech Classifiers with Post-hoc Explanation, Kennedy, B., Jin, X., Davani, A. M., Dehghani, M., & Ren, X. (2020)   Silvio
         
additional readings   📚 A Survey on Automatic Detection of Hate Speech in Text; Fortuna, P., & Nunes, S. (2018).    
    📚 Towards generalisable hate speech detection: a review on obstacles and solutions; Yin, W., & Zubiaga, A. (2021).    
    📚 A Survey on Hate Speech Detection using Natural Language Processing; Schmidt, A., & Wiegand, M. (2017).    
Misinformation Detection        
Fri 9/17   📃 Deep Structure Learning for Rumor Detection on Twitter, Huang, Q., Zhou, C., Wu, J., Wang, M., & Wang, B. (2019).   Sanjana
    📃 Detecting Propaganda Techniques in Memes, Dimitrov, D., Ali, B.B., Shaar, S., Alam, F., Silvestri, F., Firooz, H., Nakov, P. & Da San Martino, G., (2021)   Hye Sun
         
additional readings   📚 Combating fake news: A survey on identification and mitigation techniques; Sharma, K., Qian, F., Jiang, H., Ruchansky, N., Zhang, M., & Liu, Y. (2019).    
    📚 A survey on fake news and rumour detection techniques; Bondielli, A., & Marcelloni, F. (2019).    
    📚 The science of fake news; Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Metzger, M.J., Nyhan, B., Pennycook, G., Rothschild, D. & Schudson, M., (2018)    
         
         
Digital Activism        
Tue 9/21   📃 Reclaiming Stigmatized Narratives: The Networked Disclosure Landscape of #MeToo, Gallagher et al. (2019)   Xiaoyu
    📃 Say Their Names: Resurgence in the Collective Attention toward Black Victims of Fatal Police Violence Following the Death of George Floyd, Wu, H.H., Gallagher, R.J., Alshaabi, T., Adams, J.L., Minot, J.R., Arnold, M.V., Welles, B.F., Harp, R., Dodds, P.S. & Danforth, C.M., (2021)   Mayur
         
additional readings   📚 Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter, Alshaabi, T., Adams, J.L., Arnold, M.V., Minot, J.R., Dewhurst, D.R., Reagan, A.J., Danforth, C.M. and Dodds, P.S., (2021)    
         
Fri 9/24   📃 Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement, Mueller, A., Wood-Doughty, Z., Amir, S., Dredze, M., & Nobles, A. L. (2021)   Sinjini
    🎓 Guest Lecture: #HashtagActivism: Networks of Race and Gender Justice   Brooke Foucault Welles
         
additional readings   📚 #HashtagActivism: Networks of race and gender justice; Jackson, S. J., Bailey, M., & Welles, B. F. (2020).    
         
Deep Learning        
Tue 9/28   📃 Compositional Demographic Word Embeddings, Welch, C., Kummerfeld, J. K., Pérez-Rosas, V., & Mihalcea, R. (2020)   Michael
    📃 Developing a Twitter bot that can join a discussion using state-of-the-art architectures, Çetinkaya, Y. M., Toroslu, İ. H., & Davulcu, H. (2020).   Grainne
         
Fri 10/1   📃 SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics, Yin, D., Meng, T., & Chang, K. W. (2020)   Shijia
    📃 Adversarial Learning for Zero-Shot Stance Detection on Social Media, Allaway, E., Srikanth, M., & McKeown, K. (2021)   Jessica
Mental Health        
Tue 10/5   📃 Suicide Ideation Detection via Social and Temporal User Representations using Hyperbolic Learning, Sawhney, R., Joshi, H., Shah, R., & Flek, L. (2021)   Aldo
    📃 Characterizing Anxiety Disorders with Online Social and Interactional Networks, Dutta, S., & De Choudhury, M. (2020).   Carlos
         
Fri 10/8   📃 Inferring Social Media Users’ Mental Health Status from Multimodal Information, Xu, Z., Pérez-Rosas, V., & Mihalcea, R. (2020)   Alex
    📃 Person-Centered Predictions of Psychological Constructs with Social Media Contextualized by Multimodal Sensing, Saha, K., Grover, T., Mattingly, S.M., Swain, V.D., Gupta, P., Martinez, G.J., Robles-Granda, P., Mark, G., Striegel, A. & De Choudhury, M., (2021).   Sanjana
         
additional readings   📚 Methods in predictive techniques for mental health status on social media: a critical review, Chancellor, S., & De Choudhury, M. (2020)    
    📚 Do Models of Mental Health Based on Social Media Data Generalize?, Harrigian, K., Aguirre, C., & Dredze, M. (2020)    
         
Multimodal Social Media Analysis        
Tue 10/12   📃 MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks, Wu, P. Y., & Mebane Jr, W. R. (2021)   Xiaoyu Fan
    📃 Reasoning with Multimodal Sarcastic Tweets via Modeling Cross-Modality Contrast and Semantic Association, Xu, N., Zeng, Z., & Mao, W. (2020)   Grainne
         
Public Health        
Fri 10/15   📃 Quantifying Community Characteristics of Maternal Mortality Using Social Media, Abebe, R., Giorgi, S., Tedijanto, A., Buffone, A., & Schwartz, H. A. (2020)   Manaswini
    📃 Examining Peer-to-Peer and Patient-Provider Interactions on a Social Media Community Facilitating Ask the Doctor Services, Nobles, A. L., Leas, E. C., Dredze, M., & Ayers, J. W. (2020)   Jessica
         
additional readings   📚 Social Monitoring for Public Health; Paul, M. J., & Dredze, M. (2017). Synthesis Lectures on Information Concepts, Retrieval, and Services [preprint]    
         
Covid-19        
Tue 10/19   📃 Explaining the ‘Trump Gap’ in Social Distancing Using COVID Discourse, Van Loon, A., Stewart, S., Waldon, B., Lakshmikanth, S.K., Shah, I., Guntuku, S.C., Sherman, G., Zou, J. & Eichstaedt, J. (2020)   Hye Sun
    📃 COVID-19 Surveillance through Twitter using Self-Supervised and Few Shot Learning, Lwowski and Rad (2020)   Sinjini
         
Fri 10/22   🚀 Discussion of Project Ideas    
    🎓 Discussion: From #HashtagActivism to Data Justice   Brooke Foucault Welles
         
additional readings   📚 Twitter and Facebook posts about COVID-19 are less likely to spread false and low-credibility content compared to other health topics, Broniatowski, D. A., Kerchner, D., Farooq, F., Huang, X., Jamison, A. M., Dredze, M., & Quinn, S. C. (2020).    
         
Social Sciences        
Tue 10/26   📃 Psychosocial Effects of the COVID-19 Pandemic: Large-scale Quasi-Experimental Study on Social Media, Saha, K., Torous, J., Caine, E. D., & De Choudhury, M. (2020)   Shijia
    🎓 Guest Lecture: The Light and Dark Side of Social Media   Nick Beauchamp
         
Fri 10/29   📃 Who Says What with Whom: Using Bi-Spectral Clustering to Organize and Analyze Social Media Protest Networks, Joseph et al. (2020).   Michael
    🎓 Guest Lecture: Measuring Algorithmically Infused Societies [paper]   Tina Eliassi-Rad
    🚀 Project proposal deadline    
         
Political Science        
Tue 11/2   📃 RAFFMAN: Measuring and Analyzing Sentiment in Online Political Forum Discussions with an Application to the Trump Impeachment, Tachaiya, J., Gharibshah, J., Esterling, K. E., & Faloutsos, M. (2021).   Jessica
    🎓 Guest Lecture: Measuring Misinformation on Social Media   David Lazer
         
Fri 11/5   📃 How Metaphors Impact Political Discourse: A Large-Scale Topic-Agnostic Study Using Neural Metaphor Detection, Prabhakaran, V., Rei, M., & Shutova, E. (2021)   Mayur
    📃 Unsupervised User Stance Detection on Twitter, Darwish, K., Stefanov, P., Aupetit, M., & Nakov, P. (2020)   Carlos
         
Bias and Fairness        
Tue 11/9   📃 The Risk of Racial Bias in Hate Speech Detection, Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019)   Aldo
    📃 Social Bias Frames: Reasoning about Social and Power Implications of Language, Sap, M., Gabriel, S., Qin, L., Jurafsky, D., Smith, N. A., & Choi, Y. (2020)   Grainne
         
Fri 11/12   📃 A Taxonomy of Ethical Tensions in Inferring Mental Health States from Social Media, Chancellor, S., Birnbaum, M. L., Caine, E. D., Silenzio, V. M., & De Choudhury, M. (2019)   Manaswini
    📃 Gender and racial fairness in depression research using social media, Aguirre, C., Harrigian, K., & Dredze, M. (2021)   Xiaoyu Fan
         
Projects I        
Tue 11/16   🎉 No class; work on projects    
Fri 11/19   🚀 Project Presentations (Preliminary)    
         
Ethics & Moral I        
Tue 11/23   📚 How Twitter Gamifies Communication, Nguyen, C. T., & Lackey, J. (2021)    
    📚 Big Data’s End Run around Anonymity and Consent, Barocas, S., & Nissenbaum, H. (2014)    
    🎓 Guest Lecture   John Basl, Vance Ricks and Meica Magnani
         
Fri 11/26   🎉 Thanksgiving    
         
Ethics & Moral II        
Tue 11/30   🎓 Guest Lecture: Ethics and Fairness of Mental Health Research using Social Media   Stevie Chancellor and Carlos Aguirre
         
Fri 12/3   🎉 No class; work on projects    
         
Projects II        
Tue 12/7   🚀 Project Presentations (Final)