Schedule (June 7):
9:00 Introductions
9:15-10:15 Analyzing Text at the Middle Distance between the Close Read and Culturomics. Keynote talk by Prof. Marti Hearst, School of Information, UC Berkeley
10:15-10:30 Break
10:30-11:00 Analyzing Urdu Social Media for Sentiments using Transfer Learning with Controlled Translations. Smruthi Mukund and Rohini Srihari
11:00-11:30 Detecting Distressed and Non-distressed Affect States in Short Forum Texts. Michael Thaul Lehrman, Cecilia Ovesdotter Alm and Ruben A. Proano
11:30-12:00 Detecting Hate Speech on the World Wide Web. William Warner and Julia Hirschberg
12:00-1:00 Lunch
1:00-1:30 A Demographic Analysis of Online Sentiment during Hurricane Irene. Benjamin Mandel, Aron Culotta, John Boulahanis, Danielle Stark, Bonnie Lewis and Jeremy Rodrigue
1:30-2:00 Detecting Influencers in Written Online Conversations. Or Biran, Sara Rosenthal, Jacob Andreas, Kathleen McKeown and Owen Rambow
2:00-2:30 Re-tweeting from a linguistic perspective. Aobo Wang, Tao Chen and Min-Yen Kan
2:30-3:00 Break
3:00-3:30 Robust kaomoji detection in Twitter. Steven Bedrick, Russell Beckley, Brian Roark and Richard Sproat
3:30-4:00 Language Identification for Creating Language-Specific Twitter Collections. Shane Bergsma, Paul McNamee, Mossaab Bagdouri, Clayton Fink and Theresa Wilson
4:00-4:30 Processing Informal, Romanized Pakistani Text Messages. Ann Irvine, Jonathan Weese and Chris Callison-Burch
4:30 Wrap Up
List of accepted papers:
We received many strong submissions, and selecting the papers to accept was a hard task. After spending a lot of time with the reviews and the papers themselves, and discussing each paper individually, we arrived at the final list of accepted papers below. It should be a very interesting program!
- Analyzing Urdu Social Media for Sentiments using Transfer Learning with Controlled Translations
  Smruthi Mukund and Rohini Srihari
- Detecting Influencers in Written Online Conversations
  Or Biran, Sara Rosenthal, Jacob Andreas, Kathleen McKeown and Owen Rambow
- Re-tweeting from a linguistic perspective
  Aobo Wang, Tao Chen and Min-Yen Kan
- Processing Informal, Romanized Pakistani Text Messages
  Ann Irvine, Jonathan Weese and Chris Callison-Burch
- Detecting Distressed and Non-distressed Affect States in Short Forum Texts
  Michael Thaul Lehrman, Cecilia Ovesdotter Alm and Ruben A. Proano
- A Demographic Analysis of Online Sentiment during Hurricane Irene
  Benjamin Mandel, Aron Culotta, John Boulahanis, Danielle Stark, Bonnie Lewis and Jeremy Rodrigue
- Detecting Hate Speech on the World Wide Web
  William Warner and Julia Hirschberg
- Language Identification for Creating Language-Specific Twitter Collections
  Shane Bergsma, Paul McNamee, Mossaab Bagdouri, Clayton Fink and Theresa Wilson
- Robust kaomoji detection in Twitter
  Steven Bedrick, Russell Beckley, Brian Roark and Richard Sproat
Goals of the Workshop
Over the last few years, there has been growing public and enterprise interest in 'social media' and their role in modern society. At the heart of this interest is the ability of users to create and share content via a variety of platforms such as blogs, micro-blogs, collaborative wikis, multimedia sharing sites and social networking sites. The volume and variety of user-generated content (UGC), and the user participation network behind it, are creating new opportunities for understanding web-based practices and for building socially intelligent and personalized applications. The goal of our workshop is to share research efforts and results on understanding language use in social media.
While there is a rich body of previous work on processing textual content, certain characteristics of UGC on social media make its analysis challenging. A large portion of the language found in UGC belongs to the domain of informal English: a blend of abbreviations, slang and context-specific terms that often lacks sufficient context and regularity and is written with little regard for grammar and spelling. Traditional content analysis techniques developed for more formal genres such as news, Wikipedia or scientific articles do not transfer effectively to UGC. Consequently, well-understood problems on the Web, such as information extraction, search and monetization, face new challenges with this class of textual data.
Workshops and conferences such as the NIPS Workshop on Machine Learning for Social Computing, the International Conference on Social Computing and Behavioral Modeling, the Workshop on Algorithms and Models for the Web Graph, the International Conference on Weblogs and Social Media, the Workshop on Search in Social Media and the Workshop on Social Data on the Web have focused on a variety of problem areas in social computing. Results of these meetings have highlighted both the challenges in processing social data and the insights it can offer to complement traditional techniques (e.g., polling methods).
The goal of this workshop is to bring together researchers from all of these areas but, in contrast to the conferences and workshops above, to focus specifically on exploring the characteristics and challenges of language on this evolving digital platform. We believe that the workshop can serve as a focused venue for the linguistics community around the topic of language in social media.
Call for Papers
We invite original and unpublished research papers on all topics related to the intersection of computational linguistics and language in social media, including but not limited to the sample topics below. Note that we will also consider submissions on email corpora, with the caveat that the research should be generalizable or emphasize cross-applicability to web-based public social media.
The following is a list of possible topics that may be covered in contributions to this workshop:
- What are people talking about?
- What are the named entities and topics that people are referring to?
- What are effective summaries of large volumes of user comments around a newsworthy event that offer a lens into society's perceptions?
- How do cultures interpret a situation in their local contexts, and how do they support their differing observations on a social medium?
- How are they expressing themselves?
- What does word usage tell us about an active population, or about individual allegiance or non-conformity to group practices?
- Are we seeing differences in how users self-present on this new form of digital media?
- Can groups of users be described in terms of their language use (e.g., stylistic properties)?
- Why do they write?
- What are the diverse intentions that produce the diverse content on social media?
- Can we understand why we share by looking at what we predominantly do with the medium? What emotions are people sharing about content?
- How are community structures and roles evidenced in language use? Can content analysis shed more light on network properties of communities, such as link-based diffusion models?
- What level of linguistic analysis is possible or necessary in a noisy medium such as social media?
- How can existing analysis techniques be adapted to this medium?
- Language and network structure: how do language and social network properties interact?
- What properties of a network (structural connections) or of the participants (personalities, influencers, followers) correlate with which properties of the language used?
- Semantic Web / ontologies / domain models to aid in understanding social data:
- Given the recent interest in the Semantic Web and Linked Open Data (LOD) communities in exposing models of a domain, how can we use these public knowledge bases as priors in linguistic analysis?
The small selection of recent publications in this area listed below indicates the broad range of questions related to the study of language on social media platforms. Workshop participants and contributors are expected to come from various areas of research: NLP, Text Mining, Information Retrieval, Question Answering, Machine Learning, Semantic Web, etc.
Selected Literature:
Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. 2011. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW 2011, pp. 745-754.
Jacob Eisenstein, Brendan O'Connor, Noah A. Smith and Eric P. Xing. 2010. A latent variable model for geographic lexical variation. Proceedings of EMNLP 2010.
Jennifer Foster. 2010. "cba to check the spelling": Investigating Parser Performance on Discussion Forum Posts. Proceedings of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pp. 381-384, Los Angeles, CA.
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson and Amit P. Sheth. 2009. Context and Domain Knowledge Enhanced Entity Spotting in Informal Text. Proceedings of the International Semantic Web Conference 2009, pp. 260-276.
Emre Kiciman. 2010. Language differences and metadata features on Twitter. Proceedings of the Web N-gram Workshop, SIGIR 2010.
Meenakshi Nagarajan and Marti A. Hearst. 2009. An Examination of Language Use in Online Dating Profiles. Proceedings of ICWSM 2009.
John C. Paolillo. 2001. Language variation on Internet Relay Chat: A social network approach. Journal of Sociolinguistics 5(2), pp. 180-213.
Daniel Ramage, Susan Dumais and Dan Liebling. 2010. Characterizing Microblogs with Topic Models. Proceedings of ICWSM 2010.
Alan Ritter, Colin Cherry and Bill Dolan. 2010. Unsupervised Modeling of Twitter Conversations. Proceedings of NAACL 2010.
James G. Shanahan, Yan Qu and Janyce Wiebe (editors). 2006. Computing Attitude and Affect in Text: Theory and Applications. Dordrecht: Springer.
Sara Owsley Sood, Judd Antin and Elizabeth F. Churchill. Profanity use in online communities. In submission to ACM SIGCHI, September 2011.
Sara Owsley Sood, Elizabeth F. Churchill and Judd Antin. Automatic identification of personal insults on social news sites. In press, Journal of the American Society for Information Science and Technology (JASIST), September 2011.
L. Venkata Subramaniam, Shourya Roy, Tanveer A. Faruquie and Sumit Negi. 2009. A Survey of Types of Text Noise and Techniques to Handle Noisy Text. Proceedings of AND 2009.
Submissions:
Please submit papers in the NAACL paper format (full papers, PDF files only) via the START submission page. Our multiple-submission policy is the same as NAACL's. If you cannot present an accepted paper, you must notify us by April 27.
Important Dates:
Mar 26, 2012: paper due date
Apr 23, 2012: notification of acceptance
May 4, 2012: camera-ready deadline
June 7, 2012: workshop date
Organizers:
Meena Nagarajan (IBM Almaden)
Sara Owsley Sood (Pomona College)
Michael Gamon (Microsoft Research)
Program Committee:
John Breslin (U of Galway)
Cindy Chung (UTexas)
Munmun De Choudhury (Arizona State University)
Cristian Danescu-Niculescu-Mizil (Cornell)
Susan Dumais (Microsoft Research)
Jennifer Foster (Dublin City University)
Daniel Gruhl (IBM)
Kevin Haas (Microsoft)
Emre Kiciman (Microsoft Research)
Nicolas Nicolov (Microsoft)
Daniel Ramage (Stanford)
Alan Ritter (University of Washington)
Christine Robson (IBM)
Hassan Sayyadi (University of Maryland)
Valerie Shalin (Wright State)
Amit Sheth (Wright State)
Ian Soboroff (NIST)
Scott Spangler (IBM)
Patrick Pantel (Microsoft Research)
Andrew Gordon (USC)
Georgia Koutrika (IBM)
Hyung-il Ahn (IBM)
Smaranda Muresan (Rutgers)
About the organizers:
Meena Nagarajan:
My research interests are in natural language understanding and text mining. My Ph.D. dissertation was on the analysis of informal user-generated content on social media platforms. Specifically, it focused on understanding what people write about (named entity identification, topic modeling), how they write (characterizing word usage) and why they write (user intents, sharing sentiment, opinion expression mining) on social media platforms.
Sara Owsley Sood:
My research interests span sentiment analysis, social media, text classification and information retrieval. My most recent work is on detecting insults in online communities, specifically social news sites. The goal of this work is to better understand how to model insults within communities, with the aim of building tools that help community managers in their moderation efforts. My Ph.D. dissertation work (at Northwestern University in 2007) was on a digital theater system called Buzz, which uses social media as a source of performance content. This system was exhibited both as an artistic installation, at Chicago's Second City Theater and at Wired Magazine's NextFest, and as a marketing research tool at the Wrigley Global Marketing Summit.
Michael Gamon:
I believe that while much research has focused on the network properties of social media, a combination of language and network information can lead to a better understanding of the dynamics of such systems, and to applications that more effectively address users' information-management needs.



