News about LSM 2012:
Sara Owsley Sood (http://www.cs.pomona.edu/~sara/Site/Home.html) has joined Meenakshi Nagarajan and Michael Gamon as co-organizer!
We are working on a proposal for LSM 2012 which we plan to submit to NAACL-HLT (Montreal) - stay tuned!
Workshop papers:
Workshop papers are available in the ACL Anthology. We will link to the slides as the authors make them available to us.
Keynote: Automating Analysis of Social Media Communication: Insights from CMDA - Susan Herring
How can you say such things?!?: Recognizing Disagreement in Informal Political Argument - Rob Abbott, Marilyn Walker, Pranav Anand, Jean E. Fox Tree, Robeson Bowmani and Joseph King. Presentation slides
What pushes their buttons? Predicting comment polarity from the content of political blog posts - Ramnath Balasubramanyan, William W. Cohen, Doug Pierce and David P. Redlawsk
Contextual Bearing on Linguistic Variation in Social Media - Stephan Gouws, Donald Metzler, Congxing Cai and Eduard Hovy
Sentiment Analysis of Twitter Data - Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow and Rebecca Passonneau
Detecting Forum Authority Claims in Online Discussions - Alex Marin, Bin Zhang and Mari Ostendorf
Annotating Social Acts: Authority Claims and Alignment Moves in Wikipedia Talk Pages - Emily M. Bender, Jonathan T. Morgan, Meghan Oxley, Mark Zachry, Brian Hutchinson, Alex Marin, Bin Zhang and Mari Ostendorf
Analyzing the Dynamic Evolution of Hashtags on Twitter: a Language-Based Approach - Evandro Cunha, Gabriel Magno, Giovanni Comarela, Virgilio Almeida, Marcos André Goncalves and Fabricio Benevenuto
Why is “SXSW” trending? Exploring Multiple Text Sources for Twitter Topic Summarization - Fei Liu, Yang Liu and Fuliang Weng. Presentation slides
Language use as a reflection of socialization in online communities - Dong Nguyen and Carolyn P. Rosé
Email Formality in the Workplace: A Case Study on the Enron Corpus - Kelly Peterson, Matt Hohensee and Fei Xia
Workshop Description:
Over the last few years, there has been growing public and enterprise interest in 'social media' and their role in modern society. At the heart of this interest is the ability of users to create and share content via a variety of platforms such as blogs, micro-blogs, collaborative wikis, multimedia sharing sites and social networking sites. The volume and variety of user-generated content (UGC) and the user participation network behind it are creating new opportunities for understanding web-based practices and building socially intelligent and personalized applications. Investigations around social data can be broadly categorized along the following dimensions: (a) understanding aspects of the user-generated content, (b) modeling and observing the user network in which the content is generated, and (c) characterizing the individuals and groups that produce and consume the content.
The goal of this workshop is to share research efforts and results in the first area: understanding language usage on social media.
While there is a rich body of previous work in processing textual content, certain characteristics of UGC on social media introduce challenges for its analysis. A large portion of the language found in UGC is in the Informal English domain: a blend of abbreviations, slang and context-specific terms, lacking sufficient context and regularity and delivered with an indifferent approach to grammar and spelling. Traditional content analysis techniques developed for more formal genres such as news, Wikipedia or scientific articles do not translate effectively to UGC. Consequently, well-understood problems such as information extraction, search or monetization on the Web are facing pertinent challenges owing to this new class of textual data.
Topics of Interest
- What are people talking about?
What are the Named Entities and topics that people are making references to?
What are effective summaries of volumes of user comments around a news-worthy event that offer a lens into the society's perceptions?
How are different cultures interpreting a situation in their local contexts, and how is this reflected in their varying observations on a social medium?
- How are they expressing themselves?
What do word usages tell us about an active population or about individual allegiances or non-conformity to group practices?
Are we seeing differences in how users self-present on this new form of digital media?
- Why do they scribe?
What are the diverse intentions that produce the diverse content on social media?
Can we understand why we share by looking at what we predominantly do with the medium? What emotions are people sharing about content?
- What level of linguistic analysis is possible/necessary in a noisy medium such as social media?
How can existing analysis techniques be adapted to this medium?
- Language and network structure: How do language and social network properties interact?
What properties of a network (structural connections) or the participants (personalities, influencers, followers) correlate with which properties of the language used?
- Semantic Web / Ontologies / Domain models to aid in social data understanding:
Given the recent interest in the Semantic Web and Linked Open Data (LOD) communities in exposing models of a domain, how can we utilize these public knowledge bases to serve as priors in linguistic analysis?
- How does what people say on the web correlate with other kinds of measurable behavior?
Can we correlate consumer sentiment on the web with purchase behavior, and under what circumstances?
Does web chatter about a product or service serve in any way to predict future demand?
Related events:
AAAI Workshop on Analyzing Microtext (San Francisco, August 2011)
Organizers:
Meenakshi Nagarajan (IBM Research)
Michael Gamon (Microsoft Research)
Program committee members:
- John Breslin (U of Galway)
- Cindy Chung (UTexas)
- Munmun De Choudhury (Arizona State University)
- Cristian Danescu-Niculescu-Mizil (Cornell)
- Susan Dumais (Microsoft Research)
- Jennifer Foster (Dublin City University)
- Sam Gosling (UTexas)
- Julia Grace (IBM)
- Daniel Gruhl (IBM)
- Kevin Haas (Microsoft)
- Emre Kiciman (Microsoft Research)
- Nicolas Nicolov (Microsoft)
- Daniel Ramage (Stanford)
- Alan Ritter (University of Washington)
- Christine Robson (IBM)
- Hassan Sayyadi (University of Maryland)
- Valerie Shalin (Wright State)
- Amit Sheth (Wright State)
- Ian Soboroff (NIST)
- Hari Sundaram (ASU)
- Scott Spangler (IBM)
- Smaranda Muresan (Rutgers)



