Collaboration and Multimedia Systems

Building and assessing collaborative applications based on new and emerging technologies.


The Collaborative and Multimedia Group conducted research from the spring of 1998 to the fall of 2001, at which time its members moved to other groups. Subsequent work can be found on the pages of individual group members. This is a summary of the group's projects and papers. Papers written in 2001 are updated to reflect final status.

Online communication, collaboration, and communities are the basis for exciting new application areas for computers. Our group explored new technologies and novel applications in these areas. For online communication and collaboration, we strove to make audio and video as flexible and easy to use as text. We worked to help people access and collaborate around multimedia, in real time and on demand. We explored new communication paradigms and techniques to increase awareness and interaction among geographically distributed participants, and investigated how our technologies apply in the consumer space, developing enhanced media-browsing interfaces and services for interactive television. For online communities, we explored how sociological principles and data mining techniques can provide a framework for enhanced services, including formation of peer support networks, development of reputations, and incentive structures that encourage continued contribution for the collective good.


Low-cost capture of audio-video: For audio-video content to be pervasive (e.g., recording of all presentations and meetings in a corporation), low-cost capture and storage are essential. Storage cost is already down to a few dollars per hour of video, but the cost of a camera operator to record a presentation or meeting is excessive. We worked on automated camera management technologies to produce high-quality records, using vision technologies, microphone array technologies, and novel camera configurations, coupled with rules from cinematography.
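The flavor of such cinematography rules can be sketched in a few lines. The sketch below is purely illustrative: the rule names, thresholds, and camera labels are assumptions for exposition, not the group's actual system.

```python
# Illustrative sketch of rule-based automated camera switching.
# Thresholds and rules are hypothetical examples of cinematography
# heuristics (hold shots a minimum time, prefer the active speaker,
# cut away from overlong static shots).

MIN_SHOT_SECONDS = 4    # avoid jarringly fast cuts
MAX_SHOT_SECONDS = 30   # avoid static, boring shots

def choose_camera(current, elapsed, speaker_cam, overview_cam):
    """Pick the camera for the next interval.

    current      -- camera currently on air
    elapsed      -- seconds the current shot has been held
    speaker_cam  -- camera tracking the active speaker (None if tracking lost)
    overview_cam -- wide shot of the room, always available
    """
    if elapsed < MIN_SHOT_SECONDS:
        return current                 # rule: hold every shot a minimum time
    if speaker_cam is not None and current != speaker_cam:
        return speaker_cam             # rule: prefer the active speaker
    if elapsed > MAX_SHOT_SECONDS:
        # rule: break up overlong shots with the wide view
        return overview_cam if current != overview_cam else (speaker_cam or overview_cam)
    return current
```

A real director module would add shot-transition grammar (e.g., never cut between two nearly identical framings) and confidence from the vision and microphone-array trackers.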

Multimedia Browsing: In a world saturated with audio-video content, efficient browsing is essential. People are very good at skimming text, but the same is not true for digital multimedia. We created techniques that allow people to work more efficiently, including time compression (speeding up audio-video with minimal distortion) and automatically generated indices and highlights. The latter exploit information from audio- and video-track analyses, speech-to-text and natural-language processing, viewing patterns of previous users of on-demand content, speaker slide transitions, and other sources. We explored novel user interfaces and addressed challenges that arise in client-server environments. We combined our work in multimedia browsing and annotations (below) to explore new paradigms for interactive television.
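To give a sense of how time compression works, here is a minimal overlap-add sketch: fixed-size frames are taken at a stretched input hop and cross-faded at the seams, shortening duration without changing pitch. This is a crude cousin of techniques such as SOLA, which additionally search for the best-aligned overlap to minimize distortion; the function and its parameters are illustrative assumptions, not the group's implementation.

```python
def time_compress(samples, rate, frame=400, overlap=100):
    """Shorten an audio signal's duration by `rate` (e.g. 2.0 plays ~2x
    faster) using naive overlap-add: copy fixed-size frames from a
    stretched input hop and cross-fade each seam over `overlap` samples.
    """
    out = []
    hop_out = frame - overlap          # samples each frame adds to the output
    hop_in = hop_out * rate            # stretched hop through the input
    pos = 0.0
    while int(pos) + frame <= len(samples):
        chunk = samples[int(pos):int(pos) + frame]
        if not out:
            out.extend(chunk)
        else:
            # linear cross-fade: blend the new chunk's head over the tail
            # of what has already been emitted
            for k in range(overlap):
                w = k / overlap
                out[-overlap + k] = out[-overlap + k] * (1 - w) + chunk[k] * w
            out.extend(chunk[overlap:])
        pos += hop_in
    return out
```

Each iteration consumes `hop_in` input samples while emitting `hop_out` output samples, so the duration shrinks by roughly the requested factor.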

Tele-presentations, Tele-meetings, Connected spaces: We explored more effective uses of real-time audio and video for both formal and informal interactions. Issues include techniques to increase awareness and interaction among geographically distributed participants, the impact of social factors and current practices on the adoption of these technologies, design of user interfaces, support for seamless flow between synchronous (real-time) and asynchronous (on-demand or post-meeting) collaboration, and the infrastructure requirements for supporting them. We built and deployed several prototypes: Flatland, a desktop-to-desktop presentation environment; TELEP, a tele-presentation environment supporting live and remote desktop audiences; Collaborative video viewing (CVV), enabling people in different locations to watch and discuss pre-recorded videos, which extended the tutored video instruction pioneered at Stanford; and Video Windows, linking public spaces in different buildings to facilitate informal interaction. The Sideshow project explored how interfaces that provide peripheral awareness of people and information influence collaboration.

Common Annotation Framework (CAF): Documents considered broadly (books, videos, web pages, and even physical objects) form the core of our human knowledge repository. Annotation is a common operation on documents: marking up significant passages, providing feedback to others by writing in the margins, creating highlights to share with colleagues. Annotations on paper are pervasive, but computer support for annotation currently ranges from limited to non-existent, with an inconsistent user experience within and across applications. The Microsoft Common Annotation Framework (CAF) is targeted to be a lightweight, extensible, and storage-independent solution for supporting rich and consistent annotation functionality. CAF was led by our group in cooperation with several product groups.

Multimedia Annotation: The current video-viewing experience over the Internet is passive. People cannot mark up video content, make personal notes or indices into it, or share comments. MRAS, our web-based client-server system, enables rich, in-context annotations. Issues explored include software architecture, user interface, and the social and collaborative aspects of this technology. We explored annotation use in on-demand training. The system also forms the foundation for archiving timeline-synchronized meta-data with live collaborations (e.g., chat conversations, audience questions, slide transitions in tele-presentations). By adding meta-data on the fly after a live event has ended, users can seamlessly merge synchronous and asynchronous collaboration.
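The core idea of timeline-synchronized meta-data can be sketched with a small data model: every piece of meta-data, whether captured live (chat, questions, slide transitions) or added later (on-demand notes), carries a media timestamp, so the two kinds merge into one ordered timeline. The field names below are illustrative assumptions, not MRAS's actual schema.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Annotation:
    """One timeline-synchronized meta-data record (illustrative schema)."""
    time: float                        # seconds into the media timeline
    kind: str = field(compare=False)   # e.g. "chat", "question", "slide", "note"
    author: str = field(compare=False)
    body: str = field(compare=False)

def merge_timelines(*streams):
    """Merge live-event meta-data and later on-demand annotations into a
    single timeline ordered by media time, so synchronous and asynchronous
    contributions are presented identically on playback."""
    merged = [a for stream in streams for a in stream]
    merged.sort()   # orders by `time` only; other fields have compare=False
    return merged
```

Because the sort is stable, records captured live keep their original order relative to later notes at the same timestamp.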

Face Modeling: Generating realistic 3D human face models and facial animations has been a persistent challenge in computer vision and graphics. We developed a system that constructs textured 3D face models from video with minimal user interaction. Our system takes as input a video sequence of a face captured with an ordinary video camera. After five manual clicks on two images to tell the system where the eye corners, nose tip, and mouth corners are, the system automatically generates a realistic-looking 3D human head model, and the constructed model can immediately be animated (different poses, facial expressions, and talking). Optionally, to achieve higher accuracy, a user can click an additional set of three points to indicate the lower part of the face. With a PC and an ordinary camera, a user can generate a face model in a few minutes.

Studies of Technology Use: A major aspect of our work consisted of assessment of systems through experiments and deployments in working groups in Microsoft and other organizations. These include studies of our experimental systems and leading-edge commercial communication and collaboration technologies. How do people access and watch archived video and audio presentations on demand? How should presentations be designed for later viewing, and what tools can support this? How do multiple monitor users exploit extra display space, and how can applications support this better? How do large groups make effective use of NetMeeting? How is Office10 web-based annotation used by teams in developing products?

Social Analysis of Online Communities: Interfaces to social cyberspaces, such as discussion boards, email lists and chat rooms, present limited information about the social context of the interactions they host. Basic social cues about the size and nature of groups, reputations of individuals and quantity and quality of their contributions are all missing. Discovery, navigation and self-regulation are growing challenges as the size and scope of these cyberspaces expand. The Netscan project has used sociological principles and data mining techniques to address these challenges for newsgroups and discussion boards. Interfaces built using this meta information provide incentive structures that can catalyze the online social spaces through competitive cooperation and increase content quality and user satisfaction.
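The kind of meta-information such interfaces surface can be illustrated with simple social-accounting metrics computed over message threads: how much each author posts, how many threads they start, and how many replies they draw. The message layout and metric names below are assumptions for illustration, not Netscan's actual schema or measures.

```python
from collections import defaultdict

def author_metrics(messages):
    """Derive simple per-author social metrics from threaded messages.

    `messages` is a list of (msg_id, author, parent_id) tuples, where
    parent_id is None for a thread-starting post (illustrative layout).
    """
    author_of = {mid: author for mid, author, _ in messages}
    stats = defaultdict(lambda: {"posts": 0,
                                 "threads_started": 0,
                                 "replies_received": 0})
    for mid, author, parent in messages:
        stats[author]["posts"] += 1
        if parent is None:
            stats[author]["threads_started"] += 1
        elif parent in author_of:
            # credit the parent's author with having drawn a reply
            stats[author_of[parent]]["replies_received"] += 1
    return dict(stats)
```

Aggregates like these, exposed in the interface, are one way to make reputations and contribution quality visible and to reward sustained participation.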

Multi-University / Research Laboratory Seminar Series (MURL): Our group sponsored the MURL online seminar series, making Computer Science research seminars from major institutions freely available on the web.

Selected Publications

Project-Related Conference Papers, Journal Articles, Book Chapters, and Reports