The common meeting is an integral part of everyday life for most workgroups. However, due to travel, time, or other constraints, people are often not able to attend all the meetings they need to. Teleconferencing and recording of meetings can address this problem. In this project we use a variety of capture devices (a novel 360º camera, a whiteboard camera, an overview camera, and a microphone array) to provide a rich experience for people who want to participate in a meeting from a distance.
During the research phase of this project, the following people contributed to creating the devices and technologies:
- Ross Cutler - at the time a researcher in the Communication, Collaboration, and Multimedia Processing group at Microsoft Research. He later became architect of the product team assigned to turn the technologies we designed into a product, now called the RoundTable device.
- Anoop Gupta - at the time group manager of the Communication, Collaboration, and Multimedia Processing group at Microsoft Research. He later became Corporate Vice President of the Unified Communications division. One of the teams in his division was the RoundTable team.
- Alex Colburn - RSDE in the Communication, Collaboration, and Multimedia Processing group at Microsoft Research. Alex Colburn is now pursuing his PhD at the University of Washington.
Meetings are part of the everyday life of every workgroup. When location or time constraints prevent members from meeting face to face, capturing, recording, broadcasting, and analyzing meetings keeps them updated, engaged, and productive. In this project we designed capture devices and processing technologies to provide a better meeting experience for remote participants.
The project started in 2001 in the Communication, Collaboration, and Multimedia Processing group at Microsoft Research, with Anoop Gupta as group manager. Later, Anoop Gupta took the position of technical assistant to Bill Gates, and the research group was merged with the Signal Processing Group at Microsoft Research, managed by Rico Malvar. The new name of the research group was Communication, Collaboration and Signal Processing.
Around 2003 a decision was made to turn the technologies designed during this project into a Microsoft product. Ross Cutler left Microsoft Research to become the architect of the new team, and Microsoft researchers continued to help the product team. Three years later the product was named RoundTable and deployed for evaluation in most of the conference rooms at Microsoft. It is now available to our customers and is the default client for Microsoft LiveMeeting and Microsoft Office Communicator. Bill Gates and Jeff Raikes introduced the new product on CNN.
The meetings are captured by:
- Novel 360º camera called RingCam, which consists of five 90º wide-angle cameras. The fields of view of neighboring cameras overlap, which allows the images to be stitched in real time into a panoramic 360º view of the conference room. The device sits in the center of the conference room table. It is best suited to capturing meetings for recording.
- Very wide angle, high-resolution camera, placed under the screen. It consists of five cameras whose lenses have different focal lengths. The signals from the cameras are stitched to produce the best image for real-time broadcasting.
- Whiteboard camera, which automatically takes pictures of the whiteboard every few seconds. The camera should be able to capture the whiteboard image in low light conditions.
- Overview camera, placed in one of the upper corners of the conference room. This camera has a wide angle of view and provides a general view of the conference room.
- Screen capture device, which converts the projector signal to digital form for further processing, recording and broadcasting.
- Microphone array for capturing the sound in the conference room. In some designs it is a circular eight-element microphone array in the base of the RingCam device; in others it is a linear four-element array working with the wide angle camera. The purpose of this device is to provide good sound from the conference room and the position of the sound source.
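To make the RingCam geometry concrete: with five 90º cameras, the optical axes are 360/5 = 72º apart, so each pair of neighboring cameras shares 90 - 72 = 18º of view, and that shared region is what makes stitching possible. The Python sketch below assumes evenly spaced cameras with exactly 90º horizontal fields of view (an idealization, since real lenses and mounting vary) and reports which cameras see a given azimuth. The names `cameras_covering` and `angular_distance` are hypothetical and not part of any RoundTable software.

```python
from typing import List

# Assumed idealized layout: five cameras evenly spaced around the vertical axis,
# each with a 90-degree horizontal field of view.
NUM_CAMERAS = 5
FOV_DEG = 90.0
SPACING_DEG = 360.0 / NUM_CAMERAS      # 72 degrees between optical axes
OVERLAP_DEG = FOV_DEG - SPACING_DEG    # 18 degrees shared by neighboring cameras


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cameras_covering(azimuth_deg: float) -> List[int]:
    """Indices of the cameras whose field of view contains the azimuth."""
    half_fov = FOV_DEG / 2.0
    return [
        i for i in range(NUM_CAMERAS)
        if angular_distance(azimuth_deg, i * SPACING_DEG) <= half_fov
    ]
```

For example, an attendee at azimuth 36º, midway between the first two cameras, appears in the images of cameras 0 and 1, and every azimuth is seen by at least one camera.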
Wide angle camera: stitched image of five cameras with distance correction
Processing blocks and algorithms
Stitching multiple video streams into one panoramic stream in real time plays a critical role in the Distributed Meetings designs. The task ranges from stitching a cylindrical image (RingCam) to a more complex variant in which the cameras have different focal lengths and geometric distortions are introduced deliberately to give the remote participant a better experience (PING!). In all cases color correction and auto-calibration are critical because of differences between cameras (manufacturing tolerances) and lighting conditions. Special measures are taken to compensate for variations in camera position and orientation. A face detection algorithm is used to detect the positions of the meeting attendees.
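Two of the stitching steps, matching the brightness of neighboring cameras and hiding the seam between them, can be illustrated on a single row of grayscale pixels. This is a minimal sketch under simplifying assumptions: the images are already warped and aligned so that the last `overlap` columns of one camera show the same scene as the first `overlap` columns of the next, brightness correction is a single scalar gain, and the seam is hidden with linear feathering. It is not the Distributed Meetings implementation, which also handles geometry, calibration, and per-channel color; the function names are hypothetical.

```python
from typing import List

Row = List[float]  # one row of grayscale intensities, already warped and aligned


def overlap_gain(left: Row, right: Row, overlap: int) -> float:
    """Scalar gain for `right` so that its mean brightness over the shared
    columns matches that of `left` (a simple exposure/color correction)."""
    left_mean = sum(left[-overlap:]) / overlap
    right_mean = sum(right[:overlap]) / overlap
    return left_mean / right_mean if right_mean > 0 else 1.0


def stitch_row(left: Row, right: Row, overlap: int) -> Row:
    """Join two rows whose last/first `overlap` columns show the same scene."""
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter row length")
    gain = overlap_gain(left, right, overlap)
    right = [gain * v for v in right]  # brightness-corrected copy of the right row
    start = len(left) - overlap
    seam = []
    for k in range(overlap):
        # Linear feathering: the right image's weight ramps up across the overlap.
        w = (k + 1) / (overlap + 1)
        seam.append((1.0 - w) * left[start + k] + w * right[k])
    return left[:start] + seam + right[overlap:]
```

Feathering trades a little sharpness in the overlap for a seam without a visible step; when the two cameras differ in exposure, the gain correction removes most of that step before blending.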
Enhancing the whiteboard image involves white balancing (which varies across the picture because of uneven lighting), removing obstacles (such as the person drawing on the board), recognizing and remembering the strokes, color correction, and additional cleanup. The result is a clear image with a limited set of colors and an evenly white board. Every stroke is remembered and time-stamped, so a single click on a stroke in the meeting browser jumps to the time when it was drawn.
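Spatially varying white balance can be sketched by splitting the image into cells, estimating the board's white level in each cell, and rescaling so that this level maps to pure white. The sketch below assumes a grayscale image, square cells, and strokes covering only a minority of each cell, so that a high percentile of the cell's pixel values approximates the bare board. The cell size and percentile are illustrative parameters, not values from the project, and the function names are hypothetical; a production pipeline would also smooth the white levels between cells to avoid visible block edges.

```python
from typing import List

Image = List[List[float]]  # grayscale rows, values in 0..255


def cell_white_level(img: Image, r0: int, c0: int, size: int, percentile: float) -> float:
    """Estimate the bare-board brightness in one cell as a high percentile of its pixels."""
    values = sorted(
        img[r][c]
        for r in range(r0, min(r0 + size, len(img)))
        for c in range(c0, min(c0 + size, len(img[0])))
    )
    index = min(len(values) - 1, int(percentile * len(values)))
    return values[index]


def local_white_balance(img: Image, cell: int = 8, percentile: float = 0.75) -> Image:
    """Rescale each cell so that its estimated board color becomes white (255)."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # leave the input image untouched
    for r0 in range(0, height, cell):
        for c0 in range(0, width, cell):
            # Guard against division by zero in completely dark cells.
            white = max(cell_white_level(img, r0, c0, cell, percentile), 1.0)
            for r in range(r0, min(r0 + cell, height)):
                for c in range(c0, min(c0 + cell, width)):
                    out[r][c] = min(255.0, img[r][c] * 255.0 / white)
    return out
```

In a board lit unevenly, a brightly lit cell and a shadowed cell both end up with a white background, while a dark stroke pixel keeps its contrast relative to its own cell.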
Similar technologies are applied to the screen capture device. As with the whiteboard software, so-called key frames (frames in which something changes) are detected and recorded. This substantially reduces the required bandwidth or storage space and makes browsing the recorded meeting a much better experience.
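Key frame detection can be approximated by comparing each captured frame with the last key frame and declaring a new key frame when enough pixels have changed. In this hypothetical sketch, frames are flattened grayscale pixel lists, and both thresholds (`pixel_threshold`, `min_changed`) are illustrative assumptions rather than settings from the project.

```python
from typing import List, Sequence

Frame = Sequence[float]  # flattened grayscale pixels of one captured frame


def changed_fraction(a: Frame, b: Frame, pixel_threshold: float) -> float:
    """Fraction of pixels whose intensity differs by more than pixel_threshold."""
    changed = sum(1 for p, q in zip(a, b) if abs(p - q) > pixel_threshold)
    return changed / len(a)


def detect_key_frames(
    frames: Sequence[Frame],
    pixel_threshold: float = 20.0,
    min_changed: float = 0.01,
) -> List[int]:
    """Indices of key frames: the first frame, plus every frame that differs
    enough from the most recent key frame."""
    if not frames:
        return []
    keys = [0]
    reference = frames[0]
    for i in range(1, len(frames)):
        if changed_fraction(reference, frames[i], pixel_threshold) >= min_changed:
            keys.append(i)
            reference = frames[i]  # later frames are compared with this one
    return keys
```

Comparing against the last key frame rather than the previous frame means that gradual changes, such as a slide being built up line by line, still trigger a key frame once they accumulate past the threshold.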
Microphone array processing employs beamforming to enhance the captured sound and suppress unwanted noise and reverberation. The processing module also performs sound source localization and tracking, which provide important cues about who is talking. During broadcasting and recording, these localization cues are fused with the output of the face detector to control the so-called speaker view, an extracted high-resolution image of the current speaker. In the recorded meeting these cues are used to create per-speaker audio tracks, allowing the viewer to listen to a selected group of speakers.
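Sound source localization with a microphone pair rests on the time difference of arrival: a far-field source at angle θ from broadside reaches two microphones spaced d apart with a delay τ = d·sin(θ)/c, where c is the speed of sound, so θ = arcsin(c·τ/d). The sketch below estimates τ with a plain cross-correlation search over physically possible lags and converts it to an angle. It assumes a far-field source, a single microphone pair, and c = 343 m/s, and the function names are hypothetical. The project's actual localization and beamforming algorithms are not specified in this description; practical systems typically use more robust delay estimators such as GCC-PHAT and combine many microphone pairs.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature (assumption)


def best_lag(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag in samples that maximizes sum(x[n] * y[n + lag]); a positive lag
    means the sound reaches microphone y later than microphone x."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best, best_score = lag, score
    return best


def direction_of_arrival(
    x: Sequence[float], y: Sequence[float], mic_distance: float, sample_rate: float
) -> float:
    """Source angle in degrees from broadside, positive toward microphone x."""
    # The delay cannot exceed the acoustic travel time between the microphones.
    max_lag = math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate)
    tau = best_lag(x, y, max_lag) / sample_rate
    # Clamp because the rounded-up lag limit can slightly exceed the physical one.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.degrees(math.asin(s))
```

With microphones 10 cm apart sampled at 16 kHz, a delay of 3 samples corresponds to roughly 40º, which shows why angular resolution from a single pair is coarse and why arrays use several pairs.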
Browsing a recorded meeting: panoramic image from RingCam and timeline with speaker tracks
The Distributed Meetings Server captures the meeting, performs much of the processing described above in real time, and records the meeting to the hard disk. Some versions can also broadcast the meeting to online participants.
The Distributed Meetings Post-processor performs additional processing of the recorded meeting to extract more information. Here we run algorithms that are computationally heavy, require multiple passes, or make no sense in real time, for example clustering the speakers and forming their audio tracks.
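Speaker clustering can be illustrated with localization cues alone: group the per-frame source directions into speakers, then merge consecutive frames of each speaker into time segments that form that speaker's track. This toy version, with hypothetical names, makes two passes over the recording: the first finds cluster seeds, the second labels every frame with its nearest seed. It clusters only by direction with a fixed angular tolerance, an assumption that breaks down when attendees sit close together or move; a real system would also use acoustic features of the voices.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cluster_directions(angles: List[Optional[float]], tolerance: float) -> List[float]:
    """Pass 1: greedy clustering; an angle farther than `tolerance` from every
    existing seed starts a new cluster (speaker)."""
    seeds: List[float] = []
    for a in angles:
        if a is not None and all(angular_distance(a, s) > tolerance for s in seeds):
            seeds.append(a)
    return seeds


def speaker_tracks(
    angles: List[Optional[float]], frame_sec: float, tolerance: float = 20.0
) -> Dict[int, List[Segment]]:
    """Pass 2: label each frame with its nearest seed and merge consecutive
    frames of the same speaker into segments. None marks a frame without speech."""
    seeds = cluster_directions(angles, tolerance)
    tracks: Dict[int, List[Segment]] = {i: [] for i in range(len(seeds))}
    for idx, a in enumerate(angles):
        if a is None:
            continue
        speaker = min(range(len(seeds)), key=lambda i: angular_distance(a, seeds[i]))
        start, end = idx * frame_sec, (idx + 1) * frame_sec
        segments = tracks[speaker]
        if segments and abs(segments[-1][1] - start) < 1e-9:
            segments[-1] = (segments[-1][0], end)  # extend the running segment
        else:
            segments.append((start, end))
    return tracks
```

Because every angle either lies within the tolerance of an existing seed or becomes a seed itself, the nearest-seed assignment in the second pass always stays within that tolerance.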
The Distributed Meetings Client runs on the remote attendee's machine. As a real-time client it can show the panoramic view, speaker view, whiteboard, screen view, meeting data, and overview camera. In meeting browsing mode the client displays additional information and adds functionality: listening only to a selected set of speakers, faster preview, and jumping to a particular slide on the screen or stroke on the whiteboard.
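Listening only to a selected set of speakers reduces to an interval problem: collect the selected speakers' segments, sort them, and merge overlapping ones into a playback list. The sketch below assumes speaker tracks are stored as lists of (start, end) times in seconds keyed by a speaker label; the data layout, the `gap` parameter for joining nearby segments, and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def playback_segments(
    tracks: Dict[str, List[Segment]], selected: Iterable[str], gap: float = 0.0
) -> List[Segment]:
    """Merge the selected speakers' segments into sorted, non-overlapping
    intervals to play; segments separated by at most `gap` seconds are joined."""
    segments = sorted(s for speaker in selected for s in tracks.get(speaker, []))
    merged: List[Segment] = []
    for start, end in segments:
        if merged and start <= merged[-1][1] + gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For instance, with tracks `{"A": [(0, 5), (20, 25)], "B": [(4, 8)]}`, selecting A and B yields the playback list `[(0, 8), (20, 25)]`; a small positive `gap` avoids choppy playback when turns are separated by brief pauses.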
Client screenshot browsing a recorded meeting: speaker view (upper left), overview (middle left), whiteboard (upper right), panoramic view and speaker tracks (bottom).