Technology Introduction

Video communication on mobile devices over wireless networks calls for low-bit-rate video coding. Conventional video coding standards such as MPEG-1/2/4 and H.261/H.263 can code video at very low bit rates, but the resulting images usually look like a collection of color blocks and the motion in the scene becomes discontinuous. These block artifacts originate from the architecture that the standards share: coding based on the discrete cosine transform (DCT). In general, DCT-based coding groups pixels into blocks, e.g. 8x8 or 16x16 pixels. Each block is transformed from the spatial domain into a set of DCT coefficients in the frequency domain, where each coefficient is the weight of its corresponding DCT basis waveform. The coefficients are then quantized, and the nonzero quantized values are compressed with an entropy coder. As a result, the low-spatial-frequency coefficients, which represent the "basic colors" of the blocks, receive high priority. When a DCT-based method operates at limited bandwidth, it therefore preserves the basic colors of the blocks in preference to finer detail.
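
The effect described above can be illustrated with a minimal sketch. It is not the pipeline of any particular MPEG or H.26x codec: it assumes an 8x8 block, an orthonormal DCT-II, and a single uniform quantization step (real codecs use per-coefficient quantization tables and an entropy coder, both omitted here). The names `dct2`, `idct2`, `quantize` and `dequantize` are hypothetical helpers introduced only for this illustration.

```python
import math

N = 8  # block size assumed for this sketch


def _alpha(k):
    # Normalization factor of the orthonormal DCT-II.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(n, k):
    # Value of the k-th DCT basis waveform at sample position n.
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block):
    """2-D DCT-II of an N x N block given as a list of rows."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2: rebuild the pixel block from its coefficients."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    # Uniform quantizer (an assumption): small coefficients round to zero.
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]


# A block containing a faint vertical edge: left half 120, right half 136.
block = [[120] * 4 + [136] * 4 for _ in range(N)]

# A coarse step, standing in for a very low bit-rate setting.
levels = quantize(dct2(block), step=128)
reconstructed = idct2(dequantize(levels, step=128))

# Only the lowest-frequency (DC) coefficient survives quantization, so the
# decoded block is a single flat gray value and the edge is gone.
nonzero = [(u, v) for u in range(N) for v in range(N) if levels[u][v] != 0]
```

In this example `nonzero` contains only the position `(0, 0)`, and every pixel of `reconstructed` is approximately 128: the block keeps its "basic color" and loses its outline, which is the block artifact the paragraph describes.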

This prioritization is not a problem when bandwidth is plentiful, but it becomes one when video is transmitted over a very low bandwidth network and the outlines of the scene matter more than the basic colors of the blocks. In video communication, for example, facial expressions, which are conveyed by the motion of the outlines of the face, eyes, eyebrows and mouth, deliver more information than the basic colors of the face. Since outlines can be represented with only two colors, we developed a video form in which each pixel is represented by only 1 bit. We call it bi-level video (or portrait video in general). Regardless of how much further the video can be compressed, this alone is a significant step in bit-rate reduction. Tests showed that, surprisingly, it was very easy to identify a person in a bi-level image generated from a gray-scale image by thresholding, and also very easy to perceive the person's facial expressions in a bi-level image sequence. By exploiting the temporal correlation between successive frames and the flexibility that bi-level images allow in presenting a scene, we achieved very high compression ratios for bi-level video. Experiments show that at low bandwidths, bi-level video provides clearer shapes, smoother motion, shorter initial latency and much lower computational cost than DCT-based technologies.

Building on bi-level video, we further extended portrait video to tri-level, four-level and multiple-level video. When more bandwidth is available, more gray-scale levels are coded and transmitted, so portrait video always delivers the most important information in a video for a given bandwidth. Portrait video technology is especially suitable for small mobile devices such as handheld PCs, palm-size PCs and mobile phones, which have small display screens and limited computational power and work in wireless networks with limited bandwidth.
Although the theoretical bandwidths of 2.5G wireless networks such as GPRS and CDMA 1X are up to 115 Kbps and 153.6 Kbps respectively, the bandwidth they can stably provide is 20-40 Kbps, and portrait video fits within this range.
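
As a rough illustration of the 1-bit-per-pixel idea and of the bit budget involved, the sketch below thresholds a gray-scale frame, packs the result at 8 pixels per byte, and estimates how much further compression a 20-40 Kbps channel would demand. It is not the authors' codec. The fixed threshold of 128, the 176x144 frame size, the 15 frames per second, and the reading of 1 Kbps as 1000 bits per second are all assumptions chosen for the example, and the function names are hypothetical.

```python
def to_bilevel(gray, threshold=128):
    """Map each gray-scale pixel (0-255) to 1 bit using a fixed threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def pack_bits(bilevel):
    """Pack the 1-bit pixels row by row, 8 pixels per byte, MSB first."""
    bits = [b for row in bilevel for b in row]
    bits += [0] * (-len(bits) % 8)  # pad the last byte with zeros
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        packed.append(byte)
    return bytes(packed)


def raw_bilevel_bps(width, height, fps):
    """Bit rate of uncompressed bi-level video at 1 bit per pixel."""
    return width * height * fps


def required_ratio(width, height, fps, channel_kbps):
    """Compression ratio needed to fit raw bi-level video into the channel."""
    return raw_bilevel_bps(width, height, fps) / (channel_kbps * 1000)


# A tiny 2 x 8 gray-scale "frame" used only to show the packing.
gray = [
    [30, 40, 200, 210, 220, 35, 25, 20],
    [20, 180, 190, 60, 70, 200, 190, 30],
]
packed = pack_bits(to_bilevel(gray))  # 16 pixels fit in 2 bytes

# Assumed frame size and rate; neither figure comes from the text above.
raw = raw_bilevel_bps(176, 144, 15)          # 380160 bits per second
ratio_at_40 = required_ratio(176, 144, 15, 40)
ratio_at_20 = required_ratio(176, 144, 15, 20)
```

Under these assumptions the raw bi-level stream is about 380 Kbps, so fitting it into 20-40 Kbps still needs a further compression ratio of roughly 10 to 19. This is why the step to 1 bit per pixel is combined with compression that exploits the correlation between successive frames.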

For more information, please refer to:

Jiang Li, Gang Chen, Jizheng Xu, Yong Wang, Hanning Zhou, Keman Yu, King To Ng and Heung-Yeung Shum, "Bi-level Video: Video Communications at Very Low Bit Rates," ACM Multimedia Conference 2001, Sep. 30 - Oct. 5, Ottawa, Ontario, Canada, pages 392 - 400.

Jiang Li, Keman Yu, Gang Chen, Yong Wang, Hanning Zhou, Jizheng Xu, King To Ng, Kaibo Wang, Lijie Wang and Heung-Yeung Shum, "Portrait Video Phone," ACM Multimedia Conference 2001, Sep. 30 - Oct. 5, Ottawa, Ontario, Canada, pages 597 - 598.

Keman Yu, Jiang Li, and Shipeng Li, "Bi-level/full-color Video Combination for Ubiquitous Video Communication," IEEE International Symposium on Circuits and Systems, May 26 - 29, Scottsdale, Arizona, pages 245 - 248.

Jiang Li, Keman Yu, Tielin He, Yunfeng Lin, Shipeng Li, and Ya-Qin Zhang, "Scalable Portrait Video for Mobile Video Communication," IEEE Transactions on Circuits and Systems for Video Technology, Volume 13, Issue 5, May 2003, pages 376 - 384, ISSN 1051-8215.

Keman Yu, Jiang Li, Jizheng Xu, and Shipeng Li, "Very Low Bit Rate Watercolor Video," IEEE International Symposium on Circuits and Systems (ISCAS) 2003, May 25-29, Bangkok, Thailand, pages 712 - 715.

Keman Yu, Jiangbo Lv, Jiang Li and Shipeng Li, "Practical Real-time Video Codec for Mobile Devices," IEEE International Conference on Multimedia & Expo (ICME) 2003, July 6-9, Baltimore, Maryland, pages 509-512.

Keman Yu, Jiang Li, Tielin He, Yunfeng Lin, Jiangbo Lv and Shipeng Li, "Microsoft Portrait: A Real-time Mobile Video Communication System," IEEE International Conference on Multimedia & Expo (ICME) 2003, July 6-9, Baltimore, Maryland, demonstration III-2.

©2008 Microsoft Corporation. All rights reserved.