Technology Introduction
Video communications on mobile devices over wireless networks call for low
bit-rate video coding technologies. Although conventional video processing and
coding technologies such as MPEG1/2/4 and H.261/263 can also code video at very
low bit rates, the resultant images usually look like a collection of color
blocks, and the motion in the scene becomes discontinuous. The block artifacts of
these methods originate from the common architecture of MPEG1/2/4 and H.261/263,
i.e. discrete cosine transform (DCT) based coding. In general, DCT-based coding
groups pixels into blocks, e.g. 8x8 or 16x16 pixel blocks. These blocks are
transformed from the spatial domain into a set of DCT coefficients of the
frequency domain. Each of these coefficients is a weight associated with its
corresponding DCT basis waveform. These coefficients are then quantized, and
nonzero quantized values are compressed using an entropy coder. As a result, the
low spatial frequency values that represent the "basic colors" of the blocks
receive high priority. Thus, when DCT-based compression methods work at limited
bandwidths, the basic colors of the blocks are preserved in preference to finer detail.
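The priority given to low-frequency coefficients can be illustrated with a minimal Python sketch. It is not taken from any of the codecs named above: the block contents, the single uniform quantizer and the two step sizes are assumptions chosen only for illustration, and real codecs use more elaborate quantization.

```python
import math

N = 8  # block size (8x8, one of the sizes mentioned in the text)


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def count_nonzero(q):
    """Number of coefficients that an entropy coder would still have to code."""
    return sum(1 for row in q for c in row if c != 0)


# Hypothetical block: a soft vertical edge (left half 100, right half 140).
block = [[100] * 4 + [140] * 4 for _ in range(N)]
coeffs = dct_2d(block)

fine = quantize(coeffs, 16)     # assumed step for a generous bit budget
coarse = quantize(coeffs, 512)  # assumed step for a very tight bit budget

# With only the DC term left, the decoded block is one flat gray value.
flat_value = coarse[0][0] * 512 / N

print("nonzero coefficients, fine step:  ", count_nonzero(fine))
print("nonzero coefficients, coarse step:", count_nonzero(coarse))
print("flat value decoded from DC only:  ", flat_value)
```

With the coarse step only the DC coefficient survives, so a decoder would reconstruct the block as a single flat gray value and the edge inside the block is lost. This is the "collection of color blocks" effect described above.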
While this prioritization is not a problem when bandwidth is plentiful, it
becomes one when video is transmitted through a very low bandwidth network and
the outline features of scenes are more important than
the basic colors of blocks. For example, in video communications, facial
expressions that are represented by the motions of the outlines of the face, eyes,
eyebrows and mouth deliver more information than the basic colors of the face.
Since the representation of outlines needs only two types of colors, we
developed a video form in which each pixel is represented by only 1 bit. We call
it bi-level video (or portrait video in general). Regardless of how much further
the video can be compressed, this alone is a significant step in bit-rate
reduction. Tests showed that,
surprisingly, it was very easy to identify a person in a bi-level image that was
generated from a gray-scale image using a thresholding method, and it was also
very easy to perceive the facial expressions of the person in a bi-level image
sequence. By analyzing the temporal correlation between successive frames and
the flexibility of scene presentation with bi-level images, we achieved very
high compression ratios for bi-level video. Experiments show
that at low bandwidths, bi-level video provides clearer shapes, smoother motion,
shorter initial latency and much lower computational cost than DCT-based
technologies. Based on bi-level video, we further extend portrait video to
tri-level, four-level and multi-level video. If more bandwidth is
available, more gray-scale levels are coded and transmitted, so portrait
video always delivers the most important information in a video for a given
bandwidth. Portrait video technology is especially
suitable for small mobile devices such as handheld PCs, palm-size PCs and mobile
phones that have small display screens and limited computational power, and
work in wireless networks with limited bandwidth. Although the theoretical
bandwidths of 2.5G wireless networks such as GPRS and CDMA 1X are up to 115 Kbps
and 153.6 Kbps respectively, the bandwidth they can stably provide is only
20-40 Kbps, and portrait video fits into this range.
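As a rough illustration of the 1-bit representation and of why further compression is still needed, the following Python sketch thresholds a gray-scale frame, packs it at one bit per pixel and computes the raw bit rate. It is not the codec described in the papers listed below: the fixed threshold of 128, the 176x144 frame size, the 15 frames per second and the synthetic test frame are all assumptions made for the example.

```python
def to_bilevel(gray, threshold=128):
    """Map each gray-scale pixel (0-255) to one bit: 1 if >= threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def pack_bits(bilevel):
    """Pack a bi-level frame row by row, 8 pixels per byte, first pixel in the MSB."""
    bits = [b for row in bilevel for b in row]
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk += [0] * (8 - len(chunk))  # zero-pad the final byte if needed
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def raw_kbps(width, height, fps, bits_per_pixel=1):
    """Uncompressed bit rate in Kbps (1 Kbps = 1000 bits per second)."""
    return width * height * bits_per_pixel * fps / 1000.0


# Assumed frame format and a synthetic frame (a horizontal gray ramp).
WIDTH, HEIGHT, FPS = 176, 144, 15
gray = [[(x * 255) // (WIDTH - 1) for x in range(WIDTH)] for _ in range(HEIGHT)]

frame = to_bilevel(gray)
packed = pack_bits(frame)

raw_bilevel = raw_kbps(WIDTH, HEIGHT, FPS)                 # 1 bit per pixel
raw_gray = raw_kbps(WIDTH, HEIGHT, FPS, bits_per_pixel=8)  # 8 bits per pixel

print("bytes per bi-level frame:", len(packed))
print("raw bi-level bit rate (Kbps):", raw_bilevel)
print("raw 8-bit gray bit rate (Kbps):", raw_gray)
print("compression ratio needed for 40 Kbps:", raw_bilevel / 40)
print("compression ratio needed for 20 Kbps:", raw_bilevel / 20)
```

Under these assumptions the raw bi-level stream is one eighth of the 8-bit gray-scale stream but still well above 20-40 Kbps, which is why the temporal correlation between frames has to be exploited as well.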
For more information, please refer to:
Jiang Li, Gang Chen, Jizheng Xu, Yong Wang, Hanning Zhou, Keman Yu, King To Ng
and Heung-Yeung Shum, "Bi-level
Video: Video Communications at Very Low Bit Rates," ACM Multimedia
Conference 2001, Sep. 30 - Oct. 5, Ottawa, Ontario, Canada, pages 392 - 400.
Jiang Li, Keman Yu, Gang Chen, Yong Wang, Hanning Zhou, Jizheng Xu, King To Ng,
Kaibo Wang, Lijie Wang and Heung-Yeung Shum, "Portrait
Video Phone," ACM Multimedia Conference 2001, Sep. 30 - Oct. 5,
Ottawa, Ontario, Canada, pages 597 - 598.
Keman Yu, Jiang Li, and Shipeng Li, "Bi-level/full-color
Video Combination for Ubiquitous Video Communication," IEEE International
Symposium on Circuits and Systems, May 26 - 29, Scottsdale, Arizona, pages 245 -
248.
Jiang Li, Keman Yu, Tielin He, Yunfeng Lin, Shipeng Li, and Ya-Qin Zhang,
"Scalable Portrait Video for Mobile Video Communication," IEEE Transactions on
Circuits and Systems for Video Technology, Volume 13, Issue 5, May 2003,
pages 376 - 384, ISSN: 1051-8215.
Keman Yu, Jiang Li, Jizheng Xu, and Shipeng Li, "Very Low
Bit Rate Watercolor Video," IEEE International Symposium on Circuits and
Systems (ISCAS) 2003, May 25-29, Bangkok, Thailand, pages 712 - 715.
Keman Yu, Jiangbo Lv, Jiang Li and Shipeng Li, "Practical
Real-time Video Codec for Mobile Devices," IEEE International Conference on
Multimedia & Expo (ICME) 2003, July 6-9, Baltimore, Maryland, pages 509-512.
Keman Yu, Jiang Li, Tielin He, Yunfeng Lin, Jiangbo Lv and Shipeng Li, "Microsoft
Portrait: A Real-time Mobile Video Communication System," IEEE International
Conference on Multimedia & Expo (ICME) 2003, July 6-9, Baltimore, Maryland,
demonstration III-2.