Tutorial Title:   Multimedia Signal Processing on Personal Computers

 

Subject Area:   Implementation and performance evaluation of multimedia applications on personal computers; parallelization and optimization techniques.

 

Tutorial Abstract:  

To get the best performance from multimedia applications on personal computers, we must carefully consider the interplay between microprocessors and algorithms/applications. The performance of personal computers has improved significantly during the past two decades. A significant portion of the improvement comes from data-level parallelism (e.g., MMX/SSE instructions) and thread-level parallelism (e.g., the latest Intel Core Duo processor). Moving forward, we expect both the capability of single-instruction-multiple-data (SIMD) instructions and the number of processing cores in a single personal computer to keep increasing. Conventional optimization of digital signal processing algorithms, which aims to minimize the number of operations, may not be suitable for modern personal computers. To capture the increasing computational performance provided by future architectures, we must carefully design or choose the algorithm for a specific task. This tutorial covers algorithm design and algorithmic-level optimization for modern processors.

 

Tutorial Outline:  

1.      Overview & motivation

a.       Sequential vs parallel processing

b.      Goal of the tutorial---from architectural style to algorithm design

2.      Performance enhancement features in personal computers for media applications

a.       Data-level parallelism, e.g., MMX/SSE Technologies

b.      Thread-level parallelism, e.g., Hyper-Threading Technology and Dual Core

3.      SIMD optimization techniques

a.       Match the algorithms to SIMD instruction capability---put data into the right format for parallel execution (using H.264 integer transform as an example)

b.      Execute multiple identical operations in one instruction (using H.264 luminance sub-pel interpolation as an example)

c.       Transform conditional executions into logic operations (using MPEG-4 repetitive pixel padding as an example)

d.      Reduce shuffling and maximize the number of operations grouped into one instruction (using MPEG-4 SA-DCT as an example)

4.      Multi-threading algorithm design

a.       Partition the application into multiple threads that run the same program on different pieces of data (using H.264 encoder as an example)

b.      Dynamically balance loads for better parallelism (using MPEG-2 video decoder as an example)

c.       Take advantage of the shared cache to increase effectiveness (using SVM-based face detection as an example)

5.      Conclusions

a.       Match algorithms to SIMD instruction capabilities

b.      Design algorithms with minimal/simple data dependencies for data-level and functional-level parallelism
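
As a small illustration of outline item 3(c), the sketch below shows how a per-pixel conditional can be rewritten as logic operations. It is not the MPEG-4 repetitive padding algorithm itself, only a simplified stand-in: pixels outside a shape are replaced by a fixed pad value. The function names select_branch and select_mask, the inside[] shape flags, and the pad parameter are all hypothetical names chosen for this example. The code is plain scalar C so that it stays portable; SIMD compare instructions produce the same kind of all-ones/all-zeros mask for every lane at once, which is why this form maps well onto them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branching version: one data-dependent test per pixel. */
void select_branch(const uint8_t *pix, const uint8_t *inside, uint8_t pad,
                   uint8_t *out, size_t n)
{
    assert(pix && inside && out);
    for (size_t i = 0; i < n; i++) {
        if (inside[i])
            out[i] = pix[i];   /* pixel belongs to the shape: keep it */
        else
            out[i] = pad;      /* pixel is outside: use the pad value */
    }
}

/* Branch-free version: the condition becomes a mask, the choice becomes
 * AND/OR logic. The mask is 0xFF where inside[i] is non-zero, else 0x00. */
void select_mask(const uint8_t *pix, const uint8_t *inside, uint8_t pad,
                 uint8_t *out, size_t n)
{
    assert(pix && inside && out);
    for (size_t i = 0; i < n; i++) {
        uint8_t mask = (uint8_t)-(inside[i] != 0);
        out[i] = (uint8_t)((pix[i] & mask) | (pad & (uint8_t)~mask));
    }
}
```

Both functions produce identical output; the second has no data-dependent branch in the loop body, so every element follows the same instruction sequence.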

 

Duration: 3 hours

 

Potential audience:   

This tutorial is intended to provide a basic overview of implementing multimedia applications (specifically video codecs) on modern personal computers.

 

Speaker's Biography:  

Yen-Kuang Chen received his Ph.D. in Electrical Engineering from Princeton University. He is a Principal Researcher in the Corporate Technology Group, Intel Corporation. His research interests include developing innovative multimedia and Internet applications, studying the performance bottlenecks in current computers, and designing next-generation microprocessors/platforms. He has 10+ US patents, 25+ pending patent applications, and 50+ technical publications. He is one of the key contributors to Supplemental Streaming SIMD Extensions 3 (SSSE3). As an expert in video compression (e.g., MPEG-2, MPEG-4, H.263, and H.264) and computer architecture for emerging applications (e.g., SIMD and multi-threading), he has been an invited speaker at the 2005 Emerging Information Technology Conference, the 2005 New Technology Business Opportunities Forum, the 2004 Sino-American Technology & Engineering Conference, and the 2003 Workshop on Media and Signal Processors for Embedded Systems and SoCs. He is an associate editor of the Journal of VLSI Signal Processing Systems (including special issues on “System-on-a-Chip for Multimedia Systems” and on “Design and Programming of Signal Processors for Multimedia Communication”) and of IEEE Transactions on Circuits and Systems I. He has served as a program committee member of 20+ international conferences and workshops on multimedia, video communication, image processing, VLSI circuits and systems, parallel processing, and software optimization. He was an invited participant in the 2002 Frontiers of Engineering Symposium (National Academy of Engineering) and the 2003 German-American Frontiers of Engineering Symposium (Alexander von Humboldt Foundation). He is an IEEE Senior Member and an ACM Senior Member.

 

Related Publications:

1.      “Digital Signal Processing on MMX Technology,” Y.-K. Chen, N. Yu, and B. Shah, in Programmable Digital Signal Processors: Architecture, Programming and Design, Y. H. Hu, Ed., (Marcel Dekker: NY), pp. 295-331, 2002.

2.      “Implementation of H.264 Encoder and Decoder on Personal Computers,” Y.-K. Chen, E. Q. Li, X. Zhou, and S. L. Ge, Journal of Visual Communication and Image Representation, vol. 17, no. 2, pp. 509-532, Apr. 2006.

3.      “A Compiler for Exploiting Nested-Parallelism in OpenMP Programs,” X. Tian, J. Hoeflinger, G. Haab, Y.-K. Chen, M. Girkar, and S. Shah, Parallel Computing Journal, vol. 31, no. 10-12, pp. 960-983, Oct. 2005.

4.      “Media Applications on Hyper-Threading Technology,” Y.-K. Chen, M. Holliman, E. Debes, S. Zheltov, A. Knyazev, S. Bratanov, R. Belenov, I. Santos, Intel Technology Journal, pp. 47-57, Feb. 2002.

5.      “Computer Vision on Multi-Core Processors: Articulated Body Tracking,” T. Chen, D. Budnikov, C. Hughes, and Y.-K. Chen, to appear in Int’l Conf. on Multimedia and Expo, July 2007.

6.      “Adaptive Parallel Graph Mining for CMP Architectures,” G. Buehrer, S. Parthasarathy, and Y.-K. Chen, in Int’l Conf. on Data Mining, pp. 97-106, Dec. 2006.

7.      “Efficient Frequent Pattern Mining on Shared Memory Systems: Implications for Chip Multiprocessor Architectures,” G. Buehrer, S. Parthasarathy, A. Ghoting, Y.-K. Chen, D. Kim, and A. Nguyen, in Memory Systems Performance and Correctness Workshop, Oct. 2006.

8.      “Towards Efficient Multi-Level Threading of H.264 Encoder on Intel Hyper-Threading Architectures,” Y.-K. Chen, X. Tian, S. Ge, and M. Girkar, in Proc. of Int’l Parallel and Distributed Processing Symp., Apr. 2004.

9.      “Implementation of H.264 Encoder on General-Purpose Processors with Hyper-Threading Technology,” E. Q. Li and Y.-K. Chen, in Proc. of SPIE Visual Communications and Image Processing, vol. 5308, pp. 384-395, Jan. 2004.

10.  “Efficient Multithreading Implementation of H.264 Encoder on Intel Hyper-Threading Architectures,” S. Ge, X. Tian, and Y.-K. Chen, in Pacific-Rim Conf. on Multimedia, Dec. 2003.

11.  “Exploring the Use of Hyper-Threading Technology for Multimedia Applications with Intel OpenMP Compiler,” X. Tian, Y.-K. Chen, M. Girkar, S. Ge, R. Lienhart, and S. Shah, in Int’l Parallel and Distributed Processing Symp., pp. 36-43, Apr. 2003.

12.  “Exploring the Use of Hyper-Threading Technology for Multimedia Apps,” X. Tian, M. Girkar, Y.-K. Chen, A. Bik, and E. Su, OSnews Magazine, Mar. 12, 2003.

13.  “MPEG Decoding Workload Characterization,” M. Holliman, E. Q. Li, and Y.-K. Chen, in Workshop on Computer Architecture Evaluation using Commercial Workloads, pp. 23-34, Feb. 2003.

14.  “Implementation of H.264 Decoder on General-Purpose Processors with Media Instructions,” X. Zhou, E. Q. Li, and Y.-K. Chen, in Proc. of SPIE Conf. on Image and Video Communications and Processing, vol. 5022, pp. 224-235, Jan. 2003.

15.  “Evaluating Performance of Multimedia Application on Simultaneous Multi-Threading,” Y.-K. Chen, E. Debes, R. Lienhart, M. Holliman, and M. Yeung, in Proc. of Int’l Conf. on Parallel and Distributed Systems, pp. 529-534, Dec. 2002.

16.  “The Impact of SMT/SMP Designs on Multimedia Software Engineering---A Workload Analysis Study,” Y.-K. Chen, R. Lienhart, E. Debes, M. Holliman, and M. Yeung, in Proc. of Int’l Symp. on Multimedia Software Engineering, Dec. 2002.

17.  “Video Applications on Hyper-Threading Technology,” Y.-K. Chen, M. Holliman, and E. Debes, in Int’l Conf. on Multimedia and Expo, vol. 2, pp. 193-196, Aug. 2002.

18.  “Real-Time Detection of Video Watermark on Intel Architecture,” Y.-K. Chen, M. Holliman, R. Liu, W. Macy, and M. M. Yeung, in Proc. of Image and Video Communications and Processing, vol. 3971, pp. 198-208, Jan. 2000.