Techniques for Efficient DCT/IDCT Implementation on Generic GPU

  • Bo Fang
  • Guobin Shen
  • Shipeng Li
  • Huifang Chen

Published by Institute of Electrical and Electronics Engineers, Inc.

The emergence of programmable graphics processing units (GPUs) has led to increasing interest in offloading numerically intensive computations onto graphics hardware. The discrete cosine transform and its inverse (DCT/IDCT) are widely adopted in modern image and video compression standards, and are usually among the most computationally expensive stages. We present several techniques for efficiently implementing the DCT/IDCT on generic programmable GPUs using direct matrix multiplication. Our experimental results demonstrate that an IDCT running on a GPU with the proposed techniques can be considerably faster than one running on a CPU with MMX optimization.
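
To make the phrase "direct matrix multiplication" concrete, the following is a minimal CPU-side sketch of the underlying arithmetic only, not the authors' GPU implementation. It assumes an 8x8 block and the orthonormal DCT-II basis matrix C, so that the forward transform is Y = C X Cᵀ and the inverse is X = Cᵀ Y C. The function names (`dct_matrix`, `dct2`, `idct2`) and the block size are illustrative choices, not taken from the paper.

```python
import math

N = 8  # assumed block size; 8x8 is common in image/video codecs


def dct_matrix(n=N):
    """Return the n x n orthonormal DCT-II basis matrix C as nested lists."""
    rows = []
    for k in range(n):
        # Row 0 (the DC row) uses a different scale so that C is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a, b):
    """Plain dense matrix product of two nested-list matrices."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def dct2(block):
    """Forward 2-D DCT of a square block: Y = C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT of a square block: X = C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

Written this way, each transform is just two dense matrix products with a fixed constant matrix, which is the kind of regular, data-parallel workload that maps naturally onto graphics hardware. How the products are scheduled on the GPU is the subject of the paper and is not modeled here.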