IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 2, FEBRUARY 2010
An Efficient Architecture for 3-D Discrete Wavelet Transform
Anirban Das, Anindya Hazra, and Swapna Banerjee, Senior Member, IEEE
Abstract—This paper presents an architecture for the lifting-based running 3-D discrete wavelet transform (DWT), a powerful tool for image and video compression. The proposed design is one of the first complete lifting-based 3-D-DWT architectures without a group-of-pictures restriction. A new computing technique, based on analysis of the lifting signal flow graph, minimizes the storage requirement. Compared with earlier reported works, the architecture offers reduced memory referencing with correspondingly low power consumption, low latency, and high throughput. The proposed architecture has been successfully implemented on a Xilinx Virtex-IV series field-programmable gate array, offering a speed of 321 MHz, which makes it suitable for real-time compression even with large frame dimensions. Moreover, the architecture is fully scalable beyond the present coherent Daubechies filterbank (9, 7).
Index Terms—Discrete wavelet transform, image compression, lifting, video, VLSI architecture.
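To make the term "lifting" concrete, the following is a minimal software sketch of one level of the (9, 7) lifting DWT and its separable extension to three dimensions. It is not the architecture proposed in this paper: it processes a finite block of frames rather than a running stream, and it assumes periodic boundary extension, even lengths along every axis, a (frames, rows, columns) array layout, and one common scaling convention. The function names are hypothetical, and the coefficients are the widely published lifting factors for this filterbank.

```python
import numpy as np

# Widely published lifting coefficients for the Daubechies (9, 7) filterbank.
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398  # scaling constant; the convention differs between references


def lift97_forward(x, axis=-1):
    """One level of the (9, 7) lifting DWT along `axis`.

    Assumes an even length along `axis` and periodic boundary extension.
    Returns the low-pass half followed by the high-pass half.
    """
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    s = x[..., 0::2].copy()  # even samples feed the low-pass branch
    d = x[..., 1::2].copy()  # odd samples feed the high-pass branch
    d += ALPHA * (s + np.roll(s, -1, axis=-1))  # first predict step
    s += BETA * (d + np.roll(d, 1, axis=-1))    # first update step
    d += GAMMA * (s + np.roll(s, -1, axis=-1))  # second predict step
    s += DELTA * (d + np.roll(d, 1, axis=-1))   # second update step
    out = np.concatenate([s * K, d / K], axis=-1)
    return np.moveaxis(out, -1, axis)


def lift97_inverse(c, axis=-1):
    """Undo lift97_forward by running the lifting steps in reverse order."""
    c = np.moveaxis(np.asarray(c, dtype=float), axis, -1)
    half = c.shape[-1] // 2
    s = c[..., :half] / K
    d = c[..., half:] * K
    s -= DELTA * (d + np.roll(d, 1, axis=-1))
    d -= GAMMA * (s + np.roll(s, -1, axis=-1))
    s -= BETA * (d + np.roll(d, 1, axis=-1))
    d -= ALPHA * (s + np.roll(s, -1, axis=-1))
    x = np.empty_like(c)
    x[..., 0::2] = s
    x[..., 1::2] = d
    return np.moveaxis(x, -1, axis)


def dwt3d_one_level(video):
    """Separable one-level 3-D DWT: filter along columns, rows, then frames."""
    out = np.asarray(video, dtype=float)
    for axis in (2, 1, 0):  # assumed layout: (frames, rows, columns)
        out = lift97_forward(out, axis=axis)
    return out
```

Because every lifting step is undone by subtracting the same quantity that was added, applying `lift97_inverse` along each axis recovers the input up to floating-point rounding. That same step-by-step structure is what allows lifting to be computed in place with little intermediate storage.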
Manuscript received March 19, 2007; revised December 4, 2008 and April 8, 2009. First version published September 4, 2009; current version published February 5, 2010. This work was supported by the Ministry of Information Technology, Government of India. This paper was recommended by Associate Editor L.-G. Chen. A. Das is with the Bangalore Design Center, Nvidia Corporation, Bangalore 560001, India (e-mail: firstname.lastname@example.org). A. Hazra is with STMicroelectronics Private Ltd., Greater Noida, Uttar Pradesh 201308, India (e-mail: email@example.com). S. Banerjee is with the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India (e-mail: firstname.lastname@example.org). Digital Object Identifier 10.1109/TCSVT.2009.2031551
STILL IMAGE compression based on the 2-D discrete wavelet transform (DWT) has already gained superiority over traditional JPEG, which is based on the discrete cosine transform, and has been standardized in forms such as JPEG2000. Similarly, applying its 3-D superset, the 3-D-DWT, to video outperforms the current predictive coding standards, such as H.261 to H.263 and MPEG-1, MPEG-2, and MPEG-4, by offering better peak signal-to-noise ratio (PSNR) and the absence of blocky artifacts at low bit rates. Furthermore, it provides highly scalable compression, which is much coveted in modern communications over heterogeneous channels like the Internet. Successful application of 3-D-DWT has been reported in the literature in emerging fields such as medical image compression and hyper-spectral and space image compression. Software-based approaches have been tried to combat the huge computational complexity and memory requirement associated with 3-D-DWT realization. Though the processor speed of modern computers is on the order of GHz, fetching data from and communicating with external memories consume several T states, making the overall computation considerably slower. The problem is aggravated because peripheral speeds still lag far behind those of modern processors. Nowadays, most applications require real-time DWT engines with large computing capacity, for which a fast, dedicated very-large-scale integration (VLSI) architecture appears to be the best possible solution. While it ensures high resource utilization, even on cost-effective platforms like field-programmable gate arrays (FPGAs), designing such an architecture also offers flexibility: computation can be sped up through deeper pipelining and parallel processing, memory consumption can be reduced through better task scheduling, and low-power and portability features can be added. To overcome...