A Fast CRC Implementation on FPGA Using a Pipelined Architecture for the Polynomial Division Fabrice MONTEIRO, Abbas DANDACHE, Amine M’SIR,Bernard LEPLEY LICM, University of Metz, SUPELEC, Rue Edouard Belin, 57078 Metz Cedex phone: +33(0)3875473 11, fax: +33(0)387547301, email: firstname.lastname@example.org
ABSTRACT The CRC error detection is a very common function on telecommunication applications. The evolution towards increasing data rates requires more and more sofisticated implementations. In this paper, we present a method to implement the CRC function based on a pipeline structure for the polynomial division. It improves very effectively the speed performance, allowing data rates from 1 Gbits/s to 4 Gbits/s on FPGA implementions, according to the parallelisation level (8 to 32 bits). 1 INTRODUCTION The CRC (Cyclic Redundancy Checking) codes are used in a lot of telecommunication applications. They are used in the internal layers of protocols such as Ethernet, X25, FDDI and ATM (AAL5). However, on modem networks, the need for increasing data rates (over 1 Gbit/s) is setting the constraints on performance very high. Indeed, the speed improvement (higher clock rates) due to the technological evolution is unable to fit the demand. Consequently, new architectures must be devised. Targetting the applications to an FPGA device is an issue for this paper, as it allows low-cost designs. The simple and evident serial implementation is a classical hardware implementation of the CRC algorithm. Unfortunatly, on an FPGA implementation with maximal clock frequency of 250 MHz, maximal data rate is limited to 250 Mbits/s is the best case. Higher data rates can only be obtained through parallelisation. Some parallel architectures have been proposed in the past to address the need for high data throughput [ 1]. The main problem is usually to limit the rapidly increasing area overhead while improving the speed performance. In this paper, we present a parallel approach for the polynomial division based on a pipeline structure. The parallelisation can be led to any level and is only lim-
ited by the area constraint set on the design. The data throughput is almost directly linked to the parallelisation level, as the maximal clock rate is not very sensitive to it.
The polynomial division is the fundamental operation of the CRC applications. The serial implementation of the division is shown in figure 1 for the case where the polynomial divisor is G ( X ) = Go + G1.X1 + Gz.X2 + G3.X3 = 1 + X + X 3 . As indicated previously, the data throughput of this serial implementation is quite low. Very high data rates can only be achieved with high clock frequencies, which in turn can only be obtained using rather expensive technological solutions. Parallelisation of data processing is the main solution to improve the speed performance of a circuit (or system) if the clock rate must remain low. Pipelining may be used as an effective parallelisation method when a repeatitive process must be applied on large volumes of ‘data. Previous works have addressed the parallelisation problem in large demanding computational applications, particularly in arithmetic (eg. ) and error control coding circuits (eg. [11[21[61). In the serial architecture (figure I), a new data bit is inject on each clock cycle. The previous cumulated remainder is simultaneously multiplied by X and divided by G(z) (where G(z) is the polynomial divisor). On P
Figure I : Serial polynomial division for G ( X ) = 1 -tX + X 3
0-7803-7057-0/01/$10.00 02001 IEEE.
successive clock cycle , P bits are injected and P successive multiplication and divisions are performed. The next formula (related to the example of figure 1) describes the operation performed on one clock cycle.
T = [
0 i ] 0
This architecture have been implemented on FPGA devices of the FLEXlOKE ALTERA...
Please join StudyMode to read the full document