Horst Görtz Institute for IT Security, Ruhr University Bochum, Germany
solved accurately, approximate solutions obtained with fast iterative algorithms are simply useless.
Another kind of LSE, which we primarily address in this paper, appears in the cryptanalysis of certain classes of symmetric ciphers. Here, large numbers of medium-sized¹ LSEs over GF(2) must be solved extremely fast to enable practical attacks. Software implementations of existing algorithms have proven too slow for this task. However, to the best of our knowledge, the alternative of an efficient implementation in hardware has not yet been analyzed thoroughly. In this paper we show that there is still room for drastic efficiency improvements of current algorithms when targeting special-purpose hardware such as reprogrammable logic.
In order to develop such a special-purpose hardware architecture, we analyzed the suitability of various algorithms while taking possible simplifications over GF(2) into account. Gaussian elimination turned out to be best suited for hardware implementation due to its simplicity. Moreover, it does not require the coefficient matrices of the LSEs to have any special properties such as being symmetric positive definite², as is the case, for example, for conjugate gradient methods. However, a direct implementation of Gaussian elimination in hardware does not yield the desired efficiency advantage over software. By slightly modifying the logic of the algorithm and parallelizing element and row operations, we were able to develop a highly efficient architecture for solving medium-sized LSEs and computing inverse matrices. In a similar way, an architecture for fast matrix multiplication can be obtained, which can possibly be used in conjunction with the other architecture to solve large LSEs by means of Strassen's algorithm.
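As a point of reference for the simplifications available over GF(2), the following is a minimal software sketch and not the hardware architecture proposed in this paper. It assumes that each matrix row is packed into a Python integer, so that adding one row to another is a single XOR and no multiplications or divisions are needed. It uses the Gauss-Jordan variant (full elimination) rather than Gaussian elimination with backward substitution. The function names `gauss_jordan_gf2`, `solve_gf2` and `invert_gf2` are hypothetical and chosen for illustration only.

```python
def gauss_jordan_gf2(rows, n):
    """Fully reduce augmented rows over GF(2).

    rows: list of ints; bit j (j < n) of rows[i] is the coefficient in
          column j, bits >= n hold the right-hand side(s).
    Returns a reduced copy; raises ValueError if the first n columns
    do not have full rank n.
    """
    rows = list(rows)
    for col in range(n):
        # Look for a row at or below position col with a 1 in this column.
        pivot = next((r for r in range(col, len(rows))
                      if (rows[r] >> col) & 1), None)
        if pivot is None:
            raise ValueError("coefficient matrix does not have full rank")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Over GF(2), eliminating an entry is just an XOR with the pivot row.
        for r in range(len(rows)):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
    return rows


def pack(bits):
    """Pack a list of 0/1 entries into an int (entry j goes to bit j)."""
    return sum(bit << j for j, bit in enumerate(bits))


def solve_gf2(A, b):
    """Solve A x = b over GF(2) for a regular or overdetermined system."""
    n = len(A[0])
    rows = [pack(row) | (rhs << n) for row, rhs in zip(A, b)]
    reduced = gauss_jordan_gf2(rows, n)
    # Any surplus equation must have reduced to 0 = 0.
    if any(reduced[n:]):
        raise ValueError("system is inconsistent")
    return [(reduced[i] >> n) & 1 for i in range(n)]


def invert_gf2(A):
    """Invert a square matrix over GF(2) by reducing [A | I] to [I | A^-1]."""
    n = len(A)
    rows = [pack(row) | (1 << (n + i)) for i, row in enumerate(A)]
    reduced = gauss_jordan_gf2(rows, n)
    return [[(reduced[i] >> (n + j)) & 1 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[1, 1, 0],
         [0, 1, 1],
         [1, 1, 1]]
    print(solve_gf2(A, [1, 1, 0]))
    print(invert_gf2(A))
```

Matrix inversion falls out of the same reduction by appending the identity matrix as additional right-hand sides, which mirrors the remark above that one procedure serves both for solving LSEs and for computing inverse matrices.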
This paper is structured as follows. We start with a brief discussion of previous work in this area. After reviewing standard Gaussian elimination with backward substitution, we present our version of the algorithm optimized for hardware implementation and analyze its performance. After that, we discuss further applications of the hardware architecture, including its particular suitability for cryptanalysis. We then present the actual design of the hardware architecture for solving medium-sized LSEs. Finally, we describe our proof-of-concept implementation on a contemporary low-cost FPGA and provide estimates for a possible ASIC implementation.
This paper presents a new variant of the well-known Gaussian elimination for solving linear systems of equations (LSEs) over GF(2) and its highly efficient implementation in hardware. The proposed hardware architecture can solve any regular and (uniquely solvable) overdetermined LSE and is not limited to matrices of a certain structure. Its average running time for $n \times n$ matrices with uniformly distributed entries equals $2n$ (clock cycles) as opposed to about $\frac{1}{4}n^3$ in software. Moreover, the average running time remains very close to $2n$ for random matrices with densities much greater or lower than 0.5. Our architecture has a worst-case time complexity of $O(n^2)$ and also a space complexity of $O(n^2)$. With these characteristics the architecture is particularly suited to efficiently solve medium-sized LSEs as they frequently appear in, e.g., cryptanalysis of stream ciphers.
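To give a feeling for the gap between the two average running times quoted above, their ratio can be written out directly. The value $n = 50$ below is an assumed, purely illustrative size and is not taken from the paper. The two figures are also counted in different units (hardware clock cycles versus software steps), so the ratio indicates a rough order of magnitude rather than a measured speedup.

```latex
\[
  \frac{\tfrac{1}{4}\,n^{3}}{2n} \;=\; \frac{n^{2}}{8},
  \qquad
  \text{e.g. } n = 50:\quad
  2n = 100,\quad
  \tfrac{1}{4}\,n^{3} = 31\,250,\quad
  \frac{n^{2}}{8} = 312.5 .
\]
```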
Besides solving LSEs over GF(2), the architecture at hand can also compute matrix inverses extremely fast. Using a similar architecture, matrix-by-matrix multiplication can be done in linear time and quadratic space. Thus, these architectures can possibly be used as
building blocks of a more complex architecture for solving...