This paper optimizes the CORDIC-based modified Gram-Schmidt (MGS) algorithm for QR decomposition (QRD) and presents a scalable algorithm that aims for maximum throughput with minimum latency and hardware resources. The optimized algorithm is implemented on a Xilinx Virtex-6 FPGA using the ISE software, in fixed-point arithmetic with a precision selected from MATLAB simulation results. Loop unrolling with different unrolling factors is applied to reduce latency and increase throughput. However, increasing the unrolling factor lowers the operating frequency of the CORDIC unit and also reduces the number of resources used. As a result, there is a trade-off between the unrolling factor and the frequency of the CORDIC unit. Comparing the different unrolling factors shows that a factor of 4 gives the highest throughput, 5.777 MQRD/s, and the lowest latency, 173 ns. Throughput and latency improve by 42.52% and 73.74%, respectively, compared with the unoptimized design. The proposed method is also scalable to m×m complex channel matrices of different sizes, where log2 m ∈ N (that is, m is a power of two).
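For readers unfamiliar with MGS, the following is a minimal floating-point sketch of the decomposition for a square complex matrix. It is not the paper's architecture: the function name `mgs_qr` is hypothetical, ordinary square root, division, and multiplication stand in for the CORDIC unit, and neither fixed-point arithmetic nor loop unrolling is modeled. It also assumes the input matrix has full rank.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of a square complex matrix (list of rows).

    Returns (q, r) such that a = q * r, where q has orthonormal columns
    and r is upper triangular. Assumes a has full rank.
    """
    m = len(a)
    # v[j] holds column j of a; columns are updated as the loop proceeds.
    v = [[complex(a[i][j]) for i in range(m)] for j in range(m)]
    q_cols = []
    r = [[0j] * m for _ in range(m)]

    for k in range(m):
        # Diagonal entry of R is the Euclidean norm of the current column.
        norm = math.sqrt(sum(abs(x) ** 2 for x in v[k]))
        r[k][k] = complex(norm)
        # Normalize the column to obtain the k-th column of Q.
        qk = [x / norm for x in v[k]]
        q_cols.append(qk)
        # Remove the q_k component from every remaining column immediately.
        # This in-place update is what distinguishes MGS from classical GS.
        for j in range(k + 1, m):
            r[k][j] = sum(qc.conjugate() * x for qc, x in zip(qk, v[j]))
            v[j] = [x - r[k][j] * qc for x, qc in zip(v[j], qk)]

    # Convert the list of Q columns back to a list of rows.
    q = [[q_cols[j][i] for j in range(m)] for i in range(m)]
    return q, r


if __name__ == "__main__":
    # Illustrative 4x4 complex matrix (m = 4, so log2 m = 2).
    h = [
        [4 + 1j, 1, 0.5j, 1],
        [0.5, 5j, 1, -1j],
        [1j, 1, 3 + 1j, 0],
        [1, -1, 1, 4 - 2j],
    ]
    q, r = mgs_qr(h)
    for row in r:
        print([round(abs(x), 4) for x in row])
```

The outer loop over `k` is inherently sequential, while the inner loop over `j` processes independent columns, which is the kind of loop that unrolling in hardware can parallelize.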
Type of Study: Research Paper | Subject: VLSI
Received: 2021/06/06 | Revised: 2024/05/13 | Accepted: 2022/02/10