Number-Theoretic Transform (NTT)
This competition track required us to build a Number Theoretic Transform (NTT) accelerator capable of performing transforms of size $2^24$. NTTs are conceptually similar to the Fourier Transforms - working over a finite field instead of complex numbers. For this project, the finite field contains values of size 64 bits modulo a so called Solinas prime equal to $2^64 - 2^32 + 1$.
The mathematical formulation of the NTT is as the following:
\[X_{i} = ∑↙{j=0}↖{N-1} x_{j} w^{ij} (mod P)\]where $N$ is the size of the input vector $x$, $P$ is the Solinas prime and $w$ is the $N$th root of unity. $X_{i}$ is defined for $0 ≤ i < N$.
The main design challenge is that such large transforms cannot fit within a single FPGA’s on-chip RAM resources, so we need to make efficient use of large external DRAM resources.
The platform targeted is the Xilinx Varium C1100 accelerator card. The card contains a Virtex UltrasScale+ FPGA with HBM2.
Design overview
Our work is built around the 4-step algorithm to break down a large NTT into much smaller NTTs. The smaller NTTs are computed using the well-known decimation in time Cooley-Tukey FFT algorithm. The following pages discuss the algorithms and design considerations:
- 4-step details the 4-step algorithm
- Bandwidth Considerations
You can read more about our FPGA implementation in the following pages:
- core NTT design
- scaling Up performance with shared controllers and a wide memory bus
- top level Vitis design
We present the performance results for our design and show how you can build the design.
Hardcaml on the Web
Configure our designs, download RTL and perform simulations all within your browser:
- Core NTT design which includes the IO RAMs, datapath and controller
- Top level design as a Vitis kernel which performs the full 4-step algorithm
Code structure
The code is built from a couple of libraries within our submission repository:
The Hardcaml_ntt
library provides the base implementation of the NTT core and includes a software model and various unit tests.
Zprize_ntt
contains the code for both the top-level
Hardcaml NTT kernel and
tests, the
C++ HLS DMA kernel,
host benchmarking software, and
build scripts.