In recent years, the continued push for higher computing performance has led to the rise of
heterogeneous computing and heterogeneous platforms. These systems gain performance and energy
efficiency by adding dissimilar accelerators as co-processors with specialized processing
capabilities that handle specific compute-intensive tasks. Field Programmable Gate Arrays (FPGAs)
have gained the interest of system architects because they enable rapid prototyping and fast
accelerator development. As their name denotes, FPGAs are programmable “in the field”, meaning that
their internal logic can be configured after fabrication and modified, if needed, without the
re-fabrication that an ASIC would require. Partial Reconfiguration (PR) takes this flexibility
one step further, by allowing an operating FPGA design to modify a part of itself, while the rest of the
system continues to function normally, without compromising the integrity of the computation running
on those parts of the device that are not being reconfigured. This technique reduces the amount of
resources required to implement a given function, with consequent reductions in cost and power
consumption; it provides flexibility in the algorithms and protocols available to an application;
and it accelerates computing by letting a design respond to new computation requirements much
faster. This thesis explores PR technology on FPGAs and applies the
knowledge acquired to implement a cryptographic system on a Xilinx Zynq-7000 SoC device. The Zynq
combines programmable logic and an embedded ARM processor on a single chip, forming a
system-on-a-chip (SoC) that offers a fast interconnect between the two as well as power
efficiency. For the purposes of this thesis, we chose four cryptographic modules (AES128, AES192,
AES256 and SHA3-512). First, we made the modifications needed to use the cryptographic modules in
the SoC and designed AXI4-Stream compliant interfaces to enable communication between the
peripherals and the processor, making the compromises required by the different modules’
architectures, the processing system’s limitations and the restrictions of PR. Then, we
established a connection between the peripherals and the processing system through an AXI DMA IP in
Scatter/Gather mode. Scatter/Gather mode provided high-speed communication and let us apply an
interrupt coalescing strategy that reduces the number of interrupts occupying the ARM processor, so
that the processor could handle the peripherals more efficiently. We also applied a decoupling
strategy that isolates the reconfigurable modules during PR, preventing spurious output signals from
affecting the rest of the design. Finally, we evaluated our work and constructed a benchmark to show the
acceleration advantages of PR. In this benchmark, the system adapted to the computation requirements
by reconfiguring idle peripherals into the ones that were needed, distributing the computational
load among them and thereby reducing the total computation time. As a result, we achieved almost
full hardware utilization and approached the optimal speedup.
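
As a rough illustration of the benchmark idea described above, the following Python sketch compares
a static design, in which only the slot already holding the needed module does the work, with a
PR-style design in which idle slots are reconfigured into that module and share the load. It is a
simplified model under stated assumptions, not the thesis's actual benchmark: the function names
`makespan` and `compare_static_vs_pr`, the block and reconfiguration times, and the assumption that
slots are reconfigured one after another are all hypothetical.

```python
import heapq


def makespan(blocks, ready_times, block_time):
    """Greedy schedule: hand each block to the slot that becomes free earliest.

    ready_times holds, per slot, the time at which it can start processing.
    Returns the time at which the last block finishes.
    """
    if not ready_times:
        raise ValueError("no slot can run the requested module")
    free_at = list(ready_times)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(blocks):
        start = heapq.heappop(free_at)   # earliest available slot
        end = start + block_time
        finish = max(finish, end)
        heapq.heappush(free_at, end)     # slot is busy until `end`
    return finish


def compare_static_vs_pr(blocks, slots, needed, block_time, reconfig_time):
    """Return (static_time, pr_time) for a workload that needs one module type.

    Assumption (hypothetical model): idle slots are reconfigured sequentially,
    so the k-th reconfigured slot becomes usable at k * reconfig_time.
    """
    # Slots that already hold the needed module can start immediately.
    static_ready = [0.0 for module in slots if module == needed]
    pr_ready = list(static_ready)
    idle = sum(1 for module in slots if module != needed)
    for k in range(1, idle + 1):
        pr_ready.append(k * reconfig_time)
    t_static = makespan(blocks, static_ready, block_time)
    t_pr = makespan(blocks, pr_ready, block_time)
    return t_static, t_pr


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the thesis.
    slots = ["AES128", "AES192", "AES256", "SHA3-512"]
    t_static, t_pr = compare_static_vs_pr(100, slots, "AES128", 1.0, 5.0)
    print(f"static: {t_static}, with PR: {t_pr}, speedup: {t_static / t_pr:.2f}")
```

In this model the speedup approaches the number of slots as the workload grows relative to the
reconfiguration time, which mirrors the near-optimal speedup reported for the benchmark.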