Design-Space Exploration of Serialized Floating-Point Division for DLP Architectures
collection:INSA-LYON
collection:INRIA2
collection:CITI
collection:INSA-GROUPE
collection:UDL
collection:INRIA-LYS
collection:DDRS-TEST-CJ
More information
We propose a framework for generating floating-point division units that leverages data-level parallelism (DLP) at the hardware level to enhance scalability, energy efficiency, and design flexibility. While modern ISAs support DLP via SIMD and vector extensions, arithmetic units often remain optimized for latency and do not scale efficiently across parallel workloads. Our approach revisits the microarchitecture of division units by trading latency for area: instead of one large, fast unit, multiple smaller, slower units are replicated spatially to exploit data-level parallelism. The framework operates as a high-level generator written in Python, automatically producing synthesizable ASIC designs across multiple floating-point formats and technology nodes. It supports format-agnostic and process-agnostic design exploration, enabling rapid evaluation of trade-offs between latency, area, and power. We validate our contributions through extensive evaluations encompassing 11 floating-point formats, including posits, IEEE 754, and recent formats like EXMY, as well as process nodes ranging from 180 nm to 7 nm, demonstrating the scalability of our approach. Notably, area and power reductions reach up to 9.18× and 61.28×, respectively, for Posit64 and IEEE 754 double precision. When constraining designs to preserve performance, the best gains reach 3.12× and 8.03×, demonstrating the viability of the approach for parallel workloads and vector datapaths.
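The latency-for-area trade described in the abstract can be illustrated with a small behavioral model. The sketch below is not the authors' generator: it assumes a plain restoring integer divider for the significand path, and the names `serial_divide`, `throughput`, `bits_per_cycle`, and `lanes` are hypothetical. It only counts modeled cycles; area and power figures such as those quoted above can come only from synthesis.

```python
from math import ceil


def serial_divide(dividend: int, divisor: int, width: int, bits_per_cycle: int = 1):
    """Restoring division that retires `bits_per_cycle` quotient bits per modeled cycle.

    Operands are unsigned integers that fit in `width` bits (a stand-in for
    the significand divider inside a floating-point unit).
    Returns (quotient, remainder, cycles).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if not (0 <= dividend < (1 << width) and 0 < divisor < (1 << width)):
        raise ValueError("operands must fit in `width` bits")
    if bits_per_cycle < 1:
        raise ValueError("bits_per_cycle must be at least 1")

    quotient, remainder, cycles = 0, 0, 0
    bit = width - 1
    while bit >= 0:
        # One modeled clock cycle: process up to `bits_per_cycle` dividend bits.
        for _ in range(min(bits_per_cycle, bit + 1)):
            remainder = (remainder << 1) | ((dividend >> bit) & 1)
            quotient <<= 1
            if remainder >= divisor:  # trial subtraction succeeds
                remainder -= divisor
                quotient |= 1
            bit -= 1
        cycles += 1
    return quotient, remainder, cycles


def throughput(lanes: int, width: int, bits_per_cycle: int) -> float:
    """Divisions completed per cycle by `lanes` independent serialized units."""
    return lanes / ceil(width / bits_per_cycle)


if __name__ == "__main__":
    # A fully serialized 8-bit unit needs 8 cycles per division.
    print(serial_divide(200, 7, width=8, bits_per_cycle=1))
    # Four units retiring 2 bits per cycle match one single-cycle unit in this model.
    print(throughput(lanes=4, width=8, bits_per_cycle=2))
    print(throughput(lanes=1, width=8, bits_per_cycle=8))
```

In this toy model, several slow units reach the same aggregate throughput as one fast unit whenever the workload supplies enough independent divisions, which is the situation SIMD and vector datapaths provide. Whether the replicated design is actually smaller or more power-efficient depends on the synthesized hardware, which the model does not capture.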