Result: Design-Space Exploration of Serialized Floating-Point Division for DLP Architectures

Title:
Design-Space Exploration of Serialized Floating-Point Division for DLP Architectures
Authors:
Contributors:
Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA), Systèmes Embarqués audio programmables (EMERAUDE), CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre national de création musicale [Lyon] (GRAME), Centre National de Création Musicale (CNCM)-Centre National de Création Musicale (CNCM)-Centre Inria de Lyon, Institut National de Recherche en Informatique et en Automatique (Inria)
Source:
DSD 2025 - 28th Euromicro Conference on Digital System Design, Sep 2025, Salerno, Italy
Publisher Information:
CCSD, 2025.
Publication Year:
2025
Collection:
collection:INRIA
collection:INSA-LYON
collection:INRIA2
collection:CITI
collection:INSA-GROUPE
collection:UDL
collection:INRIA-LYS
collection:DDRS-TEST-CJ
Subject Geographic:
Original Identifier:
HAL: hal-05385247
Document Type:
Conference (conferenceObject); Conference papers
Language:
English
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.hal.05385247v1
Database:
HAL

Further Information

We propose a framework for generating floating-point division units that leverages data-level parallelism (DLP) at the hardware level to enhance scalability, energy efficiency, and design flexibility. While modern ISAs support DLP via SIMD and vector extensions, arithmetic units often remain optimized for latency and do not scale efficiently across parallel workloads. Our approach revisits the microarchitecture of division units by trading latency for area: it instantiates multiple smaller, slower units, replicated spatially, to exploit data-level parallelism. The framework operates as a high-level generator written in Python, automatically producing synthesizable ASIC designs across multiple floating-point formats and technology nodes. It supports format-agnostic and process-agnostic design exploration, enabling rapid evaluation of trade-offs between latency, area, and power. We validate our contributions through extensive evaluations encompassing 11 floating-point formats, including posits, IEEE 754, and recent formats like EXMY, as well as process nodes ranging from 180 nm to 7 nm, demonstrating the scalability of our approach. Notably, area and power reductions reach up to 9.18× and 61.28×, respectively, for Posit64 and IEEE 754 double precision. When constraining designs to preserve performance, the best gains reach 3.12× and 8.03×, demonstrating the viability of the approach for parallel workloads and vector datapaths.
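
The latency-for-area trade described in the abstract can be illustrated with a minimal sketch. This is not the authors' generator: the record does not state which division algorithm the framework serializes, so the example assumes a plain radix-2 restoring digit recurrence on unsigned integer significands, and the names `serial_divide` and `units_to_match_throughput` are hypothetical. It shows a unit that produces one quotient bit per cycle, and how many such non-pipelined units would have to be replicated to match the issue rate of a faster baseline.

```python
from math import ceil


def serial_divide(dividend: int, divisor: int, width: int) -> tuple[int, int, int]:
    """Radix-2 restoring division, one quotient bit per simulated cycle.

    Assumption for illustration only: operands are unsigned integers that
    fit in `width` bits (e.g. significands after unpacking).
    Returns (quotient, remainder, cycles).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    cycles = 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Subtract only if it does not go negative (the "restoring" step).
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        cycles += 1
    return quotient, remainder, cycles


def units_to_match_throughput(serial_latency: int, baseline_interval: int) -> int:
    """Number of non-pipelined serial units needed to accept operations as
    often as a baseline that accepts one every `baseline_interval` cycles."""
    return ceil(serial_latency / baseline_interval)


q, r, cycles = serial_divide(100, 7, width=8)
print(q, r, cycles)
print(units_to_match_throughput(serial_latency=cycles, baseline_interval=1))
```

Under these assumptions, each serial unit needs only a shifter, a subtractor, and a comparator, and throughput is recovered by spatial replication across independent data elements rather than by making a single unit faster.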