Title:
Toward Attention-Based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Authors:
Source:
IEEE Design & Test. 42:63-72
Publication Status:
Preprint
Publisher Information:
Institute of Electrical and Electronics Engineers (IEEE), 2025.
Publication Year:
2025
Subject Terms:
FOS: Computer and information sciences, Neural Networks, TinyML, Deployment, Transformers, Accelerators, Computer Science - Machine Learning (cs.LG), Computer Science - Hardware Architecture (cs.AR), 7. Clean energy
Document Type:
Academic Journal
Article
File Description:
application/pdf
ISSN:
2168-2364
2168-2356
DOI:
10.1109/mdat.2025.3527371
10.48550/arxiv.2408.02473
10.3929/ethz-b-000714939
Access URL:
Rights:
IEEE Copyright
CC BY
Accession Number:
edsair.doi.dedup.....5db62492710a5e9d5364326cf134d12f
Database:
OpenAIRE
Further Information
One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template that couples RISC-V processors with hardwired accelerators, supported by an automated deployment flow. We demonstrate Attention-based models within a tinyML power envelope using an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency of 2960 GOp/J and throughput of 154 GOp/s (0.65 V, 22 nm FD-SOI technology).
Accepted for publication in the SI: tinyML (S1) issue of IEEE Design & Test
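As a back-of-envelope illustration of the abstract's figures and of what 8-bit quantized Attention means in practice, the sketch below pairs the power implied by the reported numbers (throughput divided by energy efficiency) with a minimal int8 attention kernel in NumPy. All shapes, scales, and function names are illustrative assumptions, not the paper's actual kernels or deployment flow.

```python
# Minimal sketch (illustrative only): int8 attention plus the power implied
# by the abstract's figures. Shapes, scales, and names are assumptions, not
# the paper's kernels or its deployment flow.
import numpy as np

# Power implied by the reported figures: P = throughput / energy efficiency.
# 154 GOp/s / 2960 GOp/J ~= 0.052 W, i.e. ~52 mW -- a tinyML-class budget.
implied_power_w = 154e9 / 2960e9

def quantize(x, scale):
    """Symmetric int8 quantization: real value ~ scale * q."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def int8_attention(q8, k8, v8, s_q, s_k, s_v, s_out, d):
    """Attention over int8 tensors; matmuls accumulate in int32.

    The softmax is computed in float here for brevity; an integer-only
    softmax would replace it on a device without an FPU.
    """
    logits = q8.astype(np.int32) @ k8.astype(np.int32).T    # int32 accumulation
    probs = softmax(logits * (s_q * s_k) / np.sqrt(d))      # dequantize once
    out = probs @ (v8.astype(np.int32) * s_v)               # float intermediate
    return quantize(out, s_out)                             # requantize to int8

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, d, s = 16, 64, 0.05  # illustrative sequence length, head dim, scale
    q8 = quantize(rng.standard_normal((seq, d)), s)
    k8 = quantize(rng.standard_normal((seq, d)), s)
    v8 = quantize(rng.standard_normal((seq, d)), s)
    y8 = int8_attention(q8, k8, v8, s, s, s, s_out=0.1, d=d)
    print(f"output shape {y8.shape}, implied power {implied_power_w * 1e3:.0f} mW")
```

The implied ~52 mW figure follows directly from dividing the abstract's throughput by its energy efficiency, which is what situates the reported results within a tinyML power envelope.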