Title:
Toward Attention-Based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Authors:
Source:
IEEE Design & Test. 42:63-72
Publication Status:
Preprint
Publisher Information:
Institute of Electrical and Electronics Engineers (IEEE), 2025.
Publication Year:
2025
Subject Terms:
FOS: Computer and information sciences, Neural Networks, TinyML, Deployment, Transformers, Accelerators, Computer Science - Machine Learning (cs.LG), Computer Science - Hardware Architecture (cs.AR), 7. Clean energy
Document Type:
Academic Journal
Article
File Description:
application/pdf
ISSN:
2168-2364
2168-2356
DOI:
10.1109/mdat.2025.3527371
10.48550/arxiv.2408.02473
10.3929/ethz-b-000714939
Access URL:
Rights:
IEEE Copyright
CC BY
Accession Number:
edsair.doi.dedup.....5db62492710a5e9d5364326cf134d12f
Database:
OpenAIRE
Further Information
One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template that couples RISC-V processors with hardwired accelerators, supported by an automated deployment flow. We demonstrate Attention-based models within a tinyML power envelope using an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency of 2960 GOp/J and throughput of 154 GOp/s (0.65 V, 22 nm FD-SOI technology).
Accepted for publication in the SI: tinyML (S1) issue of IEEE Design & Test
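As a back-of-envelope illustration of the abstract's figures and of what 8-bit quantized Attention means in practice, the sketch below pairs the power implied by the reported numbers (throughput divided by energy efficiency) with a minimal int8 attention kernel in NumPy. All shapes, scales, and function names are illustrative assumptions, not the paper's actual kernels or deployment flow.

```python
# Minimal sketch (illustrative only): int8 attention plus the power implied
# by the abstract's figures. Shapes, scales, and names are assumptions, not
# the paper's kernels or its deployment flow.
import numpy as np

# Power implied by the reported figures: P = throughput / energy efficiency.
# 154 GOp/s / 2960 GOp/J ~= 0.052 W, i.e. ~52 mW -- a tinyML-class budget.
implied_power_w = 154e9 / 2960e9

def quantize(x, scale):
    """Symmetric int8 quantization: real value ~ scale * q."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def int8_attention(q8, k8, v8, s_q, s_k, s_v, s_out, d):
    """Attention over int8 tensors; matmuls accumulate in int32.

    The softmax is computed in float here for brevity; an integer-only
    softmax would replace it on a device without an FPU.
    """
    logits = q8.astype(np.int32) @ k8.astype(np.int32).T    # int32 accumulation
    probs = softmax(logits * (s_q * s_k) / np.sqrt(d))      # dequantize once
    out = probs @ (v8.astype(np.int32) * s_v)               # float intermediate
    return quantize(out, s_out)                             # requantize to int8

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq, d, s = 16, 64, 0.05  # illustrative sequence length, head dim, scale
    q8 = quantize(rng.standard_normal((seq, d)), s)
    k8 = quantize(rng.standard_normal((seq, d)), s)
    v8 = quantize(rng.standard_normal((seq, d)), s)
    y8 = int8_attention(q8, k8, v8, s, s, s, s_out=0.1, d=d)
    print(f"output shape {y8.shape}, implied power {implied_power_w * 1e3:.0f} mW")
```

The implied ~52 mW figure follows directly from dividing the abstract's throughput by its energy efficiency, which is what situates the reported results within a tinyML power envelope.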