Result: Uncovering the limits of visual-language models in engineering knowledge representation.

Title:
Uncovering the limits of visual-language models in engineering knowledge representation.
Source:
Proceedings of the Design Society; Aug2025, Vol. 5, p3261-3270, 10p
Database:
Complementary Index

Further Information

Visual-Language (VL) models offer potential for advancing Engineering Design (ED) by integrating text and visuals from technical documents. We review VL applications across ED phases, highlighting three key challenges: (i) understanding how functional and structural information is complementarily expressed by text and images, (ii) creating large-scale multimodal design datasets, and (iii) improving VL models' ability to represent ED knowledge. A dataset of 1.5 million text-image pairs and an evaluation dataset for cross-modal information retrieval were developed using patents. By fine-tuning and testing the CLIP base model on these datasets, we identified significant limitations in VL models' capacity to capture fine-grained technical details required for precision-driven ED tasks. Based on these findings, we propose future research directions to advance VL models for ED applications. [ABSTRACT FROM AUTHOR]
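
The abstract mentions an evaluation dataset for cross-modal information retrieval but this record does not say which metric was used. As a rough illustration only, the sketch below computes Recall@K for text-to-image retrieval, a metric commonly used for such evaluations; the choice of metric, the function names (`cosine`, `recall_at_k`), and the toy vectors are all assumptions, not details from the paper. In practice the vectors would come from the text and image encoders of a CLIP-style model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recall_at_k(text_emb, image_emb, k):
    """Fraction of text queries whose paired image ranks in the top k.

    Assumption: text_emb[i] and image_emb[i] form the matching pair.
    """
    hits = 0
    for i, text_vec in enumerate(text_emb):
        scores = [cosine(text_vec, img_vec) for img_vec in image_emb]
        # Rank of the true image = number of images scoring strictly higher.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return hits / len(text_emb)

# Toy embeddings, made up for illustration (not data from the paper).
texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
images = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]

for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(texts, images, k):.3f}")
```

In this toy setup only one of the three queries retrieves its own image first, because near-duplicate images outrank the true match. That is a simplified picture of the kind of fine-grained confusion the abstract describes, not a reproduction of its results.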

Copyright of Proceedings of the Design Society is the property of Cambridge University Press and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)