Multi-label advertising image classification using traditional deep neural networks and vision language models: dataset and annotation agreement method.
Effectively classifying advertising images is crucial for targeting the right audience and maximizing marketing performance. To address this problem, this paper presents a multi-label advertising image classification study using popular deep learning architectures. First, we compile a dedicated dataset for this task and evaluate traditional deep learning models based on convolutional neural network (CNN) and vision transformer architectures. To ensure the quality of the dataset annotations, we introduce an extended Krippendorff's Alpha (α) method based on the Jaccard index. It provides a reliable measure of inter-annotator agreement that handles both missing annotations and multiple labels, and we use it to establish the dataset's annotation consistency. Our results demonstrate that transformer-based architectures such as ViT and Swin outperform the CNN-based models under both the baseline and differential learning rate settings. Through visual analysis of saliency maps, we gain insights into the models' decision-making processes and identify the factors influencing their predictions. Furthermore, we assess the impact of annotation quality on model performance by comparing models trained on data with different levels of annotation reliability. Our results indicate that higher annotation consistency, as quantified by α-Jaccard, leads to improved model performance, emphasizing the importance of high-quality datasets in advertising image classification. Beyond traditional deep learning models, we explore the effectiveness of vision language models (VLMs) in this task by employing prompt engineering and comparing their performance with that of fine-tuned deep learning models. Our findings indicate that while VLMs provide richer contextual annotations, they suffer from over-classification tendencies, subjective biases, and significantly higher computational costs. In contrast, deep learning models remain a more efficient and scalable solution for structured, large-scale advertising classification tasks. Our study gives practical insights for designers and advertisers, demonstrating how deep learning architectures and VLMs can be applied to digital marketing to enhance advertising image classification, reduce testing costs, and improve marketing efficiency. Furthermore, our dataset and findings serve as a benchmark for future research in advertising image classification and multimodal AI applications. [ABSTRACT FROM AUTHOR]
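The abstract does not spell out how the extended α is computed. As a rough illustration of the general idea only, the sketch below applies the standard Krippendorff's Alpha formula (α = 1 − D_o / D_e) with Jaccard distance between label sets as the difference function, skipping missing annotations. The function names `jaccard_distance` and `alpha_jaccard` and the input layout are assumptions made for this example, and the authors' actual method may differ in its details.

```python
from itertools import permutations


def jaccard_distance(a, b):
    """Jaccard distance between two label sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def alpha_jaccard(units):
    """Krippendorff's Alpha with Jaccard distance (illustrative sketch).

    units: one entry per image; each entry is a list with one label set
    per annotator, or None where that annotator gave no annotation.
    """
    # Keep only images annotated by at least two annotators (pairable units).
    pairable = []
    for unit in units:
        present = [frozenset(v) for v in unit if v is not None]
        if len(present) >= 2:
            pairable.append(present)

    values = [v for unit in pairable for v in unit]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one image with two or more annotations")

    # Observed disagreement: ordered pairs within each image, weighted by 1/(m_u - 1).
    d_o = sum(
        sum(jaccard_distance(a, b) for a, b in permutations(unit, 2)) / (len(unit) - 1)
        for unit in pairable
    ) / n

    # Expected disagreement: ordered pairs across all pairable annotations.
    d_e = sum(jaccard_distance(a, b) for a, b in permutations(values, 2)) / (n * (n - 1))

    if d_e == 0:
        return 1.0  # every annotation is identical, so agreement is trivially perfect
    return 1.0 - d_o / d_e
```

Under these assumptions, a call such as `alpha_jaccard([[{"food", "text"}, {"food"}, None], [{"car"}, {"car"}, {"car"}]])` returns a value at or below 1, where 1 means every pair of annotators chose identical label sets and values near 0 mean agreement no better than chance.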
Copyright of Multimedia Tools & Applications is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)