Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey
Background: Data Augmentation (DA) has become a critical approach in Time Series Classification (TSC), primarily for its capacity to expand training datasets, enhance model robustness, introduce diversity, and reduce overfitting. However, the current landscape of DA in TSC is plagued by fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluation measures, and a dearth of accessible, user-oriented tools. Objectives: This study addresses these challenges through a comprehensive examination of DA methodologies within the TSC domain. Methods: Our research began with an extensive literature review spanning a decade, which revealed significant gaps in existing surveys and necessitated a detailed analysis of over 100 scholarly articles, identifying more than 60 distinct DA techniques. This rigorous review led to a novel taxonomy tailored to the specific needs of DA in TSC, which groups techniques into five primary categories to help researchers select appropriate methods with greater clarity. In response to the lack of comprehensive evaluations of foundational DA techniques, we conducted a thorough empirical study, testing nearly 20 DA strategies across 15 diverse datasets representing all types within the UCR time-series repository. To improve practical use, we have consolidated most of these methods into a unified Python library whose user-friendly interface facilitates experimenting with various augmentation techniques, offering practitioners and researchers a more convenient tool for innovation than currently available options. Results: Using ResNet and LSTM architectures, we employed a multifaceted evaluation approach, including metrics such as Accuracy, Method Ranking, and Residual Analysis, which yielded a benchmark accuracy of 84.98 ± 16.41% with ResNet and 82.41 ± 18.71% with LSTM.
Our investigation underscored the inconsistent efficacy of DA techniques. For instance, methods such as RGWs (with an average rank of 7.13 and an average accuracy of 83.42 ± 17.53% with LSTM) and Random Permutation significantly improved model performance, whereas others, such as EMD, were less effective. Furthermore, we found that the intrinsic characteristics of datasets significantly influence the success of DA methods, which led to targeted, evidence-based recommendations to help practitioners select the most suitable DA techniques for specific datasets. Conclusions: In essence, this research presents an integrative perspective on the contemporary landscape of data augmentation for time-series classification, combining theoretical frameworks with empirical evidence. The findings and resources introduced here are positioned to catalyze continued progress in this domain, fortifying machine learning models against the challenges posed by data limitations and enhancing their generalizability and robustness. [ABSTRACT FROM AUTHOR]
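The abstract names Random Permutation among the techniques that improved performance. As a rough illustration only, the sketch below shows one common formulation of permutation augmentation: cut a series into a random number of contiguous segments and reassemble them in shuffled order. The function name `random_permutation` and the `max_segments` parameter are hypothetical; they are not the API of the authors' library, and the study's exact settings may differ.

```python
import numpy as np


def random_permutation(x, max_segments=5, rng=None):
    """Shuffle contiguous segments of a series along its time axis (axis 0).

    Hypothetical sketch of segment-permutation augmentation; the segment
    count range and cut-point scheme are assumptions, not the study's setup.
    """
    if max_segments < 2:
        raise ValueError("max_segments must be at least 2")
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x)
    if len(x) < 2:
        # Too short to split: return an unchanged copy.
        return x.copy()
    # Draw 2..max_segments segments, capped so every segment is non-empty.
    n_segments = min(int(rng.integers(2, max_segments + 1)), len(x))
    # Choose n_segments - 1 distinct interior cut points, in increasing order.
    cuts = np.sort(rng.choice(np.arange(1, len(x)), size=n_segments - 1, replace=False))
    segments = np.split(x, cuts)
    # Reassemble the segments in a random order.
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order])


# Usage: augment one univariate series with a seeded generator.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 2 * np.pi, 100))
augmented = random_permutation(series, rng=rng)
```

Because the output is a reordering of the input's values, the marginal distribution is preserved while local temporal order is disrupted. This may help explain why such a transform suits some datasets better than others, in line with the abstract's point that dataset characteristics shape DA effectiveness.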
Copyright of Journal of Artificial Intelligence Research is the property of AI Access Foundation and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)