Result: Oversized ore segmentation using SAM-enhanced U-Net with self-supervised pre-training and semi-supervised self-training.
Further Information
• Collect a dataset of 32,094 images for oversized ore segmentation.
• Adopt semantic segmentation to extract ore contours.
• Use transfer learning to improve the model's feature extraction ability.
• Apply self-supervised pre-training to exploit unlabeled images.
• Employ semi-supervised self-training to generate pseudo-labels.

Iron ore is a vital industrial material, and ensuring its safe production remains a major research priority for mining enterprises. A common challenge arises when oversized ores are transported on conveyor belts, where they can cause blockages that, if not detected and cleared quickly, may escalate into more severe safety incidents. In recent years, machine-vision technologies driven by deep learning have been used to automatically detect large pieces within ore clusters. Current approaches depend predominantly on supervised learning, which requires extensive manual labels to train neural networks. However, the dense distribution and complex shapes of ores make label generation both time-consuming and difficult. As a result, relying solely on a small number of manual labels limits model accuracy and leaves a large set of unlabeled images unused. Exploiting this unlabeled data to achieve higher accuracy therefore calls for an improved approach. To address this issue, we propose a SAM-enhanced U-Net model for oversized ore segmentation that combines self-supervised pre-training (SSPT), which trains the model on abundant unlabeled data to learn prior knowledge, with semi-supervised self-training (SSST), which leverages a small set of labeled data together with a large volume of pseudo-labels generated by the model itself. Specifically, to obtain a more powerful feature extractor, the encoder of U-Net is first replaced with the Vision Transformer Base (ViT-B) from the Segment Anything Model (SAM) via transfer learning, constructing SV-Unet.
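As a minimal sketch of this construction, the following PyTorch snippet pairs a ViT-style patch encoder with a U-Net-style upsampling decoder. The `PatchEncoder` here is a tiny stand-in for SAM's ViT-B (which in practice would be loaded from a segment-anything checkpoint); all class names, channel widths, and sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PatchEncoder(nn.Module):
    """Tiny stand-in for the SAM ViT-B image encoder (assumed simplification):
    patchify with a strided conv, then run one transformer layer over tokens."""
    def __init__(self, in_ch=3, dim=64, patch=16):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, x):
        f = self.proj(x)                      # (B, dim, H/16, W/16)
        b, c, h, w = f.shape
        t = f.flatten(2).transpose(1, 2)      # (B, H*W/256, dim) token sequence
        t = self.block(t)
        return t.transpose(1, 2).reshape(b, c, h, w)

class SVUnet(nn.Module):
    """Sketch of SV-Unet: ViT encoder plus a U-Net-style upsampling decoder."""
    def __init__(self, dim=64, n_classes=1):
        super().__init__()
        self.encoder = PatchEncoder(dim=dim)
        ups, ch = [], dim
        for _ in range(4):                    # 4 x2 upsamplings undo the /16 patchify
            ups += [nn.ConvTranspose2d(ch, ch // 2, 2, stride=2), nn.ReLU()]
            ch //= 2
        self.decoder = nn.Sequential(*ups, nn.Conv2d(ch, n_classes, 1))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SVUnet()
mask = model(torch.randn(1, 3, 64, 64))       # per-pixel mask logits
print(tuple(mask.shape))
```

The decoder restores full resolution, so the output mask matches the input's spatial size, as a segmentation head requires.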
Secondly, to further enhance the representation capability of the backbone, SV-Unet undergoes an SSPT procedure using a Masked AutoEncoder (MAE) on unlabeled ore images. Finally, SV-Unet is trained iteratively with an SSST strategy that uses manual labels alongside qualified pseudo-labels. The experimental results show that the proposed method boosts the baseline model's mask average precision (AP_M) from 36.04% to 65.08%, primarily because the number of ore images utilized rises from 2,043 to 31,049, highlighting the effectiveness of reasonably leveraging large-scale unlabeled data to enhance model performance.
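The MAE-style pre-training step can be sketched as follows. This is a deliberately simplified variant, assumed for illustration: it zeroes masked patches and reconstructs them with a toy MLP, whereas the real MAE drops masked tokens from the encoder entirely. The patch size, mask ratio, and model are all assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def mae_step(model, images, mask_ratio=0.75, patch=16):
    """One MAE-style pre-training step (simplified): randomly mask patches,
    reconstruct all patches, and score the loss on the masked ones only."""
    b, c, h, w = images.shape
    # split each image into non-overlapping patches -> (B, N, patch*patch*C)
    p = images.unfold(2, patch, patch).unfold(3, patch, patch)
    p = p.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)
    n = p.shape[1]
    mask = torch.rand(b, n) < mask_ratio          # True = masked, to be reconstructed
    corrupted = p.masked_fill(mask[..., None], 0.0)
    pred = model(corrupted)                       # reconstruct every patch
    return ((pred - p) ** 2)[mask].mean()         # loss only on masked patches

# toy reconstruction head standing in for the SV-Unet backbone
model = nn.Sequential(nn.Linear(16 * 16 * 3, 128), nn.ReLU(), nn.Linear(128, 16 * 16 * 3))
loss = mae_step(model, torch.randn(2, 3, 64, 64))
loss.backward()                                   # gradients flow to the backbone
print(loss.item() > 0)
```

Because reconstruction needs no labels, this stage can consume the full pool of unlabeled ore images before any supervised fine-tuning.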
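One round of the self-training stage can be sketched as below. The confidence thresholds used to decide which pseudo-labels are "qualified" are assumptions for illustration; the paper's actual filtering rule may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def self_training_round(model, labeled, unlabeled, pixel_conf=0.9, keep_frac=0.95):
    """One SSST round (sketch): predict masks for unlabeled images, keep only
    those whose pixels are mostly confident, and merge them with the labeled
    set for the next round of supervised training."""
    model.eval()
    kept = []
    with torch.no_grad():
        for x in unlabeled:
            prob = torch.sigmoid(model(x))        # per-pixel foreground probability
            conf = ((prob > pixel_conf) | (prob < 1 - pixel_conf)).float().mean()
            if conf >= keep_frac:                 # qualified pseudo-label
                kept.append((x, (prob > 0.5).float()))
    return labeled + kept                         # training data for the next round

# toy one-channel segmenter standing in for SV-Unet
model = nn.Conv2d(1, 1, 3, padding=1)
labeled = [(torch.randn(1, 1, 32, 32), torch.ones(1, 1, 32, 32))]
unlabeled = [torch.randn(1, 1, 32, 32) for _ in range(4)]
new_set = self_training_round(model, labeled, unlabeled)
print(len(new_set) >= len(labeled))
```

Iterating this round lets the model's own confident predictions gradually expand the training set, which is how the effective image count can grow far beyond the manually labeled subset.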