A Methodology for Advanced Manufacturing Defect Detection through Self-Supervised Learning on X-ray Images
- Intxausti, Eneko 1
- Skočaj, Danijel 2
- Cernuda, Carlos 1
- Zugasti, Ekhi 1
- 1 Electronics and Computer Science, Mondragon Unibertsitatea, Loramendi 4, 20500 Mondragon, Spain
- 2 Faculty of Computer and Information Science, University of Ljubljana, Večna Pot 113, 1000 Ljubljana, Slovenia
ISSN: 2076-3417
Year of publication: 2024
Volume: 14
Issue: 7
Pages: 2785
Type: Article
Published in: Applied Sciences
Abstract
In industrial quality control, especially in the field of manufacturing defect detection, deep learning plays an increasingly critical role. However, the efficacy of these advanced models is often hindered by their need for large-scale, annotated datasets. Moreover, these datasets are mainly based on RGB images, which are very different from X-ray images. Addressing this limitation, our research proposes a methodology that incorporates domain-specific self-supervised pretraining techniques using X-ray imaging to improve defect detection capabilities in manufacturing products. We employ two pretraining approaches, SimSiam and SimMIM, to refine feature extraction from manufacturing images. The pretraining stage is carried out using an industrial dataset of 27,901 unlabeled X-ray images from a manufacturing production line. We analyze the performance of the pretraining against transfer-learning-based methods in a complex defect detection scenario using a Faster R-CNN model. We conduct evaluations on both a proprietary industrial dataset and the publicly available GDXray dataset. The findings reveal that models pretrained with domain-specific X-ray images consistently outperform those initialized with ImageNet weights. Notably, Swin Transformer models show superior results in scenarios rich in labeled data, whereas CNN backbones are more effective in limited-data environments. Moreover, we underscore the enhanced ability of the models pretrained with X-ray images in detecting critical defects, crucial for ensuring safety in industrial settings. Our study offers substantial evidence of the benefits of self-supervised learning in manufacturing defect detection, providing a solid foundation for further research and practical applications in industrial quality control.
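As an illustration of the SimSiam pretraining objective mentioned in the abstract, the following is a minimal sketch of the symmetric negative cosine similarity that SimSiam uses as its loss value. It is not the authors' implementation: the function names and the toy 3-dimensional embeddings are hypothetical, and the stop-gradient that SimSiam applies to the projections is only noted in a comment, because plain Python has no automatic differentiation.

```python
import math


def negative_cosine(p, z):
    """Return the negative cosine similarity of two equal-length vectors.

    Assumes neither vector has zero norm (no guard is included in this sketch).
    """
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric SimSiam-style loss value for two augmented views.

    p1, p2: predictor outputs for view 1 and view 2.
    z1, z2: projector outputs for view 1 and view 2. In a real training
    loop these are treated as constants (stop-gradient); that only affects
    the backward pass, so it does not appear in this forward computation.
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


# Hypothetical embeddings for two views of the same unlabeled X-ray image.
z1 = [1.0, 0.0, 0.0]
z2 = [0.0, 1.0, 0.0]

# Each prediction matches the other view's projection: the loss is at its minimum.
aligned = simsiam_loss(p1=z2, z1=z1, p2=z1, z2=z2)

# Each prediction is orthogonal to the other view's projection.
orthogonal = simsiam_loss(p1=z1, z1=z1, p2=z2, z2=z2)

print(aligned, orthogonal)
```

The loss is bounded between -1 and 1, and lower values mean the two views of an image are mapped to more similar representations, which is the signal that lets the backbone learn from unlabeled images.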
Funding information
Eneko Intxausti, Carlos Cernuda, and Ekhi Zugasti are part of the Intelligent Systems for Industrial Systems research group of Mondragon Unibertsitatea (IT1676-22), supported by the Department of Education, Universities and Research of the Basque Country. They are also supported by the DREMIND project of the Basque Government under grant KK-2022/00049 from the ELKARTEK program.
Funders
- Department of Education, Universities and Research of the Basque Country
- ELKARTEK program (grant KK-2022/00049)