Enhancing Semantic Segmentation through Reinforced Active Learning: Combating Dataset Imbalances and Bolstering Annotation Efficiency
DOI: https://doi.org/10.30564/jeis.v5i2.6063
Abstract: This research addresses the challenges of training large semantic segmentation models for image analysis, focusing on speeding up annotation and mitigating dataset imbalance. Imbalance takes different forms across domains: age- and gender-related biases in clinical data and skewed class representation in natural images can both degrade model performance. Strategies to mitigate these biases are explored to improve the efficiency and accuracy of semantic segmentation. Several reinforced active learning methodologies for image segmentation are examined in depth, with the aim of improving precision and efficiency across diverse domains. The proposed framework integrates Dueling Deep Q-Networks (DQN), Prioritized Experience Replay, Noisy Networks, and Emphasizing Recent Experience. Extensive experiments on diverse datasets reveal both the improvements and the limitations of the various approaches in terms of overall accuracy and efficiency. This research extends reinforced active learning methodologies for image segmentation, paving the way for more sophisticated and precise segmentation algorithms. The findings emphasize the need for a careful balance between exploration and exploitation in reinforcement learning for effective image segmentation.
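As a rough illustration of two of the components named in the abstract, the sketch below shows the standard dueling aggregation of a state value and per-action advantages into Q-values, and the proportional sampling probabilities used in prioritized experience replay. It is a minimal sketch based on the commonly published formulations, not the authors' implementation; the function names, the `alpha` default, and the example numbers are all assumptions made for illustration.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline.

    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def replay_probabilities(priorities, alpha=0.6):
    """Proportional prioritized replay: P(i) = p_i**alpha / sum_k p_k**alpha.

    alpha = 0 gives uniform sampling; larger alpha favors high-priority
    transitions more strongly. The default 0.6 is an illustrative choice.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical numbers: one state value, three candidate actions.
q = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])

# Hypothetical priorities (e.g. absolute TD errors) for three transitions.
probs = replay_probabilities([1.0, 2.0, 4.0], alpha=1.0)
```

In an active learning setting such as the one described here, the actions would correspond to candidate regions or images to send for annotation, so the Q-values rank which samples to label next; that mapping is likewise an assumption of this sketch.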