STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Attention-Centric YOLOv12 for Real-Time Fine-Grained Waste Detection in the TACO Dataset
DOI: https://doi.org/10.62517/jbdc.202601226
Author(s)
Hongye Wu*
Affiliation(s)
Xiamen University Malaysia, Sepang, Selangor, 43900, Malaysia
Abstract
Efficient waste detection is crucial for environmental sustainability, yet existing models struggle with fine-grained objects in complex backgrounds, such as those in the TACO dataset. To address these limitations, this paper investigates YOLOv12n, a natively attention-centric detector that integrates Area-Attention (A²) as a structural primitive rather than as a modular add-on. The architecture uses the Residual Efficient Layer Aggregation Network (R-ELAN) to optimize feature flow while mitigating the computational overhead typically associated with global self-attention. On the TACO dataset, YOLOv12n achieves a competitive mAP50 of 0.376. On an NVIDIA RTX 5060 Laptop GPU, the model delivers a real-time throughput of 94.33 FPS, slightly more than double (about 2.04×) the 46.18 FPS recorded by a YOLOv8n+CBAM variant. Furthermore, preliminary benchmarks on a CPU (i7-XXXX) indicate a latency of [XX.X] ms, supporting the model's deployment feasibility on edge-computing hardware. Ablation studies show that the A² module is indispensable: removing it causes a precipitous 60.6% decline in precision. This research underscores the superiority of native attention integration in mitigating operator-switching overhead, offering a robust template for real-time environmental monitoring systems.
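The GPU throughput comparison can be checked with simple arithmetic. The sketch below converts the reported FPS figures into mean per-frame latency and a speedup ratio. It assumes FPS was measured as sequential single-image inference (batch size 1), so that latency is simply 1000/FPS; the abstract does not state the measurement protocol, and the helper names are illustrative only.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Mean per-frame latency in milliseconds implied by a throughput in frames per second.

    Assumes frames are processed one at a time (batch size 1, no pipelining).
    """
    return 1000.0 / fps


def speedup(fps_new: float, fps_baseline: float) -> float:
    """Throughput ratio of a new model over a baseline model."""
    return fps_new / fps_baseline


# Throughput figures reported in the abstract (NVIDIA RTX 5060 Laptop GPU).
YOLOV12N_FPS = 94.33
YOLOV8N_CBAM_FPS = 46.18

print(f"YOLOv12n:     {fps_to_latency_ms(YOLOV12N_FPS):.2f} ms/frame")      # 10.60 ms/frame
print(f"YOLOv8n+CBAM: {fps_to_latency_ms(YOLOV8N_CBAM_FPS):.2f} ms/frame")  # 21.65 ms/frame
print(f"Speedup:      {speedup(YOLOV12N_FPS, YOLOV8N_CBAM_FPS):.2f}x")      # 2.04x
```

Under this assumption, YOLOv12n spends roughly 10.6 ms per frame against roughly 21.7 ms for the CBAM variant, which is why the ratio lands just above 2.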
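To illustrate why Area-Attention is cheaper than global self-attention, the following is a minimal NumPy sketch. It is not the YOLOv12 or Ultralytics implementation. It assumes single-head attention, one shared set of projection weights, no positional term, and a feature map split into equal horizontal strips (num_areas=4 is chosen here for illustration). Attention is computed only among tokens in the same strip, so for n tokens and l areas the score matrices hold n²/l entries in total instead of n².

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over an (n, c) token matrix."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, n) pairwise scores
    return softmax(scores, axis=-1) @ v


def area_attention(feat, wq, wk, wv, num_areas=4):
    """Self-attention restricted to num_areas equal horizontal strips of an (H, W, C) map.

    Assumption for illustration: strips are taken along H and share one set of weights.
    Each strip holds n/l tokens, so the total score count is l * (n/l)^2 = n^2 / l.
    """
    h, w, c = feat.shape
    if h % num_areas != 0:
        raise ValueError("H must be divisible by num_areas in this sketch")
    strip = h // num_areas
    out = np.empty((h, w, wv.shape[1]))
    for a in range(num_areas):
        rows = slice(a * strip, (a + 1) * strip)
        tokens = feat[rows].reshape(-1, c)  # flatten one strip into (strip*W, C) tokens
        out[rows] = self_attention(tokens, wq, wk, wv).reshape(strip, w, -1)
    return out


# Usage on a random 8x8 feature map with 16 channels (hypothetical sizes).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
wq, wk, wv = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
y = area_attention(feat, wq, wk, wv, num_areas=4)
print(y.shape)  # (8, 8, 16)
```

With num_areas=1 the function reduces to ordinary global self-attention, while with more areas a change to the input in one strip leaves the outputs of the other strips untouched. That locality is the trade this sketch makes: less global context per layer in exchange for a score count that shrinks by a factor of l.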
Keywords
YOLOv12; Waste Detection; TACO Dataset; Attention Mechanism; Real-Time Inference
Copyright © 2020-2035 STEMM Institute Press. All Rights Reserved.