STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Research on Decision Tree Classification Algorithm Based on K-Nearest Neighbor Algorithms Guidance
DOI: https://doi.org/10.62517/jbdc.202501201
Author(s)
Jianmei Chen, Xiaojun Ding*
Affiliation(s)
School of Computer Science and Engineering, Yulin Normal University, Yulin, Guangxi, China
*Corresponding Author.
Abstract
Decision trees and k-nearest neighbor (KNN) algorithms are classic classification methods in machine learning. A decision tree presents its classification logic as a tree structure and is highly interpretable, but it is prone to overfitting on high-dimensional data and tends to ignore local details. KNN classifies a sample by a vote among its nearest neighbors, which suits local patterns but lacks a global view of the data. The KNN_DT algorithm combines the strengths of both, offering local flexibility together with global control. This report analyzes the core ideas and principles of the KNN_DT algorithm, aiming to provide a theoretical foundation and reference for its research and application, and to support efficient and accurate data processing and practical adoption across industries.
Keywords
Decision Trees; K-Nearest Neighbor Algorithms; Local Information; Global Information
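The abstract does not specify how KNN_DT combines the two methods, so the following is only an illustrative sketch of one possible hybrid, not the authors' algorithm. It assumes a single Gini-based split supplies the global structure and a nearest-neighbor vote inside the selected partition supplies the local decision. All names (`best_split`, `knn_vote`, `fit`, `predict`) and the toy data are hypothetical.

```python
from collections import Counter
import math


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair that lowers weighted Gini
    impurity the most, or None if no split improves on the unsplit data."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def knn_vote(X, y, query, k):
    """Majority label among the k training points closest to query."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def fit(X, y):
    """Global step: learn one split and keep the training data for voting."""
    return {"split": best_split(X, y), "X": X, "y": y}


def predict(model, query, k=3):
    """Local step: route query through the split, then vote among the
    training points that fall on the same side."""
    X, y, split = model["X"], model["y"], model["split"]
    if split is not None:
        f, t = split
        side = query[f] <= t
        idx = [i for i in range(len(X)) if (X[i][f] <= t) == side]
        X = [X[i] for i in idx]
        y = [y[i] for i in idx]
    return knn_vote(X, y, query, k)


# Toy data (assumed for illustration): two well-separated clusters.
X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y_train = ["a", "a", "a", "b", "b", "b"]
model = fit(X_train, y_train)
label_near_origin = predict(model, (1.5, 1.5))
label_far = predict(model, (6.5, 6.5))
```

A full tree would apply `best_split` recursively. The single split here is kept only to show the division of labor between the global partition and the local vote.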