Uncertainty-Aware Ensemble Models for Improved Defect Detection in Noisy Data
- 1 Department of Computer Science and Engineering, AI&ML, GMR Institute of Technology (GMRIT), Rajam, Andhra Pradesh, India
- 2 Department of Computer Science and Engineering, Aditya University, Aditya Nagar, Surampalem, Andhra Pradesh, India
- 3 CSE (AI and ML) Department, Pragati Engineering College, Surampalem, Andhra Pradesh, India
- 4 Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation (KLEF), K L University, Guntur, Andhra Pradesh, India
- 5 Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation (KLEF), K L University, Guntur, Andhra Pradesh, India
Abstract
Software defect prediction plays a crucial role in ensuring software quality and reliability, especially as modern systems become more complex and data-rich. This study introduces an uncertainty-aware ensemble learning framework that aims to improve defect classification on noisy and imbalanced datasets, particularly the PROMISE repository datasets and NASA KC1. The proposed model combines multiple classifiers in a multi-learner ensemble to enhance generalization, raise the true positive rate, and address the limitations of conventional single-model approaches. Key techniques include chi-square-based feature selection, ensemble pruning to limit overfitting, and neural network-based classification with Extreme Learning Machines (ELMs). The methodology uses both homogeneous and heterogeneous ensembles, with training and prediction phases structured to handle data sparsity, high dimensionality, and class imbalance. In experiments with decision trees, Naïve Bayes, and cost-sensitive learning, the ensemble model outperformed the traditional classifiers. Evaluation metrics, including accuracy, F-measure (0.9729), recall (0.7143), true positive rate (0.9857), and ROC AUC, further supported the ensemble's predictive robustness. On the KC1 dataset, the proposed model outperformed the baseline models in both accuracy and area under the ROC curve. Data balancing techniques, including under-sampling, over-sampling, and active learning, were employed to improve the model's ability to identify minority-class instances. These findings suggest that uncertainty-aware ensemble approaches are effective tools for improving defect detection, particularly in noisy and imbalanced environments.
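The abstract does not state how the ensemble quantifies uncertainty, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes each ensemble member outputs a probability that a module is defective, combines them by soft voting (averaging), and uses the binary entropy of the averaged probability as an uncertainty score. The function name `ensemble_predict`, its parameters, and the sample probabilities are all hypothetical.

```python
import math


def ensemble_predict(member_probs, threshold=0.5):
    """Combine per-member defect probabilities and score the uncertainty.

    member_probs: one P(defective) value per ensemble member.
    This interface is an assumption made for illustration only.
    Returns (label, averaged probability, entropy in bits).
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")

    # Soft voting: average the members' probabilities.
    p = sum(member_probs) / len(member_probs)

    # Binary entropy of the averaged probability, in bits:
    # 0 means fully certain, 1 means maximally uncertain.
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log2(q)

    label = 1 if p >= threshold else 0  # 1 = predicted defective
    return label, p, entropy


# Hypothetical members that agree, then members that disagree.
confident = ensemble_predict([0.9, 0.95, 0.85])
uncertain = ensemble_predict([0.9, 0.1, 0.5])
```

In a sketch like this, predictions with high entropy could be flagged for manual review or routed to an active learning step, while low-entropy predictions are accepted directly. Other uncertainty measures, such as vote disagreement among members, would fit the same interface.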
DOI: https://doi.org/10.3844/jcssp.2026.1396.1405
Copyright: © 2026 Madhavi Perla, Gadi Lava Raju, A Radha Krishna, E. Sree Devi, Bechoo Lal, Aruna Bhaskar K and Solleti Phani Kumar. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Software Defect Prediction
- Ensemble Learning
- PROMISE Dataset
- Neural Networks
- Class Imbalance
- ROC Curve
- Extreme Learning Machine (ELM)
- Naïve Bayes
- Feature Selection
- Software Quality