Romanian Journal of Information Technology and Automatic Control / Vol. 36, No. 1, 2026
Software defect prediction at file level: A hybrid deep learning approach combining source code metrics and semantic features
Esrael GEREMEW, Mekonnen WAGAW
The growing complexity of contemporary software systems highlights the need for trustworthy Software Defect Prediction (SDP) models. By detecting potentially defective modules early, these models help allocate scarce testing resources effectively. Although deep learning has shown promise in learning features from source code, current methods are generally limited by their reliance on a single type of information: either hand-crafted code metrics or semantic features derived from code structure. Integrating several data types into a single, discriminative feature set remains one of the biggest challenges. To predict file-level defects, this study presents a novel approach that combines semantic features with source code metrics. Alongside Combined Defect Data Modelling (CDDM), we propose Learning Hybrid Feature Representation (LHFR), a deep neural network model. LHFR pairs a Bidirectional Long Short-Term Memory (Bi-LSTM) network, which extracts semantic features from Abstract Syntax Trees (ASTs), with a Multi-Layer Perceptron (MLP), which learns from manually constructed metrics. Evaluated on 12 open-source Java projects, LHFR achieves an average F-measure of 69.08% and outperforms models based solely on metrics or solely on semantic features. The contributions include a new combined dataset, an improved feature set, and a hybrid representation strategy that significantly improves fault detection performance.
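The abstract names LHFR's two branches but not their exact configuration. The pure-Python forward pass below is a minimal sketch of the general hybrid idea, under stated assumptions. AST node types are mapped to integer ids through a hypothetical vocabulary and embedded. A forward and a backward LSTM then read the sequence, and their final hidden states are concatenated. The hand-crafted metrics pass through a one-layer MLP, and the two feature vectors are concatenated and fed to a sigmoid output. Layer sizes, fusion by concatenation, the random untrained weights, the example metric values, and all names (`HybridDefectModel`, `predict_proba`, `LSTMCell`, `Dense`) are illustrative assumptions, not the authors' implementation. Training is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vecs):
    # Element-wise sum of equal-length vectors.
    return [sum(vals) for vals in zip(*vecs)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell; gate rows stacked as input, forget, candidate, output."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.n = hidden_dim
        self.W = rand_matrix(4 * hidden_dim, input_dim, rng)
        self.U = rand_matrix(4 * hidden_dim, hidden_dim, rng)
        self.b = [0.0] * (4 * hidden_dim)

    def step(self, x, h_prev, c_prev):
        z = add(matvec(self.W, x), matvec(self.U, h_prev), self.b)
        n = self.n
        i = [sigmoid(v) for v in z[0:n]]
        f = [sigmoid(v) for v in z[n:2 * n]]
        g = [math.tanh(v) for v in z[2 * n:3 * n]]
        o = [sigmoid(v) for v in z[3 * n:4 * n]]
        c = [fi * cp + ii * gi for fi, cp, ii, gi in zip(f, c_prev, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

    def run(self, xs):
        # Returns the final hidden state after reading the whole sequence.
        h, c = [0.0] * self.n, [0.0] * self.n
        for x in xs:
            h, c = self.step(x, h, c)
        return h


class Dense:
    def __init__(self, in_dim, out_dim, rng, activation):
        self.W = rand_matrix(out_dim, in_dim, rng)
        self.b = [0.0] * out_dim
        self.activation = activation

    def __call__(self, x):
        return [self.activation(v) for v in add(matvec(self.W, x), self.b)]


class HybridDefectModel:
    """Hypothetical hybrid model: Bi-LSTM over AST tokens + MLP over metrics."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, n_metrics, mlp_dim, seed=0):
        rng = random.Random(seed)
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)
        self.fwd = LSTMCell(embed_dim, hidden_dim, rng)
        self.bwd = LSTMCell(embed_dim, hidden_dim, rng)
        self.metric_mlp = Dense(n_metrics, mlp_dim, rng, relu)
        self.classifier = Dense(2 * hidden_dim + mlp_dim, 1, rng, sigmoid)

    def predict_proba(self, token_ids, metrics):
        xs = [self.embedding[t] for t in token_ids]
        # Semantic branch: concatenate final forward and backward hidden states.
        semantic = self.fwd.run(xs) + self.bwd.run(list(reversed(xs)))
        # Metric branch: one ReLU layer over the hand-crafted metrics.
        metric_features = self.metric_mlp(metrics)
        fused = semantic + metric_features  # list concatenation = feature fusion
        return self.classifier(fused)[0]


# Hypothetical AST node-type vocabulary and one file's token sequence.
vocab = {"<unk>": 0, "MethodDeclaration": 1, "IfStatement": 2,
         "ForStatement": 3, "MethodInvocation": 4, "ReturnStatement": 5}
tokens = ["MethodDeclaration", "IfStatement", "MethodInvocation", "ReturnStatement"]
ids = [vocab.get(t, 0) for t in tokens]
metrics = [0.42, 0.10, 0.75]  # hypothetical normalized metric values

model = HybridDefectModel(vocab_size=len(vocab), embed_dim=8, hidden_dim=4,
                          n_metrics=3, mlp_dim=4)
p = model.predict_proba(ids, metrics)
print(f"P(defective) = {p:.3f}")
```

Because the weights are untrained, the printed probability carries no meaning. The sketch only shows how the two feature sources flow into one classifier. In practice, a framework layer such as a bidirectional LSTM in a deep learning library, trained with a binary cross-entropy loss, would replace these hand-written loops.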
Keywords:
Software Defect Prediction, Deep learning, Source code metrics, Semantic features, Abstract Syntax Trees, Feature representation learning.
CITE THIS PAPER AS:
Esrael GEREMEW, Mekonnen WAGAW, "Software defect prediction at file level: A hybrid deep learning approach combining source code metrics and semantic features", Romanian Journal of Information Technology and Automatic Control, ISSN 1220-1758, vol. 36(1), pp. 77-90, 2026.
https://doi.org/10.33436/v36i1y202606