Welcome to the replication package for DEFault, a framework designed to improve the detection and diagnosis of faults in Deep Neural Networks (DNNs). This repository provides all the necessary code and data to reproduce the experiments from our paper accepted at ICSE - Research Track 2025:
"Improved Detection and Diagnosis of Faults in Deep Neural Networks using Hierarchical and Explainable Classification."
The pre-print of the paper is available in this repository as K_Pre-Print.pdf.
DEFault is a hierarchical classification framework that improves fault detection and diagnosis in DNNs by leveraging both static and dynamic analysis. It consists of three primary stages:
- Fault Detection - Identifies faulty DNN programs based on runtime features.
- Fault Categorization - Classifies detected faults into seven categories.
- Root Cause Analysis - Uses explainable AI (SHAP) to pinpoint the most influential static and dynamic features contributing to faults.
Illustrative Workflow

Fault Detection
- Monitors runtime features such as loss trends, activation statistics, and gradient behaviors.
- Uses a trained classifier to determine whether a DNN program contains faults.
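As a rough illustration of the kind of dynamic features the detection stage consumes, the sketch below condenses a training run into a small feature vector. The feature names and statistics here are illustrative assumptions only; DEFault's actual feature set is defined in c_Feature_Extraction/Dynamic.

```python
import numpy as np

def dynamic_features(loss_history, activations, gradients):
    """Summarize runtime signals from one training run into scalar features.

    loss_history: per-epoch loss values
    activations:  flattened activation values sampled during training
    gradients:    list of per-layer gradient arrays

    Illustrative sketch only, not DEFault's real feature extractor.
    """
    loss = np.asarray(loss_history, dtype=float)
    epochs = np.arange(len(loss))
    slope = np.polyfit(epochs, loss, 1)[0]  # loss trend: negative = decreasing
    acts = np.asarray(activations, dtype=float)
    return {
        "loss_slope": float(slope),
        "loss_final": float(loss[-1]),
        "loss_nan_ratio": float(np.isnan(loss).mean()),
        "act_mean": float(acts.mean()),
        "act_dead_ratio": float((acts == 0).mean()),  # e.g. dead ReLUs
        "grad_norm_mean": float(np.mean([np.linalg.norm(g) for g in gradients])),
    }
```

A healthy run would typically show a negative `loss_slope` and a low `act_dead_ratio`; a trained detector then decides from such features whether the program is faulty.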
Fault Categorization
- Categorizes detected faults into one or more of the following seven categories:
  - Hyperparameter
  - Loss
  - Activation
  - Layer
  - Optimizer
  - Weight
  - Regularization
- A separate binary classifier is used for each category.
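A minimal sketch of this one-binary-classifier-per-category scheme, assuming scikit-learn, a feature matrix `X`, and a multi-label matrix `Y` with one column per category. The RandomForest choice and helper names are assumptions for illustration; the actual models live in d_DEFault.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CATEGORIES = ["Hyperparameter", "Loss", "Activation", "Layer",
              "Optimizer", "Weight", "Regularization"]

def train_category_classifiers(X, Y):
    """Train one binary classifier per fault category.

    X: (n_samples, n_features) extracted features
    Y: (n_samples, 7) multi-label matrix, Y[i, j] == 1 if sample i
       exhibits fault category j. Sketch only.
    """
    models = {}
    for j, cat in enumerate(CATEGORIES):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X, Y[:, j])
        models[cat] = clf
    return models

def predict_categories(models, x):
    """Return every category whose binary classifier fires on sample x."""
    return [cat for cat, clf in models.items()
            if clf.predict(x.reshape(1, -1))[0] == 1]
```

Because the categories are not mutually exclusive, a single faulty program can be flagged with several labels at once, matching the "one or more categories" behavior described above.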
Root Cause Analysis
- Utilizes SHAP for explainability.
- Identifies the most influential static and dynamic features responsible for the fault.
- Helps developers diagnose and fix the root cause effectively.
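The ranking step of RCA can be sketched as follows: given per-sample SHAP values from any explainer, rank features by mean absolute contribution to produce a Top@k list like the one shown in the smoke test below. This is a simplified stand-in for the actual RCA implementation in d_DEFault; the feature names in the usage comment are taken from this README's example output.

```python
import numpy as np

def rank_root_causes(shap_values, feature_names, k=5):
    """Rank features by mean |SHAP| magnitude into a Top@k list.

    shap_values:   (n_samples, n_features) array of SHAP values
    feature_names: one name per feature column
    Returns a list of (rank_label, feature_name, importance) tuples.
    """
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1][:k]  # most influential first
    return [(f"Top@{i + 1}", feature_names[j], float(importance[j]))
            for i, j in enumerate(order)]

# e.g. rank_root_causes(sv, ["CountDense", "Min_Neurons", "CountConv2D", ...])
```

Averaging absolute SHAP values over samples is a common way to turn per-prediction attributions into a global feature ranking, which is what the Top@1..Top@5 root-cause list reflects.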
DEFault
├── 0_Artifact_Testing # Scripts for lightweight verification on sample DNN models
├── a_Data_Collection # Scripts for collecting and processing StackOverflow posts
├── b_Fault_Seeding # Scripts for fault injection (DeepCrime and EFI extension)
├── c_Feature_Extraction # Static & Dynamic feature extraction scripts
├── d_DEFault # Implementation of DEFault (Fault Detection, Categorization, RCA)
├── e_Evaluation # Evaluation scripts for real-world and seeded faults
├── f_Figures # Figures used in the paper
├── g_Dataset # Labeled datasets for training and testing
├── h_CohenKappaAnalysis # Scripts for dataset consistency validation (Cohen’s Kappa)
├── i_CaseStudy # Scripts for real-world case studies (e.g., PixelCNN)
├── j_HPC_Slurm # Slurm job script for Compute Canada
└── K_Pre-Print.pdf     # Pre-print of the full paper
Tested on:
- Ubuntu 20.04 LTS or later
- HPC environments (e.g., Compute Canada, Graham Cluster)
Compatible with:
- Windows 10/11 (via WSL2)
- macOS Monterey (M1/M2 support may require additional setup)
Minimum:
- CPU: 4 cores
- RAM: 8 GB
- Disk: 10 GB
Recommended:
- GPU: NVIDIA with CUDA support
- HPC access (e.g., Compute Canada) for full experiments
Python: 3.10.16

- Create a virtual environment:

      python -m venv default_env
      source default_env/bin/activate    # macOS/Linux
      default_env\Scripts\activate       # Windows

- Install dependencies:

      pip install -r requirements.txt

- Navigate to the evaluation scripts directory:

      cd 0_Artifact_Testing/evaluation_scripts

- Run the Fault Detection & Categorization (FD_FC) script:

      python testForCaseStudy_FD_FC.py

Expected Output:
- Fault Detection (FD): Confirms if the PixelCNN model has faults.
- Fault Categorization (FC): Identifies the type of faults, including:
- Loss Function Fault
- Hyperparameter Fault
- Layer Fault
- Note: The model mistakenly identifies an Optimization Fault due to feature overlap.
- Run the Root Cause Analysis (RCA) script:

      python testForCaseStudy_RCA.py

Expected Output:
- Identifies and ranks the potential root causes of the Layer Fault using static features:
- Top@1: CountDense: Check the configuration and number of Dense layers.
- Top@2: Min_Neurons: No specific fault message
- Top@3: CountConv2D: Inspect the configuration of 2D convolutional layers.
- Top@4: Countsoftmax: Look into the activation function Softmax and its placement.
- Top@5: Max_Neurons: Verify the maximum number of neurons in any single layer.
Download the dataset:
- DNN Programs: Download Link
- Evaluation Benchmark: Download Link
🔴 HPC Support Required: To run the complete experiments below, you must have access to an HPC environment. The steps below should be executed via run_script.slurm from the j_HPC_Slurm directory, which installs all required dependencies inside the HPC environment.
- Data Collection & Fault Seeding:

      # Run Part 1 (DeepCrime)
      cd "b_Fault_Seeding/Part 1-DC"
      python run_deepcrime_full.py

      # Run Part 2 (Extended Fault Injection)
      cd "b_Fault_Seeding/Part 2-EFI"
      python main.py

- Feature Extraction:

      cd c_Feature_Extraction/Static
      python Static_Feature_Extraction.py

      cd c_Feature_Extraction/Dynamic
      python Dynamic_Feature_Extraction.py

- Model Training:

      cd d_DEFault/A_Detection
      python Fault_Detection.py

- Model Evaluation:

      cd e_Evaluation
      python Fault_Evaluation_Detection_Diagnosis.py

- Case Studies:

      cd i_CaseStudy
      python Feature_Extraction_CaseStudy.py
      python PixelCNN_Analysis.py

If you use DEFault in your research, please cite:

@inproceedings{default2025,
author = {Sigma Jahan and Mehil B Shah and Parvez Mahbub and Mohammad Masudur Rahman},
title = {Improved Detection and Diagnosis of Faults in Deep Neural Networks using Hierarchical and Explainable Classification},
booktitle = {Proceedings of the International Conference on Software Engineering (ICSE)},
year = {2025},
publisher = {IEEE}
}

Contact:
- Sigma Jahan - Dalhousie University, sigma.jahan@dal.ca
- Mehil B Shah - Dalhousie University, shahmehil@dal.ca
- Parvez Mahbub - Dalhousie University, parvezmrobin@dal.ca
- Mohammad Masudur Rahman - Dalhousie University, masud.rahman@dal.ca
This project is licensed under the MIT License. See LICENSE for details.
