Welcome to the replication package for DEFault, a framework designed to improve the detection and diagnosis of faults in Deep Neural Networks (DNNs). This repository provides all the necessary code and data to reproduce the experiments from our paper accepted at ICSE - Research Track 2025:
"Improved Detection and Diagnosis of Faults in Deep Neural Networks using Hierarchical and Explainable Classification."
The pre-print of the paper is available in this repository as K_Pre-Print.pdf.
DEFault is a hierarchical classification framework that improves fault detection and diagnosis in DNNs by leveraging both static and dynamic analysis. It consists of three primary stages:
- Fault Detection - Identifies faulty DNN programs based on runtime features.
- Fault Categorization - Classifies detected faults into seven categories.
- Root Cause Analysis - Uses explainable AI (SHAP) to pinpoint the most influential static and dynamic features contributing to faults.
Illustrative Workflow

Fault Detection
- Monitors runtime features such as loss trends, activation statistics, and gradient behaviors.
- Uses a trained classifier to determine whether a DNN program contains faults.
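As a rough illustration of the kind of dynamic features the detection stage consumes, the sketch below condenses a training run into a small feature vector. The feature names and statistics here are illustrative assumptions only; DEFault's actual feature set is defined in c_Feature_Extraction/Dynamic.

```python
import numpy as np

def dynamic_features(loss_history, activations, gradients):
    """Summarize runtime signals from one training run into scalar features.

    loss_history: per-epoch loss values
    activations:  flattened activation values sampled during training
    gradients:    list of per-layer gradient arrays

    Illustrative sketch only, not DEFault's real feature extractor.
    """
    loss = np.asarray(loss_history, dtype=float)
    epochs = np.arange(len(loss))
    slope = np.polyfit(epochs, loss, 1)[0]  # loss trend: negative = decreasing
    acts = np.asarray(activations, dtype=float)
    return {
        "loss_slope": float(slope),
        "loss_final": float(loss[-1]),
        "loss_nan_ratio": float(np.isnan(loss).mean()),
        "act_mean": float(acts.mean()),
        "act_dead_ratio": float((acts == 0).mean()),  # e.g. dead ReLUs
        "grad_norm_mean": float(np.mean([np.linalg.norm(g) for g in gradients])),
    }
```

A healthy run would typically show a negative `loss_slope` and a low `act_dead_ratio`; a trained detector then decides from such features whether the program is faulty.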
Fault Categorization
- Categorizes detected faults into one or more of the following seven categories:
  - Hyperparameter
  - Loss
  - Activation
  - Layer
  - Optimizer
  - Weight
  - Regularization
- A separate binary classifier is used for each category.
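A minimal sketch of this one-binary-classifier-per-category scheme, assuming scikit-learn, a feature matrix `X`, and a multi-label matrix `Y` with one column per category. The RandomForest choice and helper names are assumptions for illustration; the actual models live in d_DEFault.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CATEGORIES = ["Hyperparameter", "Loss", "Activation", "Layer",
              "Optimizer", "Weight", "Regularization"]

def train_category_classifiers(X, Y):
    """Train one binary classifier per fault category.

    X: (n_samples, n_features) extracted features
    Y: (n_samples, 7) multi-label matrix, Y[i, j] == 1 if sample i
       exhibits fault category j. Sketch only.
    """
    models = {}
    for j, cat in enumerate(CATEGORIES):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X, Y[:, j])
        models[cat] = clf
    return models

def predict_categories(models, x):
    """Return every category whose binary classifier fires on sample x."""
    return [cat for cat, clf in models.items()
            if clf.predict(x.reshape(1, -1))[0] == 1]
```

Because the categories are not mutually exclusive, a single faulty program can be flagged with several labels at once, matching the "one or more categories" behavior described above.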
Root Cause Analysis
- Utilizes SHAP for explainability.
- Identifies the most influential static and dynamic features responsible for the fault.
- Helps developers diagnose and fix the root cause effectively.
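The ranking step of RCA can be sketched as follows: given per-sample SHAP values from any explainer, rank features by mean absolute contribution to produce a Top@k list like the one shown in the smoke test below. This is a simplified stand-in for the actual RCA implementation in d_DEFault; the feature names in the usage comment are taken from this README's example output.

```python
import numpy as np

def rank_root_causes(shap_values, feature_names, k=5):
    """Rank features by mean |SHAP| magnitude into a Top@k list.

    shap_values:   (n_samples, n_features) array of SHAP values
    feature_names: one name per feature column
    Returns a list of (rank_label, feature_name, importance) tuples.
    """
    importance = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1][:k]  # most influential first
    return [(f"Top@{i + 1}", feature_names[j], float(importance[j]))
            for i, j in enumerate(order)]

# e.g. rank_root_causes(sv, ["CountDense", "Min_Neurons", "CountConv2D", ...])
```

Averaging absolute SHAP values over samples is a common way to turn per-prediction attributions into a global feature ranking, which is what the Top@1..Top@5 root-cause list reflects.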
DEFault
├── 0_Artifact_Testing # Scripts for lightweight verification on sample DNN models
├── a_Data_Collection # Scripts for collecting and processing StackOverflow posts
├── b_Fault_Seeding # Scripts for fault injection (DeepCrime and EFI extension)
├── c_Feature_Extraction # Static & Dynamic feature extraction scripts
├── d_DEFault # Implementation of DEFault (Fault Detection, Categorization, RCA)
├── e_Evaluation # Evaluation scripts for real-world and seeded faults
├── f_Figures # Figures used in the paper
├── g_Dataset # Labeled datasets for training and testing
├── h_CohenKappaAnalysis # Scripts for dataset consistency validation (Cohen’s Kappa)
├── i_CaseStudy # Scripts for real-world case studies (e.g., PixelCNN)
├── j_HPC_Slurm # Slurm job script for Compute Canada
└── K_Pre-Print.pdf     # Pre-print of the full paper
Tested on:
- Ubuntu 20.04 LTS or later
- HPC environments (e.g., Compute Canada, Graham Cluster)
Compatible with:
- Windows 10/11 (via WSL2)
- macOS Monterey (M1/M2 support may require additional setup)
Minimum:
- CPU: 4 cores
- RAM: 8 GB
- Disk: 10 GB
Recommended:
- GPU: NVIDIA with CUDA support
- HPC access (e.g., Compute Canada) for full experiments
Python: 3.10.16

- Create a virtual environment:

      python -m venv default_env
      source default_env/bin/activate    # macOS/Linux
      default_env\Scripts\activate       # Windows

- Install dependencies:

      pip install -r requirements.txt

- Navigate to the evaluation scripts directory:

      cd 0_Artifact_Testing/evaluation_scripts

- Run the Fault Detection & Categorization (FD_FC) script:

      python testForCaseStudy_FD_FC.py

Expected Output:
- Fault Detection (FD): Confirms if the PixelCNN model has faults.
- Fault Categorization (FC): Identifies the type of faults, including:
- Loss Function Fault
- Hyperparameter Fault
- Layer Fault
- Note: The model mistakenly identifies an Optimization Fault due to feature overlap.
- Run the Root Cause Analysis (RCA) script:

      python testForCaseStudy_RCA.py

Expected Output:
- Identifies and ranks the potential root causes of the Layer Fault using static features:
- Top@1: CountDense: Check the configuration and number of Dense layers.
- Top@2: Min_Neurons: No specific fault message
- Top@3: CountConv2D: Inspect the configuration of 2D convolutional layers.
- Top@4: Countsoftmax: Look into the activation function Softmax and its placement.
- Top@5: Max_Neurons: Verify the maximum number of neurons in any single layer.
Download the dataset:
- DNN Programs: Download Link
- Evaluation Benchmark: Download Link
🔴 HPC Support Required: To run the complete experiments below, you must have access to an HPC environment. The steps below should be executed via run_script.slurm from the j_HPC_Slurm directory, which installs all required dependencies inside the HPC environment.
- Data Collection & Fault Seeding:

      # Run Part 1 (DeepCrime)
      cd "b_Fault_Seeding/Part 1-DC"
      python run_deepcrime_full.py

      # Run Part 2 (Extended Fault Injection)
      cd "b_Fault_Seeding/Part 2-EFI"
      python main.py

- Feature Extraction:

      cd c_Feature_Extraction/Static
      python Static_Feature_Extraction.py

      cd c_Feature_Extraction/Dynamic
      python Dynamic_Feature_Extraction.py

- Model Training:

      cd d_DEFault/A_Detection
      python Fault_Detection.py

- Model Evaluation:

      cd e_Evaluation
      python Fault_Evaluation_Detection_Diagnosis.py

- Case Studies:

      cd i_CaseStudy
      python Feature_Extraction_CaseStudy.py
      python PixelCNN_Analysis.py

If you use DEFault in your research, please cite:

@inproceedings{default2025,
author = {Sigma Jahan and Mehil B Shah and Parvez Mahbub and Mohammad Masudur Rahman},
title = {Improved Detection and Diagnosis of Faults in Deep Neural Networks using Hierarchical and Explainable Classification},
booktitle = {Proceedings of the International Conference on Software Engineering (ICSE)},
year = {2025},
publisher = {IEEE}
}

Contact:
- Sigma Jahan - Dalhousie University, sigma.jahan@dal.ca
- Mehil B Shah - Dalhousie University, shahmehil@dal.ca
- Parvez Mahbub - Dalhousie University, parvezmrobin@dal.ca
- Mohammad Masudur Rahman - Dalhousie University, masud.rahman@dal.ca
This project is licensed under the MIT License. See LICENSE for details.
