SleuthCred is a toolkit for scanning SMB shares to detect potential credentials and secrets, and optionally enriching matches with a lightweight machine-learning based detector. It combines categorized regex-based detection with contextual enrichment and scan statistics to help reduce false positives and prioritize findings.
This repository is intended for authorized security assessments, red-team engagements, and defensive research. Do not run it against systems you do not own or do not have explicit permission to test.
- Categorized regex patterns for:
  - common hash formats (MD5, SHA variants, bcrypt, phpass, etc.)
  - cloud and API keys (AWS, Google, Stripe, Slack, etc.)
  - tokens (Bearer, Basic, JWT, base64-ish)
  - service secrets and URLs with embedded credentials
  - generic credentials (email, password fields, private keys)
- Optional ML-based enricher (`nxc_credential_detector`) that:
  - computes token features (length, entropy, hex ratio, char classes)
  - optionally loads a joblib/scikit-learn artifact for model-based scoring
  - extracts nearby key/value context and produces a final verdict and score
- Context lines for matches (configurable)
- Robust SMB handling: reconnection, backoff, and sensible retry behavior
- Per-scan statistics and JSON results export
- File and folder filtering by extension, filename keywords, and size limits
- Configurable scan depth and file-size thresholds
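The token features mentioned above (and the `shannon_entropy` / `hex_ratio` helpers named in the roadmap) can be sketched as follows. This is a minimal illustration, not the module's actual implementation:

```python
import math
from collections import Counter

def shannon_entropy(token: str) -> float:
    """Bits of entropy per character; random API keys score high, plain words low."""
    if not token:
        return 0.0
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in Counter(token).values())

def hex_ratio(token: str) -> float:
    """Fraction of characters that are hex digits; close to 1.0 for raw hashes."""
    if not token:
        return 0.0
    return sum(ch in "0123456789abcdefABCDEF" for ch in token) / len(token)

print(shannon_entropy("aaaa"))                        # low for repeated characters
print(hex_ratio("d41d8cd98f00b204e9800998ecf8427e"))  # 1.0 for an MD5 digest
```

High entropy combined with a high hex ratio is a strong signal that a token is a hash rather than natural language.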
- `modules/search_passwords.py`: NXC module that implements `SMBCredentialSearcher` and integrates the scanning logic.
- `modules/detector/nxc_credential_detector.py`: lightweight enricher/classifier that provides `enrich_match(...)`. Optionally loads a joblib model artifact (`MODEL_PATH`).
- `README.md`: this file.
- `LICENSE`: license file (GPL-3.0).
- `requirements.txt`: (optional) list of Python dependencies.
- Python 3.8+ recommended
- Optional / suggested Python packages:
  - impacket
  - joblib (optional; required to load an ML artifact)
  - scikit-learn==1.8.0 (if the model artifact uses sklearn objects)
- The NXC runtime environment that provides `nxc.protocols.smb.remotefile`, `nxc.paths.NXC_PATH`, and `nxc.helpers.misc.CATEGORY` (if integrating as an NXC module).
Example installation (system-wide or venv):
```
pip install impacket joblib scikit-learn
```

Note: joblib is optional. The enricher falls back to purely heuristic rules when the joblib artifact is not available.
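The optional-dependency fallback described in the note can be sketched like this (an assumed structure for illustration, not the module's actual code): try to import joblib and load the artifact, and return `None` to signal heuristic-only mode on any failure.

```python
import os

def load_model(path=None):
    """Return a scoring model, or None to signal heuristic-only mode."""
    path = path or os.environ.get("MODEL_PATH", "model.joblib")
    try:
        import joblib  # optional dependency
    except ImportError:
        return None  # joblib absent: fall back to heuristic rules
    try:
        artifact = joblib.load(path)
    except Exception:
        return None  # missing or corrupt artifact: fall back as well
    # The artifact may be a dict with "model" (plus optional "scaler"/"meta"),
    # or a raw estimator.
    return artifact.get("model") if isinstance(artifact, dict) else artifact
```

Callers simply check for `None` and route the token through the heuristic rules instead of the estimator.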
- Clone the repository:

```
git clone https://github.com/interhack86/sleuthcred.git
cd sleuthcred
```

- (Optional) Create and activate a virtual environment:

```
python -m venv .venv
source .venv/bin/activate
```

- Install dependencies:

```
pip install -r requirements.txt
```

Or install core dependencies manually:

```
pip install impacket joblib scikit-learn
```

If packaging is desired, add packaging metadata and run:

```
pip install .
```

- Deploy to NetExec: copy the module and the detector to your local NetExec configuration directory:

```
cp -R modules/* ~/.nxc/modules/
```

- `MODEL_PATH` environment variable:
  - Path to the joblib model artifact for the enricher (default: `model.joblib`).
  - The artifact may be either:
    - a dict with keys `"model"`, `"scaler"` (optional), and `"meta"` (optional), or
    - a raw estimator (fallback).
- `CUSTOM_FOLDER` constant in the scanner module:
  - If set, forces the scanner to use that folder as the target folder.
- NXC module options (exposed via `NXCModule.options`):
  - `SHARE`: target share name (e.g., `C$`). If unset, all accessible shares are enumerated.
  - `FOLDER`: folder inside the share to start scanning (requires `SHARE`).
  - `MAX_FILE_SIZE`: maximum file size in bytes to scan (default: `2 * 1024 * 1024`).
  - `DEPTH`: maximum recursion depth (default: 4).
  - `PATTERN_TYPES`: comma-separated list of pattern categories (`hashes`, `aws`, `google`, `tokens`, `services`, `generic`), or `all`.
  - `CONTEXT_LINES`: number of context lines to display with matches (default: 2).
  - `STATS_FLAG`: enable or disable printing statistics (`true`/`false`).
  - `DEBUG`: verbose mode (`true`/`false`).
  - `OUTPUT_FOLDER`: directory to save JSON results (default under `NXC_PATH`).
  - `PRINT_FALLBACK`: print fallback matches when the enricher is not installed (`true`/`false`).
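The filtering that the `DEPTH` and `MAX_FILE_SIZE` options control can be illustrated with a local-filesystem sketch (the module walks SMB shares instead; the extension set here is a hypothetical example, since the real allow-list is configurable):

```python
import os

MAX_FILE_SIZE = 2 * 1024 * 1024                      # default from the options above
ALLOWED_EXTS = {".txt", ".ini", ".xml", ".config"}   # illustrative set only

def walk(folder, depth=4):
    """Yield scannable files, honoring a DEPTH-style recursion limit and size cap."""
    if depth < 0:
        return
    for entry in os.scandir(folder):
        if entry.is_dir(follow_symlinks=False):
            yield from walk(entry.path, depth - 1)   # each level consumes one unit of depth
        elif entry.is_file(follow_symlinks=False):
            ext = os.path.splitext(entry.name)[1].lower()
            if ext in ALLOWED_EXTS and entry.stat().st_size <= MAX_FILE_SIZE:
                yield entry.path
```

Pruning by depth and size before reading file contents keeps scan time bounded on large shares.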
Integrate the module into your NXC environment and run it with module options. Example option format shown in the module docs:
```
nxc smb CDIR -u '' -p '' -M search_passwords
```

Or, with options:

```
nxc smb CDIR -u '' -p '' -M search_passwords -o SHARE=C$ -o FOLDER=Users -o MAX_FILE_SIZE=5242880 -o DEPTH=4 -o CONTEXT_LINES=2 -o DEBUG=true
```
When executed, the module will:
- enumerate shares (or use `SHARE` if specified),
- walk folders up to the configured depth,
- read files that match allowed extensions and are not filtered,
- run regex detectors and optionally call the enricher for each match,
- print highlights and save JSON results if `OUTPUT_FOLDER` is set.
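The categorized regex step can be sketched with a couple of well-known token formats (these sample patterns are illustrative; the module ships a much larger set per category):

```python
import re

# One example pattern per category; AWS access key IDs start with "AKIA"
# followed by 16 uppercase alphanumerics, and MD5 digests are 32 hex chars.
PATTERNS = {
    "hashes": {"md5": re.compile(r"\b[a-fA-F0-9]{32}\b")},
    "aws": {"aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b")},
    "generic": {"password_field": re.compile(r"(?i)password\s*[=:]\s*\S+")},
}

def scan_line(line):
    """Yield (category, pattern_name, matched_text) for every hit in a line."""
    for category, pats in PATTERNS.items():
        for name, rx in pats.items():
            for m in rx.finditer(line):
                yield category, name, m.group(0)

print(list(scan_line("password=hunter2")))
# [('generic', 'password_field', 'password=hunter2')]
```

Each hit carries its category and pattern name, which is exactly the context the enricher receives alongside the token.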
The enricher script supports a simple CLI mode for local testing. Use it by piping file content and providing the required arguments:
```
cat somefile.txt | python modules/detector/nxc_credential_detector.py <category> <pattern_name> <token> <share> <filepath>
```

Example:

```
echo "password=mySecret123" | python modules/detector/nxc_credential_detector.py generic password mySecret123 SHARE path/to/file
```

This prints a JSON enriched match object to stdout.
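The CLI contract above (file content on stdin, five positional arguments, JSON on stdout) can be sketched as follows. The `enrich_stub` body is a stand-in, not the real `enrich_match` logic:

```python
import json
import sys

def enrich_stub(category, pattern_name, token, share, filepath, content):
    """Stand-in enricher: echo the inputs plus a trivial content check."""
    return {
        "category": category,
        "pattern_name": pattern_name,
        "token": token,
        "share": share,
        "filepath": filepath,
        "token_in_content": token in content,  # real enricher extracts context here
    }

def main(argv=None, stdin=None):
    argv = argv if argv is not None else sys.argv
    content = (stdin if stdin is not None else sys.stdin).read()
    result = enrich_stub(*argv[1:6], content)
    print(json.dumps(result))

if __name__ == "__main__":
    main()
```

Keeping the enrichment function separate from the argument/stdin plumbing makes it easy to unit-test without spawning a subprocess.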
When OUTPUT_FOLDER is configured, the scanner writes a JSON file named:
`<remote_host>_credentials.json`
Structure includes:
- `target`: remote host
- `timestamp`: scan time
- `statistics`: aggregated scan counters and metadata
- `suspicious_filenames`: findings based on filename heuristics
- `content_matches`: enriched matches grouped by file
- `all_matches`: raw match records
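An illustrative results object with the documented top-level keys; all values here are placeholders, and nested field names are not specified by this README:

```python
import json

example_results = {
    "target": "192.0.2.10",               # placeholder host
    "timestamp": "1970-01-01T00:00:00",   # placeholder scan time
    "statistics": {},                     # aggregated counters and metadata
    "suspicious_filenames": [],           # filename-heuristic findings
    "content_matches": {},                # enriched matches grouped by file
    "all_matches": [],                    # raw match records
}
print(json.dumps(example_results, indent=2))
```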
- Model load fails:
  - Ensure `joblib` is installed and `MODEL_PATH` points to a valid artifact.
  - Artifact format: either the estimator directly, or a dict containing `"model"` and optional `"scaler"` and `"meta"`.
- Too many false positives:
  - Enable `DEBUG` to see why tokens were flagged.
  - Narrow `PATTERN_TYPES` to specific categories.
  - Improve the ML artifact with additional labeled data.
- SMB connection issues:
  - Verify credentials and network connectivity.
  - Check that the account has permission to list and read the target shares.
  - Network instability may require raising timeouts or adjusting retry/backoff settings.
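The retry/backoff behavior mentioned in the feature list follows the standard exponential-backoff pattern; a generic sketch (the module's actual retry logic may differ):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying transient OSErrors with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except OSError:
            if i == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** i))  # 1s, 2s, 4s, ...
```

Wrapping individual SMB reads this way lets a scan survive brief disconnects instead of aborting the whole share.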
Use this tool only with explicit authorization. Scanning systems, collecting credentials, or exfiltrating secrets without permission may be illegal.
If you store results, secure them appropriately (disk encryption, access controls) and avoid sharing discovered secrets in public or insecure channels. Consider redaction or secure vaulting for discovered credentials.
Contributions are welcome. Suggested workflow:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/xxx`).
- Add tests for new behavior.
- Open a pull request describing your changes and rationale.
Please include unit tests for any change in heuristics or ML behavior. Provide small sample inputs demonstrating fixes for false positives/negatives.
This project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file in the repository for the full text:
- Add unit tests for:
  - `shannon_entropy`, `hex_ratio`, and `extract_features`
  - `classify_token_simple` heuristics
  - `enrich_match` integration and JSON output
- Provide a packaged ML model artifact and an example `MODEL_PATH`.
- Add a Docker image with a consistent runtime environment (impacket, joblib, scikit-learn).
- Add CI (linting, tests) and pre-commit hooks.
- Improve estimator loading to be lazy (load the model on first use) and replace `print()` with structured logging.
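The lazy-loading item above amounts to deferring the expensive load to first use and caching the result; a small sketch, with the loader injectable for testing (in the module it would wrap `joblib.load` on `MODEL_PATH`):

```python
class LazyModel:
    """Defer an expensive load until first use, then cache it."""

    def __init__(self, loader):
        self._loader = loader   # e.g. lambda: joblib.load(os.environ["MODEL_PATH"])
        self._model = None

    def get(self):
        if self._model is None:
            self._model = self._loader()  # runs at most once
        return self._model

calls = []
lazy = LazyModel(lambda: calls.append(1) or "estimator")
lazy.get(); lazy.get()
print(len(calls))  # 1: the loader ran only once
```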
