
feat: DataFog API v2 MVP — detection, enforcement, demo #14

Open: sidmohan0 wants to merge 50 commits into dev from v2


@sidmohan0 (Contributor)

Summary

Complete MVP for DataFog API v2, closing the gaps between the prototype and a shippable product across four workstreams:

WS1 — PII Detection Parity (M1–M3)

  • Expanded the scanner from 5 to 11 entity types (email, phone, SSN, credit card, API key, IP address, date, zip code, person, organization, location)
  • Added post-match validation (Luhn checksum for credit cards, 0-255 per-octet range check for IPv4 addresses)
  • Go-native heuristic NER engine for person/organization/location detection (no Python/cgo deps)
  • 6 anonymization strategies: mask, redact, tokenize, anonymize, replace/pseudonymize, hash (SHA-256)

WS2 — Enforcement Gap Closure (M4–M6)

  • The shim now applies transform plans on allow_with_redaction decisions (previously it ignored the plan, so these decisions behaved like a plain allow)
  • Shared adapter registry (internal/adapters) with claude and codex as recognized adapters
  • Policy rules can target specific adapters via match.adapters field
  • GET /v1/events endpoint with time range, decision, adapter, and limit filters
  • Receipt store rotation with configurable max entries and auto-archive

WS3 — Interactive Demo (M7)

  • Real execution demo server behind --enable-demo flag
  • /demo/exec, /demo/write-file, /demo/read-file endpoints run through the actual shim gate
  • Updated docs/demo.html with command execution, file write, and file read panels
  • All operations sandboxed to a temp directory

WS4 — Platform Integration (M8)

  • Integrated setup scripts from the v2-claude and v2-codex branches
  • Claude Code and OpenAI Codex hook wrappers (scripts/claude-datafog-setup.sh, scripts/codex-datafog-setup.sh)
  • Policy rules for agent-specific enforcement (allow help commands, deny dangerous shell exec)

Test plan

  • go vet ./... passes
  • go test ./... — all 10 packages green
  • gofmt clean
  • Scanner detects all 11 entity types with correct spans
  • NER identifies person/org/location entities
  • Luhn validation rejects invalid credit card numbers
  • IPv4 validation rejects out-of-range octets
  • Transform applies all 6 modes correctly
  • Shim enforces redaction on allow_with_redaction for both file read and write
  • Adapter registry resolves claude/codex aliases
  • Policy matching respects adapter conditions
  • Events endpoint filters by time/decision/adapter
  • Receipt rotation archives when max entries exceeded
  • CI pipeline (gofmt, go vet, go test, gosec)

🤖 Generated with Claude Code

sidmohan0 and others added 20 commits February 23, 2026 14:11
The TestResolveTargetBinary/pathLookup test failed on Windows because
resolveTargetBinary appends .exe before exec.LookPath, but the test
created the lookup file without the extension. Also adds an OpenAPI 3.1
Scalar reference page for interactive local API exploration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
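For reference, the Windows-specific step reduces to a name rewrite before the PATH lookup. A minimal sketch, assuming a hypothetical helper binaryName (not the actual resolveTargetBinary code) that appends .exe only on Windows and only when the name has no extension; a test fixture has to be created under the rewritten name for exec.LookPath to find it.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"runtime"
)

// binaryName is a hypothetical helper mirroring the described behaviour:
// on Windows it appends ".exe" when the name has no extension.
func binaryName(name, goos string) string {
	if goos == "windows" && filepath.Ext(name) == "" {
		return name + ".exe"
	}
	return name
}

func main() {
	fmt.Println(binaryName("claude", "linux"))   // claude
	fmt.Println(binaryName("claude", "windows")) // claude.exe

	// The lookup uses the rewritten name, so on Windows a test fixture
	// must carry the same suffix or LookPath will not find it.
	name := binaryName("go", runtime.GOOS)
	if path, err := exec.LookPath(name); err == nil {
		fmt.Println("resolved:", path)
	} else {
		fmt.Println("not found:", name)
	}
}
```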
Reflects request Origin in Access-Control-Allow-Origin and handles
OPTIONS preflight so the Scalar UI can reach the API from file:// or
other origins during local development.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ip_address, date, zip_code detection patterns alongside existing
email, phone, ssn, api_key, credit_card. Credit card numbers are now
validated with the Luhn algorithm; IPv4 addresses are validated for a
0-255 range per octet. Introduces an EntityPattern struct with an
optional Validate func for post-match filtering. Updates the policy
engine to recognize the new types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
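A sketch of the EntityPattern idea: a regex finds candidates and an optional Validate func filters them after the match. The Type and Regex fields, both regexes, and the scan helper are assumptions for illustration; luhnValid and ipv4Valid implement the standard Luhn checksum and a 0-255 per-octet check.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// EntityPattern pairs a detection regex with an optional post-match check.
// Only the Validate idea comes from the PR; the other fields are assumed.
type EntityPattern struct {
	Type     string
	Regex    *regexp.Regexp
	Validate func(match string) bool // nil: accept every regex match
}

// luhnValid applies the Luhn checksum, ignoring spaces and dashes.
func luhnValid(s string) bool {
	digits := strings.NewReplacer(" ", "", "-", "").Replace(s)
	if digits == "" {
		return false
	}
	sum, double := 0, false
	for i := len(digits) - 1; i >= 0; i-- {
		c := digits[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double { // every second digit from the right is doubled
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

// ipv4Valid requires exactly four dot-separated octets in 0-255.
func ipv4Valid(s string) bool {
	parts := strings.Split(s, ".")
	if len(parts) != 4 {
		return false
	}
	for _, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 || n > 255 {
			return false
		}
	}
	return true
}

var patterns = []EntityPattern{
	{Type: "credit_card", Regex: regexp.MustCompile(`\b\d(?:[ -]?\d){12,18}\b`), Validate: luhnValid},
	{Type: "ip_address", Regex: regexp.MustCompile(`\b\d{1,3}(?:\.\d{1,3}){3}\b`), Validate: ipv4Valid},
}

// scan keeps only the regex matches that pass their pattern's validator.
func scan(text string) []string {
	var found []string
	for _, p := range patterns {
		for _, m := range p.Regex.FindAllString(text, -1) {
			if p.Validate == nil || p.Validate(m) {
				found = append(found, p.Type+":"+m)
			}
		}
	}
	return found
}

func main() {
	text := "card 4111 1111 1111 1111, typo 4111 1111 1111 1112, hosts 10.0.0.1 and 999.1.1.1"
	fmt.Println(scan(text))
	// [credit_card:4111 1111 1111 1111 ip_address:10.0.0.1]
}
```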
Heuristic-based NER engine using dictionary lookups, title-case
analysis, contextual triggers (Mr./Dr. for person, Inc./Corp for org,
in/at for location), and common first-name matching. Cascaded after
regex engine in ScanText. Toggle via NEREnabled flag. No Python, no
cgo, no external dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
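The cue-based approach can be illustrated with a deliberately small sketch. The cue lists, the three-name dictionary, the Entity type, and heuristicNER itself are hypothetical stand-ins, not the PR's engine: a run of title-case words is tagged as a person after Mr./Dr. or when its first word is a known first name, as an organization when followed by Inc./Corp, and as a location after in/at.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Entity is a hypothetical NER result.
type Entity struct {
	Type string
	Text string
}

// Illustrative cue lists; keys are words with trailing punctuation removed.
var (
	personTitles = map[string]bool{"Mr": true, "Mrs": true, "Ms": true, "Dr": true}
	orgSuffixes  = map[string]bool{"Inc": true, "Corp": true, "LLC": true, "Ltd": true}
	locPreps     = map[string]bool{"in": true, "at": true}
	firstNames   = map[string]bool{"Alice": true, "John": true, "Maria": true}
)

func clean(w string) string { return strings.TrimRight(w, ",.;:!?") }

func isTitleCase(w string) bool {
	r := []rune(w)
	return len(r) > 1 && unicode.IsUpper(r[0]) && unicode.IsLower(r[1])
}

// heuristicNER tags runs of title-case words using the cue before or after them.
func heuristicNER(text string) []Entity {
	words := strings.Fields(text)
	var out []Entity
	i := 0
	for i < len(words) {
		// Collect a run of title-case words that are not cue words themselves.
		j := i
		for j < len(words) {
			c := clean(words[j])
			if !isTitleCase(c) || personTitles[c] || orgSuffixes[c] {
				break
			}
			j++
		}
		if j == i {
			i++
			continue
		}
		run := make([]string, 0, j-i)
		for _, w := range words[i:j] {
			run = append(run, clean(w))
		}
		name := strings.Join(run, " ")
		prev := ""
		if i > 0 {
			prev = clean(words[i-1])
		}
		switch {
		case personTitles[prev]:
			out = append(out, Entity{"person", name})
		case j < len(words) && orgSuffixes[clean(words[j])]:
			out = append(out, Entity{"organization", name + " " + clean(words[j])})
			j++ // consume the suffix
		case firstNames[run[0]]:
			out = append(out, Entity{"person", name})
		case locPreps[prev]:
			out = append(out, Entity{"location", name})
		}
		i = j
	}
	return out
}

func main() {
	fmt.Println(heuristicNER("Dr. Jane Smith joined Acme Corp in Boston last May."))
	// [{person Jane Smith} {organization Acme Corp} {location Boston}]
}
```

Title-case words with no cue (such as "May" in the sample) are deliberately left untagged; that is the precision/recall trade-off a heuristic engine makes without a model.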
Replace mode generates entity-typed pseudonyms ([PERSON_A1B2C3]).
Hash mode outputs the full SHA-256 hex digest. Both are deterministic.
Updates policy engine, server validation, and transform engine to
support all 6 modes: mask, tokenize, anonymize, redact, replace, hash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WriteFile now applies the transform plan to data before writing when
the decision is allow_with_redaction. ReadFile scans and redacts the
output. The applyRedaction method scans the content, applies the
transform steps, and returns the redacted bytes. This closes the
critical enforcement gap where the shim was ignoring transform plans.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move adapter registry to internal/adapters for shared access. Add
claude and codex as recognized adapters with aliases. Add Adapters
field to MatchCriteria so policy rules can target specific adapters.
Policy engine resolves aliases to canonical names during matching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
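Alias resolution plus the match.adapters check can be sketched as follows. The alias spellings (claude-code, openai-codex), the canonical helper, and the matchesAdapter method are assumptions; only the idea of canonicalizing names before comparing comes from this PR, and an empty adapter list is assumed to mean "any adapter".

```go
package main

import (
	"fmt"
	"strings"
)

// aliases maps every accepted spelling to a canonical adapter name.
// The alias strings here are illustrative assumptions.
var aliases = map[string]string{
	"claude":       "claude",
	"claude-code":  "claude",
	"codex":        "codex",
	"openai-codex": "codex",
}

// canonical resolves an adapter name or alias, case-insensitively.
func canonical(name string) (string, bool) {
	c, ok := aliases[strings.ToLower(strings.TrimSpace(name))]
	return c, ok
}

// MatchCriteria mirrors a match.adapters field: an empty list matches any adapter.
type MatchCriteria struct {
	Adapters []string
}

// matchesAdapter compares canonical names so aliases on either side match.
func (m MatchCriteria) matchesAdapter(adapter string) bool {
	if len(m.Adapters) == 0 {
		return true
	}
	want, ok := canonical(adapter)
	if !ok {
		return false
	}
	for _, a := range m.Adapters {
		if c, ok := canonical(a); ok && c == want {
			return true
		}
	}
	return false
}

func main() {
	rule := MatchCriteria{Adapters: []string{"claude-code"}}
	fmt.Println(rule.matchesAdapter("Claude"))            // true
	fmt.Println(rule.matchesAdapter("codex"))             // false
	fmt.Println(MatchCriteria{}.matchesAdapter("codex")) // true
}
```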
Events endpoint supports time range, decision type, adapter, and limit
query filters. NDJSONDecisionEventSink now implements EventReader with
Query method. The receipt store supports configurable max entries with
auto-rotation (the old file is archived with a timestamp suffix).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Demo server exposes /demo/exec, /demo/write-file, /demo/read-file
endpoints that run through the actual shim gate with policy enforcement.
All file operations are sandboxed to a temp directory. Requires the
--enable-demo flag or the DATAFOG_ENABLE_DEMO env var. Updated demo.html
with command execution, file write, and file read panels.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
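Sandboxing to a temp directory mostly comes down to refusing paths that escape the root. A minimal sketch with a hypothetical sandboxPath helper (not the demo server's code): absolute paths are rejected, and after filepath.Join cleans the path, filepath.Rel from the root must not start with "..".

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// sandboxPath resolves a user-supplied relative path inside root and
// rejects anything that would escape it (absolute paths, ".." traversal).
func sandboxPath(root, userPath string) (string, error) {
	if filepath.IsAbs(userPath) {
		return "", errors.New("absolute paths are not allowed")
	}
	full := filepath.Join(root, userPath) // Join also cleans ".." segments
	rel, err := filepath.Rel(root, full)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes the sandbox", userPath)
	}
	return full, nil
}

func main() {
	root, err := os.MkdirTemp("", "datafog-demo-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	for _, p := range []string{"notes.txt", "sub/../notes.txt", "../etc/passwd", "/etc/passwd"} {
		if full, err := sandboxPath(root, p); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("ok:", full)
		}
	}
}
```

Lexical checks like this do not catch a symlink inside the sandbox that points outside it; Go 1.24 added os.Root (via os.OpenRoot) for traversal-resistant file access, which is worth considering if the demo ever accepts untrusted input.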
Cherry-pick setup scripts, runbooks, and specs from v2-claude/v2-codex
branches. Add claude/codex adapter rules to policy.json alongside
existing rules. Policy version bumped to v2026-02-24-1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the old demo panels with a step-by-step scenario explorer.
Users pick a scenario (deny, redact, allow, read-redact) and walk
through each stage of the enforcement pipeline one step at a time.

- Add /demo/seed endpoint that writes directly to sandbox bypassing
  the shim gate, so file-read scenarios show real redaction on output
- Serve demo.html from /demo via the API server (no CORS issues)
- Fix policy: split file.write/read into separate allow_with_redaction
  rules, add allow-shell fallback, remove overly strict AND entity requirements
- Policy version bumped to v2026-02-24-2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds spec for an optional sidecar NER service that provides
GLiNER2-grade entity detection (person, org, location) without
bloating the Go binary. Shim calls sidecar over HTTP when
DATAFOG_NER_ENDPOINT is set, falls back to heuristic NER otherwise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>