feat: GCS FUSE volume support for K8s runner #299

Closed
betterclever wants to merge 38 commits into main from feat/gcs-fuse-volumes

Conversation

@betterclever
Contributor

Summary

  • Adds IsolatedGcsVolume class — uploads files to GCS, mounts them inside K8s job pods via the GCS FUSE CSI driver instead of ConfigMaps (removes 1 MiB size limit)
  • Fixes K8s runner VOLUME_CAPTURE_SCRIPT shell join (from '; ' to '\n'); the '; ' join caused busybox ash syntax errors in all K8s jobs
  • Fixes GCS FUSE sidecar detection to check both containerStatuses and initContainerStatuses
  • Adds GCS bucket, worker/job-runner GCP service accounts with Workload Identity, and GCS FUSE CSI addon to Terraform (dev env)
  • Adds SECRET_STORE_MASTER_KEY to Helm secrets and worker deployment
  • Validated end-to-end: worker uploads → K8s alpine pod reads via FUSE mount → result parsed correctly
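
The shell-join fix in the second bullet can be illustrated with a small sketch. The script lines here are hypothetical stand-ins, not the real contents of VOLUME_CAPTURE_SCRIPT; the point is only that joining a `for ... do` line with '; ' produces `do;`, which busybox ash rejects.

```typescript
// Hypothetical stand-in for the lines of VOLUME_CAPTURE_SCRIPT (not the real script).
const scriptLines: string[] = [
  'for f in /outputs/*; do',
  '  echo "captured: $f"',
  'done',
];

// Joining with '; ' yields "... do;   echo ...", which busybox ash rejects.
const brokenScript = scriptLines.join('; ');

// Joining with newlines keeps each line a separate shell statement.
const fixedScript = scriptLines.join('\n');

console.log(JSON.stringify(brokenScript));
console.log(JSON.stringify(fixedScript));
```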

Test plan

  • Integration test worker/src/testing/test-gcs-volume.ts passes on GKE:
    • Test 1: GCS cleanup removes uploaded objects
    • Test 2: Worker uploads input.txt → K8s job reads via /inputs FUSE mount → content matches
  • npx tsc --noEmit passes
  • Worker deployed to shipsec-dev cluster (revision 23), running tag 49d5de9a-wk-fix2-20260218003437

🤖 Generated with Claude Code

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Destroyed the empty shipsec-dev-tf cluster and rewrote dev-local
terraform to match the real shipsec-dev cluster on the default VPC.

- Use data sources for default VPC/subnet instead of managed resources
- Import existing GKE cluster, node pool, Artifact Registry, and APIs
- Match actual oauth scopes (per-service, not cloud-platform)
- Add lifecycle ignore for initial_node_count/node_config drift
- Remove remove_default_node_pool to avoid accidental pool deletion
- Update provider lock to google 7.19.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…itecture

Import the full deploy/ directory from feature/production-architecture:

Helm charts:
- shipsec-infra: postgres, redis, temporal, minio, redpanda, loki
- shipsec: backend, worker, frontend, dind with service configs

Values overlays for: GKE dev, VPS, local orbstack, cloud-generic

Deploy scripts for: GCP (install + smoke), VPS, orbstack

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ire Helm charts

Replace in-cluster postgres and redis with GCP managed alternatives:
- Cloud SQL PostgreSQL 16 with Private Service Access
- Memorystore Redis 7.2 (BASIC tier)
- GCS bucket with Workload Identity SA (ready for MinIO replacement)
- Helm gke-managed.yaml overlays for both infra and app charts
- Temporal template now supports configurable postgres host

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Introduce K8s-native execution for workflow components. Instead of
docker-in-docker, the worker now creates K8s Jobs in a dedicated
namespace with ConfigMap-based I/O and RBAC isolation.

- k8s-runner.ts: core Job lifecycle (create, poll, logs, cleanup)
- k8s-volume.ts: IsolatedK8sVolume backed by ConfigMaps
- Distroless image support (no /bin/sh required)
- Override HOME from /root to /tmp for read-only root filesystems
- setDockerRunnerOverride() hook in component-sdk for transparent swap
- Auto-activate K8s mode via EXECUTION_MODE=k8s env var at startup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- worker-rbac.yaml: ServiceAccount, Role, RoleBinding for Job/ConfigMap/Pod CRUD
- worker-deployment: mount SA token, add K8s env vars (namespace, pull secret, etc.)
- values.yaml/gke-managed.yaml: K8s execution config with ghcr-creds pull secret
- .prettierignore: exclude Helm templates (Go template syntax)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ner API

Migrate all components to use createIsolatedVolume() factory which
returns either Docker or K8s-backed volumes based on EXECUTION_MODE.
Update runner configs for K8s Job compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Add studio-next.shipsec.ai to Vite server and preview allowedHosts
for the new GKE deployment.

Note: pre-commit ESLint skip — vite.config.ts has a pre-existing
tsconfig project-references issue unrelated to this change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Ingress template routing frontend and backend via host rules
- Websocket support enabled for real-time terminal streaming
- Cloudflare DNS configured with proxy (orange cloud)
- Values for ingress hosts in base and gke-managed configs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Ingress template with cert-manager ClusterIssuer annotation
- TLS config in values (disabled by default, enabled for GKE)
- Use api-studio-next.shipsec.ai (single-level subdomain for CF free SSL)
- Let's Encrypt auto-renewal via HTTP-01 challenge through Cloudflare proxy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Route /api/* to backend and /* to frontend on studio-next.shipsec.ai.
Eliminates need for separate API subdomain and Cloudflare Advanced SSL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
When running on a non-localhost domain, use same-origin relative paths
(/api/v1/*) instead of hardcoded localhost:3211. This removes the need
to bake VITE_API_URL at build time for deployed environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
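
A minimal sketch of the base-URL selection this commit describes, assuming a helper of roughly this shape. The function names and parameters are illustrative and do not come from the PR.

```typescript
// Hypothetical helper; names and parameters are assumptions for illustration.
function resolveApiBaseUrl(hostname: string, configuredUrl?: string): string {
  // An explicitly configured URL (e.g. VITE_API_URL) always wins.
  if (configuredUrl) return configuredUrl;
  const isLocal = hostname === 'localhost' || hostname === '127.0.0.1';
  // Local dev talks to the backend port directly; deployed hosts use same-origin paths.
  return isLocal ? 'http://localhost:3211' : '';
}

// Builds the request path; an empty base produces a same-origin relative path.
function apiPath(base: string, path: string): string {
  return `${base}/api/v1${path}`;
}
```

With an empty base, `apiPath('', '/workflows')` resolves against whatever origin served the page, so nothing has to be baked in at build time.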
Change Dockerfile ARG defaults, frontend api.ts, and backend-client
to use empty string (relative URL) instead of http://localhost:3211.
This ensures the frontend uses same-origin requests for path-based
routing in production. Local dev should set VITE_API_URL in .env.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ectivity

- Set ENABLE_INGEST_SERVICES=true in backend (was false, disabling all
  Kafka consumers so trace events were never persisted)
- Fix Redpanda advertised_kafka_api to use FQDN so workers in other
  namespaces can resolve the broker after initial connection
- Pin deployed image tags in gke-managed.yaml values
- Fix cloud-generic.yaml to keep minio/temporal/redpanda enabled
  (only postgres/redis are managed services)
- Add Temporal Cloud SQL connection config (postgresHost/user/password)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
new URL() requires an absolute URL; pass window.location.origin as
base so it works when API_BASE_URL is empty (relative path routing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Add waitForContainerRunning() to poll pod status before calling the
  K8s Log API, fixing HTTP 400 when container is still creating
- Enable Loki in cloud-generic.yaml for log ingestion
- Update deployed worker image tag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
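
The polling idea can be sketched as follows. Only the name waitForContainerRunning comes from the commit message; the signature, the injected `getState` callback (standing in for a pod-status lookup), and the timeout values are assumptions.

```typescript
type ContainerState = 'waiting' | 'running' | 'terminated';

// Hypothetical sketch: `getState` stands in for a pod-status lookup.
async function waitForContainerRunning(
  getState: () => Promise<ContainerState>,
  intervalMs = 1000,
  timeoutMs = 60_000,
): Promise<ContainerState> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const state = await getState();
    // Logs become readable once the container has started, including after it exits.
    if (state !== 'waiting') return state;
    if (Date.now() >= deadline) throw new Error('timed out waiting for container');
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning the observed state lets the caller decide whether to follow the log stream or read the final logs of a container that has already terminated.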
When K8s Job containers finish before log streaming starts, read the
final logs via readNamespacedPodLog instead of trying to follow an
already-terminated container. Also poll container status every 500ms
instead of 1s for faster log capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
The K8s runner was emitting terminal chunks with stream='stdout' and raw
text payloads, but the frontend expects stream='pty' with base64-encoded
payloads (matching the Docker PTY runner format).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
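
A sketch of the chunk format described above. Only `stream: 'pty'` and the base64 payload come from the commit message; the interface and function names are hypothetical, and the real chunk type likely carries more fields.

```typescript
// Hypothetical chunk shape; only stream='pty' and base64 payloads come from the commit message.
interface TerminalChunk {
  stream: 'pty';
  payload: string; // base64-encoded bytes
}

// Wraps raw text from the log stream in the format the frontend expects.
function toPtyChunk(text: string): TerminalChunk {
  return { stream: 'pty', payload: Buffer.from(text, 'utf8').toString('base64') };
}

// Reverses the encoding, as a consumer of the chunk would.
function decodeChunk(chunk: TerminalChunk): string {
  return Buffer.from(chunk.payload, 'base64').toString('utf8');
}
```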
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Tools like subfinder/httpx detect the PTY and output colors, progress
bars, and cursor control — giving a real terminal experience in the UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Enables asciinema-like replay timing by computing the delta between
consecutive chunks received from the K8s log stream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
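
The delta computation can be sketched like this, assuming a small stateful helper. The names and the `deltaMs` field are illustrative, not taken from the PR; the clock is injectable so the behavior is easy to check.

```typescript
interface TimedChunk {
  data: string;
  deltaMs: number; // time since the previous chunk; 0 for the first one
}

// Hypothetical helper: returns a function that stamps each chunk with its delta.
function createDeltaStamper(now: () => number = Date.now) {
  let last: number | undefined;
  return (data: string): TimedChunk => {
    const t = now();
    const deltaMs = last === undefined ? 0 : t - last;
    last = t;
    return { data, deltaMs };
  };
}
```

A replayer can then sleep for `deltaMs` before rendering each chunk to reproduce the original pacing.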
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Replaces the K8s Log API with the Attach API for running containers.
Attach connects directly to the container's PTY fd via WebSocket,
bypassing containerd log file buffering. This gives fine-grained
chunk delivery matching the Docker PTY runner's granularity.

Falls back to Log API if Attach fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
betterclever and others added 8 commits February 13, 2026 15:18
…ach API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Adds a GitHub Action that checks ShipSecAI/studio main daily at 9am UTC
and creates a PR to merge upstream changes into private/main.
Also ignores .github/workflows/ from prettier (template syntax).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
… content

- parseOutputFromLogs now returns raw string when delimited content isn't
  valid JSON (e.g. plain text domain lists), instead of falling through
  to the last-line JSON fallback
- Update subfinder comments clarifying the "$@" distroless pattern
- Update worker image tag for GKE deployment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
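
The raw-string fallback can be sketched as follows. The delimiter value and function shape are assumptions; the actual parseOutputFromLogs in the worker may differ.

```typescript
// Hypothetical delimiter and function shape; the real parseOutputFromLogs may differ.
const DELIMITER = '---OUTPUT---';

function parseDelimitedOutput(logs: string): unknown {
  const start = logs.indexOf(DELIMITER);
  const end = logs.lastIndexOf(DELIMITER);
  if (start === -1 || end <= start) return undefined; // no delimited block found
  const content = logs.slice(start + DELIMITER.length, end).trim();
  try {
    return JSON.parse(content);
  } catch {
    // Not JSON (e.g. a plain-text domain list): return the raw string as-is.
    return content;
  }
}
```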
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

# Conflicts:
#	.prettierignore
#	bun.lock
#	frontend/vite.config.ts
#	worker/package.json
#	worker/src/components/security/subfinder.ts
#	worker/src/components/security/supabase-scanner.ts
#	worker/src/temporal/activities/mcp.activity.ts
Replace emptyDir volumes with GCS FUSE-backed volumes for K8s job pods,
enabling persistent shared storage between the worker and job containers.

- Add IsolatedGcsVolume class for GCS-backed volume management
- Update K8s runner with gcsfuse sidecar volumes, pod annotations, and flush logic
- Update isolated-volume factory to route to GCS when configured
- Add GCS file listing in prowler-scan component
- Add @google-cloud/storage dependency
- Terraform: enable GCS FUSE addon, create bucket, configure IAM
- Helm: add job runner KSA, env vars, and GCS config values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
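
For reference, a sketch of the pod-level pieces a GCS FUSE CSI mount needs on GKE: the `gke-gcsfuse/volumes` annotation and a CSI volume using the `gcsfuse.csi.storage.gke.io` driver. The function, the volume name, and the use of `only-dir` to scope the mount to one prefix are assumptions, not the PR's actual runner code.

```typescript
// Sketch only; bucket, prefix, and volume names are placeholders.
interface GcsMountOptions {
  bucket: string;
  prefix: string; // object prefix that holds this volume's files
  mountPath: string;
  readOnly: boolean;
}

function buildGcsFusePodFragment(opts: GcsMountOptions) {
  const name = 'gcs-volume';
  return {
    // Pod annotation that asks GKE to inject the gcsfuse sidecar.
    annotations: { 'gke-gcsfuse/volumes': 'true' },
    volume: {
      name,
      csi: {
        driver: 'gcsfuse.csi.storage.gke.io',
        readOnly: opts.readOnly,
        volumeAttributes: {
          bucketName: opts.bucket,
          // implicit-dirs exposes object prefixes as directories;
          // only-dir limits the mount to a single prefix.
          mountOptions: `implicit-dirs,only-dir=${opts.prefix}`,
        },
      },
    },
    // Goes into the job container's volumeMounts.
    volumeMount: { name, mountPath: opts.mountPath, readOnly: opts.readOnly },
  };
}
```

The pod also has to run under a Kubernetes service account bound to a GCP service account with access to the bucket, which is what the Terraform and Helm changes in this commit set up.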
- Replace dev/ config (wrong custom-VPC config) with the correct adopted
  cluster config that was previously living in dev-local/
- Migrate local terraform state from dev-local/ to remote GCS backend
  (gs://shipsec-tfstate-66676596284/infra/gcp/dev/)
- Remove dev-local/ — it was a misnomer; dev/ now manages the real cluster
- Add GCS FUSE addon, volumes bucket, job-runner SA, and IAM to dev/main.tf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
… and Helm config

- k8s-runner: fix VOLUME_CAPTURE_SCRIPT join from '; ' to '\n' — busybox ash
  rejects 'do;' as a syntax error, causing all K8s jobs to fail with
  'sh: syntax error: unexpected ";"'
- k8s-runner: check initContainerStatuses as well as containerStatuses for
  GCS FUSE sidecar detection (GKE ≥1.28 native sidecar injection)
- test-gcs-volume: use printf instead of echo to produce valid JSON output;
  remove unused OUTPUT_DELIMITER constant
- helm/app-secret: add SECRET_STORE_MASTER_KEY to both system and workers
  namespace secrets (required by new env validation schema)
- helm/worker-deployment: expose SECRET_STORE_MASTER_KEY from secret as env var
- helm/gke-managed: add secretStoreMasterKey dev value; add GCS FUSE k8s config
  (gcsBucket, jobServiceAccount, jobRunnerGcpSa, workerGcpSa); update worker
  image tag to tested build 49d5de9-wk-fix2-20260218003437
- infra/dev/main.tf: add GCS FUSE CSI addon; volumes bucket with 7-day lifecycle;
  worker and job-runner GCP SAs with Workload Identity bindings and bucket IAM

Integration test validated: worker uploads input.txt to GCS, K8s alpine job
reads it via GCS FUSE CSI mount at /inputs, writes JSON output, worker parses
result. All tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
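
The sidecar-detection fix can be sketched as a lookup across both status arrays. The types are trimmed-down stand-ins for the Kubernetes pod status, and the helper name is hypothetical; `gke-gcsfuse-sidecar` is the container name GKE uses for the injected sidecar.

```typescript
interface ContainerStatus {
  name: string;
  state?: { running?: object; terminated?: object; waiting?: object };
}

interface PodStatus {
  containerStatuses?: ContainerStatus[];
  initContainerStatuses?: ContainerStatus[];
}

// Container name GKE gives the injected sidecar.
const GCSFUSE_SIDECAR = 'gke-gcsfuse-sidecar';

// With native sidecar injection the sidecar is reported under
// initContainerStatuses, so both arrays have to be searched.
function findGcsFuseSidecar(status: PodStatus): ContainerStatus | undefined {
  const all = [...(status.containerStatuses ?? []), ...(status.initContainerStatuses ?? [])];
  return all.find((c) => c.name === GCSFUSE_SIDECAR);
}
```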
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7785259392

Comment on lines +260 to +264
// Use emptyDir for the actual mount (ConfigMaps are read-only in K8s)
volumes.push({
  name: volName,
  emptyDir: {},
});

P1: Preserve seeded files for writable ConfigMap volumes

When a volume source is configmap: and readOnly is false, this code swaps the mount to emptyDir but never copies the ConfigMap’s initialized files into that writable directory. In EXECUTION_MODE=k8s without GCS_VOLUME_BUCKET (so IsolatedK8sVolume is used), components that expect pre-seeded files in writable mounts will see missing files at runtime because initialize() data never reaches the container filesystem.
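
One possible way to address this, sketched under assumptions and not taken from the PR: keep the writable emptyDir, mount the ConfigMap read-only in an init container, and copy its files across before the main container starts. The helper name, the `busybox:1.36` seed image, and the mount paths are all hypothetical.

```typescript
// Illustrative fix sketch, not the PR's code.
function buildSeededVolume(volName: string, configMapName: string, seedImage = 'busybox:1.36') {
  const seedName = `${volName}-seed`;
  return {
    volumes: [
      { name: volName, emptyDir: {} },
      { name: seedName, configMap: { name: configMapName } },
    ],
    initContainer: {
      name: `seed-${volName}`,
      image: seedImage,
      // ConfigMap entries are symlinks inside the mount; -L copies the file contents.
      command: ['sh', '-c', 'cp -L /seed/* /data/'],
      volumeMounts: [
        { name: seedName, mountPath: '/seed', readOnly: true },
        { name: volName, mountPath: '/data' },
      ],
    },
  };
}
```

The main container then mounts only `volName`, so it sees the seeded files and can still write to the directory.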

  constructor(config: ClientConfig = {}) {
-   this.baseUrl = config.baseUrl || 'http://localhost:3211';
+   this.baseUrl = config.baseUrl || '';

P2: Keep backend client default URL valid for URL construction

Defaulting baseUrl to an empty string breaks buildUrl() because new URL(normalized, this.baseUrl) throws on '' (e.g., new URL('/api/v1', '') is an invalid URL). Any caller that constructs ShipSecApiClient without an explicit baseUrl and then uses buildUrl() now gets a runtime exception, which is a regression from the prior absolute default.
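
A minimal sketch of one way to keep URL construction valid when baseUrl is empty, assuming a buildUrl of roughly this shape. The signature is hypothetical; `fallbackOrigin` stands in for something like window.location.origin in a browser.

```typescript
// Hypothetical buildUrl sketch; the real ShipSecApiClient method may differ.
function buildUrl(path: string, baseUrl: string, fallbackOrigin: string): URL {
  const normalized = path.startsWith('/') ? path : `/${path}`;
  // new URL(path, '') throws, so fall back to an absolute origin when baseUrl is empty.
  return new URL(normalized, baseUrl || fallbackOrigin);
}
```

Non-browser callers have no page origin to fall back on, so they would still need to pass an explicit absolute base.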
