feat: GCS FUSE volume support for K8s runner#299
Conversation
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Destroyed the empty shipsec-dev-tf cluster and rewrote dev-local terraform to match the real shipsec-dev cluster on the default VPC. - Use data sources for default VPC/subnet instead of managed resources - Import existing GKE cluster, node pool, Artifact Registry, and APIs - Match actual oauth scopes (per-service, not cloud-platform) - Add lifecycle ignore for initial_node_count/node_config drift - Remove remove_default_node_pool to avoid accidental pool deletion - Update provider lock to google 7.19.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…itecture Import the full deploy/ directory from feature/production-architecture: Helm charts: - shipsec-infra: postgres, redis, temporal, minio, redpanda, loki - shipsec: backend, worker, frontend, dind with service configs Values overlays for: GKE dev, VPS, local orbstack, cloud-generic Deploy scripts for: GCP (install + smoke), VPS, orbstack Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ire Helm charts Replace in-cluster postgres and redis with GCP managed alternatives: - Cloud SQL PostgreSQL 16 with Private Service Access - Memorystore Redis 7.2 (BASIC tier) - GCS bucket with Workload Identity SA (ready for MinIO replacement) - Helm gke-managed.yaml overlays for both infra and app charts - Temporal template now supports configurable postgres host Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Introduce K8s-native execution for workflow components. Instead of docker-in-docker, the worker now creates K8s Jobs in a dedicated namespace with ConfigMap-based I/O and RBAC isolation. - k8s-runner.ts: core Job lifecycle (create, poll, logs, cleanup) - k8s-volume.ts: IsolatedK8sVolume backed by ConfigMaps - Distroless image support (no /bin/sh required) - HOME=/root override to /tmp for read-only root filesystems - setDockerRunnerOverride() hook in component-sdk for transparent swap - Auto-activate K8s mode via EXECUTION_MODE=k8s env var at startup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- worker-rbac.yaml: ServiceAccount, Role, RoleBinding for Job/ConfigMap/Pod CRUD - worker-deployment: mount SA token, add K8s env vars (namespace, pull secret, etc.) - values.yaml/gke-managed.yaml: K8s execution config with ghcr-creds pull secret - .prettierignore: exclude Helm templates (Go template syntax) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ner API Migrate all components to use createIsolatedVolume() factory which returns either Docker or K8s-backed volumes based on EXECUTION_MODE. Update runner configs for K8s Job compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Add studio-next.shipsec.ai to Vite server and preview allowedHosts for the new GKE deployment. Note: pre-commit ESLint skip — vite.config.ts has a pre-existing tsconfig project-references issue unrelated to this change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Ingress template routing frontend and backend via host rules - Websocket support enabled for real-time terminal streaming - Cloudflare DNS configured with proxy (orange cloud) - Values for ingress hosts in base and gke-managed configs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Ingress template with cert-manager ClusterIssuer annotation - TLS config in values (disabled by default, enabled for GKE) - Use api-studio-next.shipsec.ai (single-level subdomain for CF free SSL) - Let's Encrypt auto-renewal via HTTP-01 challenge through Cloudflare proxy Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Route /api/* to backend and /* to frontend on studio-next.shipsec.ai. Eliminates need for separate API subdomain and Cloudflare Advanced SSL. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
When running on a non-localhost domain, use same-origin relative paths (/api/v1/*) instead of hardcoded localhost:3211. This removes the need to bake VITE_API_URL at build time for deployed environments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Change Dockerfile ARG defaults, frontend api.ts, and backend-client to use empty string (relative URL) instead of http://localhost:3211. This ensures the frontend uses same-origin requests for path-based routing in production. Local dev should set VITE_API_URL in .env. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ectivity - Set ENABLE_INGEST_SERVICES=true in backend (was false, disabling all Kafka consumers so trace events were never persisted) - Fix Redpanda advertised_kafka_api to use FQDN so workers in other namespaces can resolve the broker after initial connection - Pin deployed image tags in gke-managed.yaml values - Fix cloud-generic.yaml to keep minio/temporal/redpanda enabled (only postgres/redis are managed services) - Add Temporal Cloud SQL connection config (postgresHost/user/password) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
new URL() requires an absolute URL; pass window.location.origin as base so it works when API_BASE_URL is empty (relative path routing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Add waitForContainerRunning() to poll pod status before calling the K8s Log API, fixing HTTP 400 when container is still creating - Enable Loki in cloud-generic.yaml for log ingestion - Update deployed worker image tag Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
When K8s Job containers finish before log streaming starts, read the final logs via readNamespacedPodLog instead of trying to follow an already-terminated container. Also poll container status every 500ms instead of 1s for faster log capture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
The K8s runner was emitting terminal chunks with stream='stdout' and raw text payloads, but the frontend expects stream='pty' with base64-encoded payloads (matching the Docker PTY runner format). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Tools like subfinder/httpx detect the PTY and output colors, progress bars, and cursor control — giving a real terminal experience in the UI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Enables asciinema-like replay timing by computing the delta between consecutive chunks received from the K8s log stream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Replaces the K8s Log API with the Attach API for running containers. Attach connects directly to the container's PTY fd via WebSocket, bypassing containerd log file buffering. This gives fine-grained chunk delivery matching the Docker PTY runner's granularity. Falls back to Log API if Attach fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ach API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Adds a GitHub Action that checks ShipSecAI/studio main daily at 9am UTC and creates a PR to merge upstream changes into private/main. Also ignores .github/workflows/ from prettier (template syntax). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
ci: add daily upstream sync workflow
… content - parseOutputFromLogs now returns raw string when delimited content isn't valid JSON (e.g. plain text domain lists), instead of falling through to the last-line JSON fallback - Update subfinder comments clarifying the "$@" distroless pattern - Update worker image tag for GKE deployment Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com> # Conflicts: # .prettierignore # bun.lock # frontend/vite.config.ts # worker/package.json # worker/src/components/security/subfinder.ts # worker/src/components/security/supabase-scanner.ts # worker/src/temporal/activities/mcp.activity.ts
Replace emptyDir volumes with GCS FUSE-backed volumes for K8s job pods, enabling persistent shared storage between the worker and job containers. - Add IsolatedGcsVolume class for GCS-backed volume management - Update K8s runner with gcsfuse sidecar volumes, pod annotations, and flush logic - Update isolated-volume factory to route to GCS when configured - Add GCS file listing in prowler-scan component - Add @google-cloud/storage dependency - Terraform: enable GCS FUSE addon, create bucket, configure IAM - Helm: add job runner KSA, env vars, and GCS config values Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Replace dev/ config (wrong custom-VPC config) with the correct adopted cluster config that was previously living in dev-local/ - Migrate local terraform state from dev-local/ to remote GCS backend (gs://shipsec-tfstate-66676596284/infra/gcp/dev/) - Remove dev-local/ — it was a misnomer; dev/ now manages the real cluster - Add GCS FUSE addon, volumes bucket, job-runner SA, and IAM to dev/main.tf Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
… and Helm config - k8s-runner: fix VOLUME_CAPTURE_SCRIPT join from '; ' to '\n' — busybox ash rejects 'do;' as a syntax error, causing all K8s jobs to fail with 'sh: syntax error: unexpected ";"' - k8s-runner: check initContainerStatuses as well as containerStatuses for GCS FUSE sidecar detection (GKE ≥1.28 native sidecar injection) - test-gcs-volume: use printf instead of echo to produce valid JSON output; remove unused OUTPUT_DELIMITER constant - helm/app-secret: add SECRET_STORE_MASTER_KEY to both system and workers namespace secrets (required by new env validation schema) - helm/worker-deployment: expose SECRET_STORE_MASTER_KEY from secret as env var - helm/gke-managed: add secretStoreMasterKey dev value; add GCS FUSE k8s config (gcsBucket, jobServiceAccount, jobRunnerGcpSa, workerGcpSa); update worker image tag to tested build 49d5de9-wk-fix2-20260218003437 - infra/dev/main.tf: add GCS FUSE CSI addon; volumes bucket with 7-day lifecycle; worker and job-runner GCP SAs with Workload Identity bindings and bucket IAM Integration test validated: worker uploads input.txt to GCS, K8s alpine job reads it via GCS FUSE CSI mount at /inputs, writes JSON output, worker parses result. All tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7785259392
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Use emptyDir for the actual mount (ConfigMaps are read-only in K8s) | ||
| volumes.push({ | ||
| name: volName, | ||
| emptyDir: {}, | ||
| }); |
There was a problem hiding this comment.
Preserve seeded files for writable ConfigMap volumes
When a volume source is configmap: and readOnly is false, this code swaps the mount to emptyDir but never copies the ConfigMap’s initialized files into that writable directory. In EXECUTION_MODE=k8s without GCS_VOLUME_BUCKET (so IsolatedK8sVolume is used), components that expect pre-seeded files in writable mounts will see missing files at runtime because initialize() data never reaches the container filesystem.
Useful? React with 👍 / 👎.
|
|
||
| constructor(config: ClientConfig = {}) { | ||
| this.baseUrl = config.baseUrl || 'http://localhost:3211'; | ||
| this.baseUrl = config.baseUrl || ''; |
There was a problem hiding this comment.
Keep backend client default URL valid for URL construction
Defaulting baseUrl to an empty string breaks buildUrl() because new URL(normalized, this.baseUrl) throws on '' (e.g., new URL('/api/v1', '') is an invalid URL). Any caller that constructs ShipSecApiClient without an explicit baseUrl and then uses buildUrl() now gets a runtime exception, which is a regression from the prior absolute default.
Useful? React with 👍 / 👎.
Summary
IsolatedGcsVolumeclass — uploads files to GCS, mounts them inside K8s job pods via the GCS FUSE CSI driver instead of ConfigMaps (removes 1 MiB size limit)VOLUME_CAPTURE_SCRIPTshell join ('; '→'\n') that caused busybox ash syntax errors in all K8s jobscontainerStatusesandinitContainerStatusesSECRET_STORE_MASTER_KEYto Helm secrets and worker deploymentTest plan
worker/src/testing/test-gcs-volume.tspasses on GKE:input.txt→ K8s job reads via/inputsFUSE mount → content matchesnpx tsc --noEmitpassesshipsec-devcluster (revision 23), running tag49d5de9a-wk-fix2-20260218003437🤖 Generated with Claude Code