
feat: security hardening#256

Merged
HardMax71 merged 13 commits into main from feat/sec-hardening
Mar 3, 2026

Conversation

@HardMax71 (Owner) commented Mar 2, 2026


Summary by cubic

Hardens auth, Kubernetes isolation, and frontend CSP, and shifts pod concurrency to Kubernetes with Kueue. Applies default‑deny networking, PSA labels, and quotas; secures the pre‑puller; enforces Redis auth; pins DB images; and auto‑cleans executor pods.

  • New Features

    • Auth: switch to pwdlib[bcrypt] for hashing and verification (seeding uses same hasher/rounds).
    • Kubernetes: startup applies default‑deny NetworkPolicy, ResourceQuota, and PSA restricted labels; installs/configures Kueue (ResourceFlavor, ClusterQueue, LocalQueue) and labels pods for executor‑queue; worker ensures namespace security and pre‑puller with restricted context.
    • Pods: support K8S_POD_RUNTIME_CLASS_NAME; enable user‑namespace isolation (host_users: false).
    • Frontend: nonce‑based CSP with nginx sub_filter; meta tag carries per‑request nonce; CodeMirror passes the nonce; output panel uses ansi_up with class‑based ANSI styles; loading UI gets ARIA attributes.
    • Infra: require Redis password (healthcheck uses auth); pin Mongo 8.0.17 and Redis 7.4.6‑alpine; release workflow passes Redis/Mongo secrets.
    • Pod monitor: delete pods on completion/failure/timeout; shorten watch timeout to 30s; safer publish error handling.
  • Migration

    • Install Kueue CRDs/controllers and create executor queues (ResourceFlavor, ClusterQueue, LocalQueue). In CI/automation, set KUEUE_VERSION or use the provided setup script.
    • Set REDIS_PASSWORD in secrets/.env; clients must use it. In CI/CD, set REDIS_PASSWORD, MONGO_ROOT_USER, and MONGO_ROOT_PASSWORD.
    • Update nginx to the template that injects CSP nonces and meta tag.
    • Ensure the cluster supports NetworkPolicy and Pod Security Admission; grant the worker permissions to patch namespace labels and manage policies/quotas/daemonsets.
    • Remove uses of K8S_MAX_CONCURRENT_PODS; concurrency is now enforced by Kueue + namespace ResourceQuota. Existing bcrypt hashes still work.
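The default-deny networking applied at startup can be sketched as a plain Kubernetes manifest; the policy name and namespace below are illustrative, not taken from the PR.

```python
# A minimal sketch of the default-deny NetworkPolicy described above,
# expressed as a plain manifest dict rather than via the kubernetes client.

def default_deny_policy(namespace: str) -> dict:
    # An empty podSelector matches every pod in the namespace; listing
    # both policy types with no allow rules denies all traffic by default.
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }

policy = default_deny_policy("executor")
```

Egress or ingress that the executor legitimately needs would then be re-allowed by additional, narrower policies layered on top of this baseline.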

Written for commit 43da70d. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Nonce-based Content Security Policy and CSS-driven loading UI.
    • Kubernetes startup hardening with namespace security, pre-puller initialization, and automatic cleanup of completed worker pods.
  • Security

    • Redis now requires a password by default; new deployment secret keys added.
    • Password hashing backend upgraded for stronger hashing.
  • Updates

    • MongoDB and Redis images upgraded.
    • Improved ANSI color rendering in logs/output; frontend dependency updated.
  • Documentation

    • Expanded security, nginx, and operations guidance.
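The nonce-based CSP above works by minting a fresh value per request and embedding it in both the response header and the inline tags it authorizes. A minimal sketch of the header construction follows; the exact directive set is an assumption, not the project's actual policy.

```python
import secrets

def csp_header(nonce: str) -> str:
    # Only inline scripts/styles carrying this exact nonce attribute may
    # run; everything else falls back to same-origin ('self') sources.
    return (
        f"default-src 'self'; "
        f"script-src 'self' 'nonce-{nonce}'; "
        f"style-src 'self' 'nonce-{nonce}'"
    )

nonce = secrets.token_urlsafe(16)  # one fresh value per request
header = csp_header(nonce)
```

In this PR the nonce is not generated in application code: nginx substitutes its per-request `$request_id` into the header and the HTML meta tag via `sub_filter`.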

- passlib -> pwdlib
- more precise limit enforcement for k8s pods, plus PSA labels
- bumped dependencies (previous versions had CVEs); added passwords
Copilot AI review requested due to automatic review settings March 2, 2026 20:33
@coderabbitai bot commented Mar 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds Kubernetes namespace hardening and RBAC updates, migrates password hashing from passlib to pwdlib, enables CSP nonce and ANSI-library changes in the frontend, requires Redis authentication and exposes deployment secrets, and adds pod cleanup on terminal events; documentation and CI updated accordingly.

Changes

Cohort / File(s) Summary
CI Deployment & Secrets
\.github/workflows/release-deploy.yml, docker-compose.yaml, backend/secrets.example.toml
Added REDIS_PASSWORD, MONGO_ROOT_USER, MONGO_ROOT_PASSWORD to deploy env and export in deploy script; enabled Redis auth in compose; bumped mongo/redis image tags; added sample REDIS_PASSWORD.
Password Hashing Migration
backend/app/core/security.py, backend/pyproject.toml, backend/scripts/seed_users.py
Replaced PassLib CryptContext with pwdlib PasswordHash (BcryptHasher); updated imports, call sites, dependency, and seed script to accept/use pwd_hasher.
Kubernetes Worker & Pod Spec
backend/app/services/k8s_worker/worker.py, backend/app/services/k8s_worker/pod_builder.py, backend/workers/run_k8s_worker.py, backend/app/settings.py
Added networking_v1 client, ensure_namespace_security and helpers for NetworkPolicy/ResourceQuota/PSA labels; switched to patch-based DaemonSet apply; added host_users=False and runtime_class_name to PodSpec; new K8S_POD_RUNTIME_CLASS_NAME setting; invoked at startup.
Pod Lifecycle Cleanup
backend/app/services/pod_monitor/monitor.py, backend/app/services/pod_monitor/config.py
Added terminal event set and logic to delete pods after terminal events; new _delete_pod helper and ApiException/EventType imports; reduced watch_timeout_seconds default from 300→30.
RBAC for Cluster Operations
cert-generator/setup-k8s.sh
Expanded ClusterRole/Role verbs to include patch/update for namespaces, daemonsets, networkpolicies and added resourcequotas CRUD verbs; added cleanup of stale executor pods.
Frontend CSP Nonce & Nginx
frontend/nginx.conf.template, frontend/public/index.html, frontend/src/components/editor/CodeMirrorEditor.svelte, docs/operations/nginx-configuration.md
Integrated request-scoped CSP nonce injection via nginx ($request_id) and sub_filter; added nonce meta and attributes in HTML; wired CodeMirror to use cspNonce; documented nonce injection.
ANSI Library & UI Changes
frontend/package.json, frontend/src/components/editor/OutputPanel.svelte, frontend/src/app.css, frontend/public/index.html
Replaced ansi-to-html with ansi_up; updated OutputPanel to new API and stricter sanitization; added ANSI utility CSS; replaced inline loader with CSS/ARIA loader.
Docs: Security & Policies
docs/SECURITY.md, docs/security/policies.md
Expanded frontend and runtime hardening guidance: CSP nonce, pod-level host_users/runtime_class_name, namespace NetworkPolicy, ResourceQuota, and PSA guidance; restructured security policy docs.
Tests
frontend/src/components/editor/__tests__/CodeMirrorEditor.test.ts
Added cspNonce mock to EditorView constructor mock to support CSP-related behavior in tests.
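The pod-lifecycle cleanup cohort above reduces to a small decision: delete the executor pod once a terminal event (completion, failure, or watch timeout) is observed. A sketch of that predicate, with names invented for illustration rather than taken from the monitor code:

```python
# Terminal pod phases after which the monitor deletes the executor pod.
TERMINAL_PHASES = frozenset({"Succeeded", "Failed"})

def should_delete(phase: str, timed_out: bool = False) -> bool:
    # Delete on completion, failure, or watch timeout, matching the
    # terminal-event handling described in the walkthrough.
    return timed_out or phase in TERMINAL_PHASES
```

Combined with the shorter 30s watch timeout, this keeps finished executor pods from accumulating in the namespace.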

Sequence Diagram(s)

sequenceDiagram
  rect rgba(46,134,193,0.5)
    participant Runner as RunK8sWorker
  end
  rect rgba(39,174,96,0.5)
    participant Worker as KubernetesWorker
  end
  rect rgba(241,196,15,0.5)
    participant NetAPI as networking_v1 API
    participant CoreAPI as core/v1 API
    participant AppsAPI as apps/v1 API
  end

  Runner->>Worker: start() / init()
  Worker->>NetAPI: patch NetworkPolicy(namespace, default-deny)
  NetAPI-->>Worker: patched
  Worker->>CoreAPI: patch ResourceQuota(namespace)
  CoreAPI-->>Worker: patched
  Worker->>CoreAPI: patch Namespace labels (PSA)
  CoreAPI-->>Worker: patched
  Worker->>AppsAPI: patch_namespaced_daemon_set(pre-puller)
  AppsAPI-->>Worker: patched
  Worker-->>Runner: namespace security ensured

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hid a nonce beneath the moonlit log,
I hopped through namespaces, snug as a frog,
Pwdlib hummed softly while Redis found its key,
Pods politely vanished when their work was free,
A rabbit cheers — secure, tidy, and spry!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 62.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): the title 'feat: security hardening' accurately reflects the main objective of the pull request, which addresses authentication, Kubernetes executor isolation, frontend CSP, and infrastructure security.
  • Description check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@cubic-dev-ai bot left a comment

3 issues found across 11 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/secrets.example.toml">

<violation number="1" location="backend/secrets.example.toml:18">
P2: Avoid committing a real/default Redis password in the example secrets template; use a CHANGE_ME placeholder to prevent insecure deployments reusing a weak default.</violation>
</file>

<file name="backend/app/services/k8s_worker/worker.py">

<violation number="1" location="backend/app/services/k8s_worker/worker.py:302">
P1: Security controls fail silently: if applying the NetworkPolicy, ResourceQuota, or PSA labels fails (e.g., due to RBAC misconfiguration), the error is only logged and the worker starts up without security controls. For a security-critical path, these exceptions should propagate so the worker either retries or refuses to start without its security posture in place. Consider re-raising after logging, or at minimum returning a status from each helper so `ensure_namespace_security` can raise if any control failed.</violation>
</file>

<file name="frontend/nginx.conf.template">

<violation number="1" location="frontend/nginx.conf.template:17">
P2: `style-src` still includes `'unsafe-inline'`, which negates hash-based CSP hardening for inline styles and keeps CSS injection risks open. If the intent is to harden CSP, drop `unsafe-inline` and rely on hashes/nonces for inline styles.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@cubic-dev-ai bot left a comment

1 issue found across 12 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/operations/nginx-configuration.md">

<violation number="1" location="docs/operations/nginx-configuration.md:259">
P3: Admonition content is no longer indented, so the warning block won’t render as a warning in MkDocs. Re-indent the lines under `!!! warning` to keep them part of the admonition.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copilot AI left a comment

Pull request overview

This PR focuses on tightening security across the stack (frontend HTTP headers, local dev infra, authentication hashing, and Kubernetes execution isolation) to reduce attack surface and strengthen runtime controls.

Changes:

  • Hardened frontend CSP by removing unsafe-inline from script-src and adding script/style hashes.
  • Secured local Redis with requirepass, and pinned MongoDB/Redis image patch versions in docker-compose.
  • Added Kubernetes namespace security initialization (default-deny NetworkPolicy, ResourceQuota, PSA labels) and added optional RuntimeClass support for execution pods; migrated password hashing from passlib to pwdlib[bcrypt].

Reviewed changes

Copilot reviewed 20 out of 22 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
frontend/nginx.conf.template Updates CSP to use hashes and reduce inline script allowance.
docker-compose.yaml Pins DB/cache image versions; adds Redis auth and updates healthcheck accordingly.
backend/workers/run_k8s_worker.py Runs new namespace security bootstrap on worker startup.
backend/uv.lock Updates lockfile for dependency changes (adds pwdlib/bcrypt, removes passlib).
backend/secrets.example.toml Adds REDIS_PASSWORD example secret.
backend/scripts/seed_users.py Migrates seed script hashing from passlib to pwdlib bcrypt.
backend/pyproject.toml Replaces passlib with pwdlib[bcrypt] dependency.
backend/app/settings.py Adds K8S_POD_RUNTIME_CLASS_NAME and includes Redis password setting in Settings.
backend/app/services/k8s_worker/worker.py Adds namespace security enforcement (NetworkPolicy/Quota/PSA labels).
backend/app/services/k8s_worker/pod_builder.py Adds RuntimeClass support and sets host_users=False for additional pod isolation.
backend/app/core/security.py Migrates application password hashing/verification from passlib to pwdlib bcrypt.
Files not reviewed (1)
  • frontend/package-lock.json: Language not supported


@coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (7)
docs/operations/nginx-configuration.md (1)

56-60: Consider documenting a production path with upstream TLS verification enabled.

This section is clear, but adding a “preferred production setup” (proxy_ssl_verify on + trusted CA) would strengthen the guidance and avoid normalizing verification-off configurations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/operations/nginx-configuration.md` around lines 56 - 60, Add a
recommended production configuration note that contrasts the existing insecure
example: explain enabling TLS verification by setting proxy_ssl_verify on and
configuring a trusted CA bundle via proxy_ssl_trusted_certificate (and
optionally proxy_ssl_verify_depth), and mention using real upstream hostnames
with proxy_pass (still set via envsubst) and preserving Host header with
proxy_set_header Host $host; include a short guidance to mount or reference the
trusted CA file into the container and avoid proxy_ssl_verify off in production.
frontend/src/components/editor/OutputPanel.svelte (1)

98-118: Precompute rendered ANSI HTML to avoid repeated conversion on re-renders.

Line 98 and Line 118 currently do conversion + sanitization in-template. Moving this to $derived keeps rendering lighter and easier to reason about.

♻️ Proposed refactor
+    const stdoutHtml = $derived.by(() =>
+        result?.stdout ? sanitize(ansiConverter.ansi_to_html(result.stdout)) : ''
+    );
+
+    const stderrHtml = $derived.by(() =>
+        result?.stderr ? sanitize(ansiConverter.ansi_to_html(result.stderr)) : ''
+    );
@@
-                            <pre class="output-pre custom-scrollbar" data-testid="output-pre">{@html sanitize(ansiConverter.ansi_to_html(result.stdout))}</pre>
+                            <pre class="output-pre custom-scrollbar" data-testid="output-pre">{@html stdoutHtml}</pre>
@@
-                                <pre class="text-xs text-red-600 dark:text-red-300 whitespace-pre-wrap break-words font-mono bg-transparent p-0 pr-8">{@html sanitize(ansiConverter.ansi_to_html(result.stderr))}</pre>
+                                <pre class="text-xs text-red-600 dark:text-red-300 whitespace-pre-wrap break-words font-mono bg-transparent p-0 pr-8">{@html stderrHtml}</pre>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/components/editor/OutputPanel.svelte` around lines 98 - 118, The
template is calling ansiConverter.ansi_to_html(...) and sanitize(...) inline for
result.stdout and result.stderr which repeats work on every render; precompute
sanitized HTML once using a reactive derived value (e.g., create reactive
variables like stdoutHtml and stderrHtml or use a Svelte derived store) that
call ansiConverter.ansi_to_html(result.stdout) and sanitize(...) (and likewise
for stderr) and then replace the inline {@html ...} uses with those precomputed
variables to avoid repeated conversion and sanitization; reference
ansiConverter, sanitize, result.stdout, result.stderr, and the new reactive
stdoutHtml/stderrHtml variables in the fix.
backend/scripts/seed_users.py (1)

26-26: Avoid hardcoding bcrypt cost in the seed script.

Using a fixed rounds value here can drift from the runtime setting and create weaker-or-inconsistent seeded credentials over time.

♻️ Proposed refactor
-pwd_hasher = PasswordHash((BcryptHasher(rounds=12),))
-
-
 async def upsert_user(
     db: AsyncDatabase[dict[str, Any]],
+    pwd_hasher: PasswordHash,
     username: str,
@@
 async def seed_users(settings: Settings) -> None:
     """Seed default users using provided settings for MongoDB connection."""
+    pwd_hasher = PasswordHash((BcryptHasher(rounds=settings.BCRYPT_ROUNDS),))
@@
     await upsert_user(
         db,
+        pwd_hasher,
         username="user",
@@
     await upsert_user(
         db,
+        pwd_hasher,
         username="admin",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/seed_users.py` at line 26, The seed script hardcodes bcrypt
cost in the PasswordHash instantiation (pwd_hasher =
PasswordHash((BcryptHasher(rounds=12),))) — change this to read the bcrypt
rounds from the application's runtime configuration or environment instead of
using the literal 12; locate PasswordHash and BcryptHasher usage in
seed_users.py and replace the hardcoded rounds with the shared config value
(e.g., import the bcrypt rounds constant or a get_bcrypt_rounds() helper from
the app settings or config module) and keep a sensible default fallback if the
config is missing.
backend/app/services/k8s_worker/worker.py (3)

296-303: Use structured logging fields in new security methods.

The new logs interpolate values via f-strings. Switch to bound/keyword fields (policy_name=..., namespace=..., reason=...) for safer and queryable logs.

As per coding guidelines: "Use only structured logging via injected structlog.stdlib.BoundLogger ... pass user-controlled data as keyword arguments, never interpolate into log messages".

Also applies to: 330-337, 350-352

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/k8s_worker/worker.py` around lines 296 - 303, Replace
f-string interpolation in the logging calls with structured keyword fields:
change self.logger.info(f"NetworkPolicy '{policy_name}' updated in namespace
{namespace}") to self.logger.info("NetworkPolicy updated",
policy_name=policy_name, namespace=namespace) and similarly for the create/info
and error paths—use self.logger.info("NetworkPolicy created",
policy_name=policy_name, namespace=namespace) and self.logger.error("Failed to
apply NetworkPolicy", policy_name=policy_name, namespace=namespace,
reason=e.reason) when catching ApiException from
networking_v1.create_namespaced_network_policy; make the same pattern for the
other logger usages referenced (the similar info/error calls around the other
NetworkPolicy operations) so all user-controlled values are passed as keyword
fields instead of interpolated into the message.

257-263: New docstrings should follow Google style sections.

The added method docstrings are descriptive but missing explicit Args/Returns/Raises sections required for backend Python files.

As per coding guidelines: "Use Google-style docstrings with Args/Returns/Raises sections".

Also applies to: 270-270, 305-306, 339-339

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/k8s_worker/worker.py` around lines 257 - 263, The
docstring that begins "Apply security controls to the executor namespace at
startup." is missing Google-style sections; update that function's docstring to
include explicit Args:, Returns:, and Raises: sections describing each parameter
(type and purpose), what the function returns (or None), and any exceptions it
may raise; apply the same change to the other new docstrings referenced in the
diff (the other startup/security-related docstrings), ensuring each uses
Google-style formatting and accurate parameter/return/exception descriptions so
linters and readers have clear API contracts.

341-343: Pin pod-security.kubernetes.io/enforce-version to a specific Kubernetes minor version.

Using latest can change enforcement behavior during cluster upgrades. Kubernetes best practices recommend pinning enforce-version to your current cluster's Kubernetes minor version (e.g., v1.35) to keep admission policy behavior stable. Set warn-version and audit-version to latest to track upcoming standard changes before they become blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/k8s_worker/worker.py` around lines 341 - 343, Update the
pod-security label values in the labels dict where these keys are set (look for
"pod-security.kubernetes.io/enforce",
"pod-security.kubernetes.io/enforce-version", "pod-security.kubernetes.io/warn"
in worker.py) by replacing the current enforce-version value "latest" with your
cluster's Kubernetes minor version string (e.g., "v1.35"); also set
"pod-security.kubernetes.io/warn-version" and
"pod-security.kubernetes.io/audit-version" to "latest" if you want to track
upcoming changes. Ensure you only change the label values (not key names) so
admission behavior is pinned for enforce-version while warn/audit remain at
latest.
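The version-pinning advice above amounts to a labels dict like the following; the "v1.31" value is an example placeholder for whatever minor version the cluster actually runs, not a recommendation from the PR.

```python
def psa_labels(cluster_minor: str = "v1.31") -> dict:
    # Pin enforcement to the running cluster's minor version so upgrades
    # don't silently change admission behavior, while warn/audit track
    # "latest" to surface upcoming standard changes before they block pods.
    return {
        "pod-security.kubernetes.io/enforce": "restricted",
        "pod-security.kubernetes.io/enforce-version": cluster_minor,
        "pod-security.kubernetes.io/warn": "restricted",
        "pod-security.kubernetes.io/warn-version": "latest",
        "pod-security.kubernetes.io/audit": "restricted",
        "pod-security.kubernetes.io/audit-version": "latest",
    }
```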
backend/app/services/k8s_worker/pod_builder.py (1)

109-110: Gate host_users=False behind Kubernetes version/feature-gate support.

The hostUsers field requires Kubernetes v1.25+ with the UserNamespacesSupport feature gate enabled (disabled by default until v1.33). Setting host_users=False unconditionally will cause pod creation to fail on unsupported clusters (< v1.25) or clusters running v1.28–v1.32 without the feature gate explicitly enabled. Add a settings flag or cluster version check before including this field to prevent runtime pod creation failures in mixed or older cluster environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/k8s_worker/pod_builder.py` around lines 109 - 110, The
pod spec currently sets host_users=False unconditionally which will break on
clusters that don't support UserNamespacesSupport; update the PodBuilder
creation logic (where host_users is passed, e.g., in the code referencing
host_users and self._settings.K8S_POD_RUNTIME_CLASS_NAME) to only include the
host_users field when the cluster supports user namespaces or when an explicit
settings flag is enabled (add a new setting like K8S_ENABLE_USER_NAMESPACES or
perform a cluster version/feature-gate check before setting host_users). Ensure
the code path that builds the pod spec omits the host_users parameter entirely
for unsupported clusters and document/validate the new setting in the settings
accessor used by PodBuilder.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/k8s_worker/worker.py`:
- Around line 291-303: The ApiException handler in the NetworkPolicy apply logic
swallows non-404 errors (in the block calling
networking_v1.read_namespaced_network_policy /
replace_namespaced_network_policy), which lets startup continue without required
hardening; after logging the error (self.logger.error(...)), re-raise the
exception so startup fails fast. Update the except ApiException as e branches to
call raise after the error log for non-404 cases; apply the same change to the
similar handlers around the other NetworkPolicy operations (the blocks at the
other mentions, e.g., where
replace_namespaced_network_policy/create_namespaced_network_policy are used).
- Around line 317-323: The ResourceQuota "hard" dict is using hardcoded "128Mi"
and 1 CPU assumptions and must be computed from the per-pod settings so quotas
track changes to K8S_POD_*; update the construction of the hard dict (the block
that references self._settings.K8S_MAX_CONCURRENT_PODS) to derive values by
multiplying per-pod request/limit settings (e.g.
self._settings.K8S_POD_CPU_REQUEST, self._settings.K8S_POD_CPU_LIMIT,
self._settings.K8S_POD_MEMORY_REQUEST_MB, self._settings.K8S_POD_MEMORY_LIMIT_MB
or the actual K8S_POD_* names present in settings) by
self._settings.K8S_MAX_CONCURRENT_PODS, format CPU values as strings like "1" or
"0.5" and memory as "{N}Mi", and replace the hardcoded "128Mi"/implicit CPU
values so the quota matches configured per-pod resources.
- Around line 292-295: The code calls read_namespaced_network_policy and
read_namespaced_resource_quota but discards the returned objects before calling
replace_namespaced_network_policy and replace_namespaced_resource_quota, which
causes Kubernetes to reject updates; fix by capturing the read results (e.g.,
existing_policy = await
self.networking_v1.read_namespaced_network_policy(name=policy_name,
namespace=namespace) and existing_quota = await
self.core_v1.read_namespaced_resource_quota(name=quota_name,
namespace=namespace)) and copy existing_policy.metadata.resource_version into
policy.metadata.resource_version and existing_quota.metadata.resource_version
into quota.metadata.resource_version before calling
replace_namespaced_network_policy and replace_namespaced_resource_quota so the
optimistic concurrency resourceVersion is preserved.

In `@backend/secrets.example.toml`:
- Line 18: Replace the usable default Redis password value "redispassword" in
the sample secrets with a non-usable placeholder to prevent accidental
deployment; update the REDIS_PASSWORD entry to something like
"CHANGE_ME_REDIS_PASSWORD" (or similar clearly-not-secret text) so consumers
must deliberately set a real secret before use and it’s obvious in the
REDIS_PASSWORD field.

In `@docker-compose.yaml`:
- Around line 74-82: The compose file currently falls back to a default by using
${REDIS_PASSWORD:-redispassword}; update both places (the redis-server command
string and the healthcheck redis-cli invocation) to use the variable without a
default (use ${REDIS_PASSWORD}) so startup will fail fast when the secret is not
provided rather than silently using a known password; ensure REDIS_PASSWORD is
documented/provided in the environment or .env for deployments.

In `@frontend/public/index.html`:
- Around line 34-37: The spinner lacks accessible semantics—update the container
with id "app-loading" (and/or the inner wrapper "app-loading-inner") to include
role="status" and aria-live="polite" (and optionally aria-busy="true"), and add
a hidden textual status node (e.g., a <span> with a visually-hidden class or
aria-label text like "Loading…") inside the container so screen readers announce
the state; keep the decorative spinner element (class "app-loading-spinner")
aria-hidden="true".
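The ResourceQuota fix suggested above (derive namespace-wide caps by multiplying per-pod limits by the concurrency cap, instead of hardcoding "128Mi") can be sketched as follows; the parameter names are illustrative and may not match the settings module exactly.

```python
def quota_hard(max_pods: int, cpu_limit: float, mem_limit_mb: int) -> dict:
    # Namespace-wide caps computed from per-pod limits, so the quota
    # automatically tracks changes to the K8S_POD_* settings.
    return {
        "pods": str(max_pods),
        "limits.cpu": f"{cpu_limit * max_pods:g}",
        "limits.memory": f"{mem_limit_mb * max_pods}Mi",
    }

hard = quota_hard(max_pods=10, cpu_limit=0.5, mem_limit_mb=128)
# → {"pods": "10", "limits.cpu": "5", "limits.memory": "1280Mi"}
```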


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4339989 and d46ae06.

⛔ Files ignored due to path filters (2)
  • backend/uv.lock is excluded by !**/*.lock
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (20)
  • .github/workflows/release-deploy.yml
  • backend/app/core/security.py
  • backend/app/services/k8s_worker/pod_builder.py
  • backend/app/services/k8s_worker/worker.py
  • backend/app/settings.py
  • backend/pyproject.toml
  • backend/scripts/seed_users.py
  • backend/secrets.example.toml
  • backend/workers/run_k8s_worker.py
  • docker-compose.yaml
  • docs/SECURITY.md
  • docs/operations/nginx-configuration.md
  • docs/security/policies.md
  • frontend/nginx.conf.template
  • frontend/package.json
  • frontend/public/index.html
  • frontend/src/app.css
  • frontend/src/components/editor/CodeMirrorEditor.svelte
  • frontend/src/components/editor/OutputPanel.svelte
  • frontend/src/components/editor/__tests__/CodeMirrorEditor.test.ts

@codecov-commenter

codecov-commenter commented Mar 2, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 34.61538% with 51 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| backend/app/services/k8s_worker/worker.py | 6.66% | 42 Missing ⚠️ |
| backend/app/services/pod_monitor/monitor.py | 52.63% | 9 Missing ⚠️ |
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
| Flag | Coverage Δ |
| --- | --- |
| backend-e2e | 83.08% <19.44%> (-0.09%) ⬇️ |
| backend-unit | 67.83% <29.16%> (-0.07%) ⬇️ |
| frontend-unit | 86.86% <100.00%> (+0.07%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| backend/app/core/security.py | 96.15% <100.00%> (+0.04%) ⬆️ |
| backend/app/services/k8s_worker/pod_builder.py | 100.00% <100.00%> (ø) |
| backend/app/services/pod_monitor/config.py | 100.00% <100.00%> (ø) |
| backend/app/services/runtime_settings.py | 100.00% <ø> (ø) |
| backend/app/settings.py | 97.95% <100.00%> (ø) |
| ...tend/src/components/editor/CodeMirrorEditor.svelte | 64.28% <100.00%> (+0.87%) ⬆️ |
| frontend/src/components/editor/OutputPanel.svelte | 89.33% <100.00%> (+3.03%) ⬆️ |
| backend/app/services/pod_monitor/monitor.py | 90.29% <52.63%> (-9.71%) ⬇️ |
| backend/app/services/k8s_worker/worker.py | 19.71% <6.66%> (+1.01%) ⬆️ |

| Components | Coverage Δ |
| --- | --- |
| Backend | 90.01% <29.16%> (-0.11%) ⬇️ |
| Frontend | 86.86% <100.00%> (+0.07%) ⬆️ |

- index.html: aria labels for loading wheel
- worker: well-typed objs for k8s-client internal stuff
- seed user: moved pwd hasher creation to main, taking num of rounds from settings obj
- docs: update on diff between style-src-elem vs. style-src-attr
- docs: fix indentation in nginx-config doc

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
frontend/public/index.html (1)

41-48: Consider setting aria-busy="false" when hiding the loader.

The loader correctly hides via classList.add('app-loading-hidden'), but aria-busy remains true even after the app loads. While the element is hidden, updating the ARIA state ensures proper semantics if the element is ever shown again or inspected by assistive tech.

♿ Proposed fix
     <script nonce="**CSP_NONCE**">
         window.addEventListener('DOMContentLoaded', () => {
             setTimeout(() => {
                 const loader = document.getElementById('app-loading');
-                if (loader) loader.classList.add('app-loading-hidden');
+                if (loader) {
+                    loader.classList.add('app-loading-hidden');
+                    loader.setAttribute('aria-busy', 'false');
+                }
             }, 100);
         });
     </script>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/public/index.html` around lines 41 - 48, When hiding the loader
inside the DOMContentLoaded handler (the setTimeout block that queries the
element with id 'app-loading' and adds the 'app-loading-hidden' class), also
update its ARIA state by setting aria-busy to "false" (e.g.,
element.setAttribute('aria-busy', 'false')) so the loader's accessible state
reflects that loading has finished; locate the script that listens for
'DOMContentLoaded' and the code that modifies the 'app-loading' element and add
the aria update immediately after adding the class.
backend/scripts/seed_users.py (1)

27-36: Update upsert_user docstring to Google style (including new pwd_hasher arg).

The function docstring should include Args/Returns (and Raises if applicable), especially now that the signature changed.

Suggested docstring update
 async def upsert_user(
     db: AsyncDatabase[dict[str, Any]],
     pwd_hasher: PasswordHash,
     username: str,
@@
 ) -> None:
-    """Create user or update if exists."""
+    """Create a user or update an existing user.
+
+    Args:
+        db: MongoDB async database handle.
+        pwd_hasher: Password hasher used to hash plaintext passwords.
+        username: Username to upsert.
+        email: Email address for the user.
+        password: Plaintext password to hash and store.
+        role: Role assigned to the user.
+        is_superuser: Whether the user has superuser privileges.
+
+    Returns:
+        None
+    """

As per coding guidelines: Use Google-style docstrings with Args/Returns/Raises sections.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/seed_users.py` around lines 27 - 36, The docstring for
upsert_user is missing Google-style Args/Returns (and Raises if applicable) and
does not document the new pwd_hasher parameter; update the upsert_user docstring
to Google style by adding an Args section that documents db
(AsyncDatabase[dict[str, Any]]), pwd_hasher (PasswordHash), username (str),
email (str), password (str), role (str), and is_superuser (bool), a Returns
section indicating it returns None, and a Raises section for any exceptions the
function may propagate (e.g., database errors) so callers know what to expect.
backend/app/services/k8s_worker/worker.py (2)

384-386: Consider whether daemonset failures should halt startup.

The error handling logs but doesn't re-raise, allowing startup to continue without the image pre-puller. This contrasts with the security hardening methods which fail fast.

If the pre-puller is purely an optimization (faster cold starts), the current graceful degradation is reasonable. If image availability is required for execution, this should re-raise. The distinction from security methods suggests this is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/k8s_worker/worker.py` around lines 384 - 386, The
exception handlers around applying the DaemonSet (catching ApiException as e and
the generic except Exception as e) currently log errors but swallow them,
allowing startup to continue; decide whether image pre-puller failures must be
fatal and implement accordingly: if image availability is required, re-raise the
caught exceptions (or raise a custom exception) from the except blocks in the
apply DaemonSet logic (referencing ApiException, daemonset_name and the generic
except Exception handler) so startup fails fast; if pre-puller is optional, add
an explicit comment and leave graceful degradation but change the log
level/message to clearly indicate non-fatal behavior.

309-315: Parsing lacks validation for settings configuration errors.

The .removesuffix() approach assumes CPU values end in "m" and memory in "Mi", matching the documented TOML defaults ("1000m", "128Mi"). However, there is no validation on these settings fields, so misconfiguration in the TOML file will cause silent bugs or runtime failures:

| Misconfigured TOML value | Behavior |
| --- | --- |
| "1000m" | ✓ Works correctly |
| "1" | Calculates 1m instead of 1000m (silent bug) |
| "0.5" | Raises ValueError at runtime |
| "256M" | Fails to parse as integer at runtime |

Consider adding a Pydantic field validator to Settings to validate format and normalize units, ensuring configuration errors are caught early.
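
One possible shape for such helpers, shown standalone (the function names are illustrative, not the project's actual validator; a Pydantic `field_validator` on Settings could simply call these so misconfiguration fails at load time):

```python
def parse_cpu_to_milli(value: str) -> int:
    """Parse a Kubernetes CPU quantity ("1000m", "1", "0.5") into millicores."""
    value = value.strip()
    if value.endswith("m"):
        return int(value[:-1])
    # Plain cores, possibly fractional: convert to millicores.
    return round(float(value) * 1000)


def parse_memory_to_mib(value: str) -> int:
    """Parse a memory quantity into MiB, accepting "Mi" and "Gi" suffixes."""
    value = value.strip()
    if value.endswith("Gi"):
        return int(value[:-2]) * 1024
    if value.endswith("Mi"):
        return int(value[:-2])
    raise ValueError(f"unsupported memory quantity: {value!r}")
```

With these, the quota math multiplies validated integers instead of relying on `.removesuffix()` over raw strings, so "1" becomes 1000m rather than a silent 1m.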

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/k8s_worker/worker.py` around lines 309 - 315, Add
validation and normalization for K8S_POD_CPU_REQUEST, K8S_POD_CPU_LIMIT,
K8S_POD_MEMORY_REQUEST, and K8S_POD_MEMORY_LIMIT on the Settings model instead
of using .removesuffix() raw in worker.py; implement Pydantic validators on
those fields (or a shared validator) to enforce allowed formats (e.g., CPU must
end with "m" or be a plain integer convertible to millicores, memory must end
with "Mi"), normalize and store them in a canonical form (e.g., "NNNm" and
"NNNMi") or expose numeric millicore/MeBi values, and raise clear
ValidationError on misconfiguration so the hard={...} block in k8s_worker.worker
(which references those settings names) can safely multiply parsed integers
without runtime ValueError or silent unit mistakes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/app/services/k8s_worker/worker.py`:
- Around line 384-386: The exception handlers around applying the DaemonSet
(catching ApiException as e and the generic except Exception as e) currently log
errors but swallow them, allowing startup to continue; decide whether image
pre-puller failures must be fatal and implement accordingly: if image
availability is required, re-raise the caught exceptions (or raise a custom
exception) from the except blocks in the apply DaemonSet logic (referencing
ApiException, daemonset_name and the generic except Exception handler) so
startup fails fast; if pre-puller is optional, add an explicit comment and leave
graceful degradation but change the log level/message to clearly indicate
non-fatal behavior.
- Around line 309-315: Add validation and normalization for K8S_POD_CPU_REQUEST,
K8S_POD_CPU_LIMIT, K8S_POD_MEMORY_REQUEST, and K8S_POD_MEMORY_LIMIT on the
Settings model instead of using .removesuffix() raw in worker.py; implement
Pydantic validators on those fields (or a shared validator) to enforce allowed
formats (e.g., CPU must end with "m" or be a plain integer convertible to
millicores, memory must end with "Mi"), normalize and store them in a canonical
form (e.g., "NNNm" and "NNNMi") or expose numeric millicore/MeBi values, and
raise clear ValidationError on misconfiguration so the hard={...} block in
k8s_worker.worker (which references those settings names) can safely multiply
parsed integers without runtime ValueError or silent unit mistakes.

In `@backend/scripts/seed_users.py`:
- Around line 27-36: The docstring for upsert_user is missing Google-style
Args/Returns (and Raises if applicable) and does not document the new pwd_hasher
parameter; update the upsert_user docstring to Google style by adding an Args
section that documents db (AsyncDatabase[dict[str, Any]]), pwd_hasher
(PasswordHash), username (str), email (str), password (str), role (str), and
is_superuser (bool), a Returns section indicating it returns None, and a Raises
section for any exceptions the function may propagate (e.g., database errors) so
callers know what to expect.

In `@frontend/public/index.html`:
- Around line 41-48: When hiding the loader inside the DOMContentLoaded handler
(the setTimeout block that queries the element with id 'app-loading' and adds
the 'app-loading-hidden' class), also update its ARIA state by setting aria-busy
to "false" (e.g., element.setAttribute('aria-busy', 'false')) so the loader's
accessible state reflects that loading has finished; locate the script that
listens for 'DOMContentLoaded' and the code that modifies the 'app-loading'
element and add the aria update immediately after adding the class.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d46ae06 and 9c774fc.

📒 Files selected for processing (4)
  • backend/app/services/k8s_worker/worker.py
  • backend/scripts/seed_users.py
  • docs/operations/nginx-configuration.md
  • frontend/public/index.html


@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/app/services/k8s_worker/worker.py">

<violation number="1" location="backend/app/services/k8s_worker/worker.py:288">
P1: Loss of error isolation: the old code handled each security control independently (catch and log), so a failure in one wouldn't prevent the others. Now if `_ensure_executor_network_policy` throws (e.g., cluster doesn't support NetworkPolicy), `_ensure_executor_resource_quota` and `_apply_psa_labels` are skipped entirely. Consider wrapping each call in `ensure_namespace_security` with individual error handling, or restoring try/except in each method.</violation>

<violation number="2" location="backend/app/services/k8s_worker/worker.py:311">
P2: Brittle resource-value parsing: `removesuffix` + `int()` will crash with `ValueError` if settings don't use the exact expected suffix (e.g., CPU `"1"` instead of `"1000m"`, or memory `"1Gi"` instead of `"1024Mi"`). Consider a parsing helper that handles standard Kubernetes resource quantity formats, or at minimum validate the format at settings load time.</violation>
</file>
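
The per-control isolation asked for in violation 1 could be sketched roughly like this (a standalone illustration; in the real code the controls are methods on the worker class and the logger is the injected structlog instance):

```python
import asyncio


async def apply_controls_isolated(controls, log=print) -> list[str]:
    """Run each security-control coroutine independently.

    A failure in one control (e.g. NetworkPolicy unsupported) is logged and
    recorded, but does not prevent the remaining controls from running.
    Returns the names of the controls that failed.
    """
    failed: list[str] = []
    for control in controls:
        try:
            await control()
        except Exception as exc:
            failed.append(control.__name__)
            log(f"security control failed: {control.__name__}: {exc}")
    return failed
```

The caller can then decide whether any failure should abort startup, while still attempting every control first.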

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cert-generator/setup-k8s.sh`:
- Around line 145-148: The RBAC rules grant overly broad permissions for the
resources "networkpolicies" and "resourcequotas"; update the Role/ClusterRole
entries that list these resources so their verbs are reduced to only
["get","list","watch","patch"] (remove "delete" and any other unnecessary
verbs), ensuring the rules match the reconciliation code which uses server-side
apply (patch) only; locate the Role/ClusterRole block that references resources:
["networkpolicies"] and ["resourcequotas"] and replace the verbs array
accordingly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c774fc and f9d32c2.

📒 Files selected for processing (1)
  • cert-generator/setup-k8s.sh


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/app/services/pod_monitor/monitor.py">

<violation number="1" location="backend/app/services/pod_monitor/monitor.py:191">
P1: Pod deletion can occur even when terminal event publication fails, causing potential event loss. Only delete after confirming all terminal events were published successfully.</violation>
</file>
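
The ordering fix this violation calls for can be sketched as follows (`publish` and `delete` are hypothetical stand-ins for the Kafka producer and the Kubernetes delete call; this is not the monitor's actual API):

```python
import asyncio


async def finalize_pod(events, publish, delete) -> bool:
    """Publish all terminal events, then delete the pod.

    If any publish fails, deletion is skipped so the pod (and its logs)
    survive for a retry instead of the events being lost. Returns True
    only if the pod was actually deleted.
    """
    try:
        for event in events:
            await publish(event)
    except Exception:
        return False  # events not fully delivered; keep the pod around
    await delete()
    return True
```

Deleting only after every terminal event is confirmed published trades slightly later cleanup for at-least-once event delivery.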

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
backend/app/services/pod_monitor/monitor.py (1)

222-238: Use structured logging with keyword arguments instead of f-string interpolation.

Per coding guidelines, user-controlled or external data should be passed as keyword arguments to the logger rather than interpolated into log messages. While pod_name comes from Kubernetes metadata, using keyword arguments enables better log aggregation and filtering.

♻️ Proposed refactor for structured logging
     pod_name = pod.metadata.name
     try:
         await self._v1.delete_namespaced_pod(
             name=pod_name, namespace=pod.metadata.namespace, grace_period_seconds=0,
         )
-        self.logger.info(f"Deleted completed pod {pod_name}")
+        self.logger.info("Deleted completed pod", pod_name=pod_name)
     except ApiException as e:
         if e.status == 404:
-            self.logger.debug(f"Pod {pod_name} already deleted")
+            self.logger.debug("Pod already deleted", pod_name=pod_name)
         else:
-            self.logger.warning(f"Failed to delete pod {pod_name}: {e.reason}")
+            self.logger.warning("Failed to delete pod", pod_name=pod_name, reason=e.reason)

As per coding guidelines: "Use only structured logging via injected structlog.stdlib.BoundLogger... pass user-controlled data as keyword arguments, never interpolate into log messages."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/pod_monitor/monitor.py` around lines 222 - 238, In the
async method _delete_pod, replace f‑string logging with structured logging by
passing pod metadata and error details as keyword arguments to the injected
logger instead of interpolating into messages; update the logger.info call that
reports deletion, the logger.debug call for 404 cases, and the logger.warning
call in the ApiException branch to use messages like "Deleted completed pod",
"Pod already deleted", and "Failed to delete pod" respectively while supplying
pod_name=pod_name (and namespace=pod.metadata.namespace) and for exceptions
include error_reason=e.reason and error_status=e.status as keyword args so
structured loggers can index the fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/app/services/pod_monitor/monitor.py`:
- Around line 222-238: In the async method _delete_pod, replace f‑string logging
with structured logging by passing pod metadata and error details as keyword
arguments to the injected logger instead of interpolating into messages; update
the logger.info call that reports deletion, the logger.debug call for 404 cases,
and the logger.warning call in the ApiException branch to use messages like
"Deleted completed pod", "Pod already deleted", and "Failed to delete pod"
respectively while supplying pod_name=pod_name (and
namespace=pod.metadata.namespace) and for exceptions include
error_reason=e.reason and error_status=e.status as keyword args so structured
loggers can index the fields.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9d32c2 and de03298.

📒 Files selected for processing (1)
  • backend/app/services/pod_monitor/monitor.py


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cert-generator/setup-k8s.sh">

<violation number="1" location="cert-generator/setup-k8s.sh:93">
P1: The cleanup selector is too broad: `status.phase!=Running` also deletes valid `Pending` executor pods, which can interrupt active executions during startup. Restrict cleanup to terminal phases only.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
cert-generator/setup-k8s.sh (1)

147-151: ⚠️ Potential issue | 🟠 Major

Still over-permissive RBAC on policy/quota resources.

networkpolicies and resourcequotas still include destructive verbs beyond reconciliation needs. Keep these scoped to read/watch + patch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cert-generator/setup-k8s.sh` around lines 147 - 151, The RBAC rules for
resources "networkpolicies" and "resourcequotas" are over-permissive; update the
verbs arrays for the entries that reference resources: ["networkpolicies"] and
resources: ["resourcequotas"] so they only include read/watch plus patch (e.g.,
["get","list","watch","patch"]); remove create/update/delete from those verbs
and ensure the corresponding apiGroups entry is present/appropriate for each
resource if required.
🧹 Nitpick comments (1)
backend/app/services/k8s_worker/worker.py (1)

293-294: Use structured log fields instead of f-strings in new log lines.

Please switch these to static messages + keyword fields for consistency and safer log processing.

🔧 Suggested fix
-        self.logger.info(f"NetworkPolicy '{policy_name}' applied in namespace {namespace}")
+        self.logger.info("NetworkPolicy applied", policy_name=policy_name, namespace=namespace)
@@
-        self.logger.info(f"ResourceQuota '{quota_name}' applied in namespace {namespace}")
+        self.logger.info("ResourceQuota applied", quota_name=quota_name, namespace=namespace)
@@
-        self.logger.info(f"Pod Security Admission labels applied to namespace {namespace}")
+        self.logger.info("Pod Security Admission labels applied", namespace=namespace)
@@
-            self.logger.info(f"DaemonSet '{daemonset_name}' applied successfully")
+            self.logger.info("DaemonSet applied successfully", daemonset_name=daemonset_name, namespace=namespace)

As per coding guidelines "Use only structured logging via injected structlog.stdlib.BoundLogger ... pass user-controlled data as keyword arguments, never interpolate into log messages".

Also applies to: 324-325, 336-336, 404-404

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/k8s_worker/worker.py` around lines 293 - 294, Replace
f-string log interpolation with structured logging: change calls like
self.logger.info(f"NetworkPolicy '{policy_name}' applied in namespace
{namespace}") to a static message and pass dynamic values as keyword fields
(e.g., self.logger.info("NetworkPolicy applied", policy_name=policy_name,
namespace=namespace)). Do the same for the other similar uses of
self.logger.info/error in this file that interpolate policy_name, namespace,
pod_name, config_map_name, etc., ensuring all user-controlled data is passed as
keyword arguments rather than embedded in the message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/k8s_worker/worker.py`:
- Around line 311-314: The CPU/memory quota math assumes CPU strings end with
"m" and does int(removesuffix('m')) which breaks for values like "1" or "0.5";
implement a small parser (e.g., parse_cpu_to_milli) that converts any CPU string
to an integer milliCPU value: if it ends with "m" strip and int, else parse as
float and multiply by 1000 (round or int), then use
parse_cpu_to_milli(self._settings.K8S_POD_CPU_REQUEST) and
parse_cpu_to_milli(self._settings.K8S_POD_CPU_LIMIT) when computing
"requests.cpu" and "limits.cpu" (e.g., f"{parse_cpu_to_milli(...) * n}m"); keep
existing memory handling but consider a similar parser for
K8S_POD_MEMORY_REQUEST/K8S_POD_MEMORY_LIMIT if they might omit "Mi".

In `@cert-generator/setup-k8s.sh`:
- Around line 92-93: The current kubectl delete line (kubectl delete pods -n
integr8scode -l component=executor --field-selector=status.phase!=Running ...)
can remove non-terminal Pending pods; replace it with explicit deletes for
terminal phases only by invoking kubectl delete pods -n integr8scode -l
component=executor --field-selector='status.phase==Succeeded' ... and a second
kubectl delete pods -n integr8scode -l component=executor
--field-selector='status.phase==Failed' ..., each keeping --ignore-not-found and
the existing redirection/|| true so only Succeeded and Failed executor pods are
cleaned up and Pending/Running pods are preserved.

---

Duplicate comments:
In `@cert-generator/setup-k8s.sh`:
- Around line 147-151: The RBAC rules for resources "networkpolicies" and
"resourcequotas" are over-permissive; update the verbs arrays for the entries
that reference resources: ["networkpolicies"] and resources: ["resourcequotas"]
so they only include read/watch plus patch (e.g.,
["get","list","watch","patch"]); remove create/update/delete from those verbs
and ensure the corresponding apiGroups entry is present/appropriate for each
resource if required.

---

Nitpick comments:
In `@backend/app/services/k8s_worker/worker.py`:
- Around line 293-294: Replace f-string log interpolation with structured
logging: change calls like self.logger.info(f"NetworkPolicy '{policy_name}'
applied in namespace {namespace}") to a static message and pass dynamic values
as keyword fields (e.g., self.logger.info("NetworkPolicy applied",
policy_name=policy_name, namespace=namespace)). Do the same for the other
similar uses of self.logger.info/error in this file that interpolate
policy_name, namespace, pod_name, config_map_name, etc., ensuring all
user-controlled data is passed as keyword arguments rather than embedded in the
message.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de03298 and a0d2e91.

📒 Files selected for processing (3)
  • backend/app/services/k8s_worker/worker.py
  • backend/app/services/pod_monitor/config.py
  • cert-generator/setup-k8s.sh

…ated limits for system-wide k8s quota on cpu and memory

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 6 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/security/policies.md">

<violation number="1" location="docs/security/policies.md:66">
P2: The ResourceQuota description is behaviorally incorrect: exceeding quota does not keep extra pods Pending; Kubernetes rejects creation with `403 Forbidden (exceeded quota)`. This can mislead operators during incident debugging.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 9 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cert-generator/setup-k8s.sh">

<violation number="1" location="cert-generator/setup-k8s.sh:95">
P1: Applying manifests from a remote URL without integrity verification introduces a supply-chain risk in cluster bootstrap. Prefer vendoring the manifest (or verifying checksum/signature before apply) so bootstrap does not trust mutable external content at runtime.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cert-generator/setup-k8s.sh">

<violation number="1" location="cert-generator/setup-k8s.sh:96">
P1: Avoid predictable `/tmp` filenames for downloaded manifests; use a secure unique temp file (e.g., `mktemp`) to prevent symlink/race attacks.</violation>
</file>

<file name="backend/app/services/pod_monitor/monitor.py">

<violation number="1" location="backend/app/services/pod_monitor/monitor.py:211">
P1: A Kafka publish failure now aborts the rest of pod-event processing, which can skip terminal pod cleanup and remaining event publishes.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@sonarqubecloud

sonarqubecloud bot commented Mar 3, 2026

@HardMax71 HardMax71 merged commit a9b4b7a into main Mar 3, 2026
19 checks passed
@HardMax71 HardMax71 deleted the feat/sec-hardening branch March 3, 2026 12:07