[fix] Resolve failing local web tests (oss / ee) #3950

Open
jp-agenta wants to merge 43 commits into fix/turnstile-loopholes from fix/local-web-tests
Conversation

@jp-agenta
Member

No description provided.

@jp-agenta jp-agenta marked this pull request as ready for review March 10, 2026 15:52
@vercel
vercel bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Mar 12, 2026 8:39am |

@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Mar 10, 2026
@jp-agenta jp-agenta changed the base branch from main to fix/turnstile-loopholes March 10, 2026 15:54
@dosubot dosubot bot added the bug label (Something isn't working) Mar 10, 2026
@github-actions
Contributor

github-actions bot commented Mar 10, 2026

Railway Preview Environment

Preview URL: https://gateway-production-1082.up.railway.app/w
Project: agenta-oss-pr-3950
Image tag: pr-3950-31ad3df
Status: Deployed
Railway logs: Open logs
Workflow logs: View workflow run
Updated at: 2026-03-12T08:47:22.767Z

Copilot AI review requested due to automatic review settings March 11, 2026 23:11
@dosubot dosubot bot added the Bug Report label (Something isn't working) Mar 11, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses stability and reliability issues across local/CI test execution (web Playwright + Python pytest) and Railway preview deployments, with supporting refactors to standardize runtime paths and environment handling across OSS/EE.

Changes:

  • Standardize Playwright runtime outputs/paths (results/reports/storage state/project metadata) and harden OTP/Testmail handling for web tests.
  • Expand/adjust CI workflows for preview environments and add/adjust test runners + reporting for API/SDK.
  • Improve Railway deployment scripts (compose-sourced infra images, safer secret defaults) and update docker-compose baselines (Postgres 17).

Reviewed changes

Copilot reviewed 80 out of 91 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| web/turbo.json | Include DISABLE_PRETTIER in Turbo global env for cache correctness. |
| web/tests/utils/testmail/index.ts | Refactor Testmail email/tag generation + timeout/logging improvements. |
| web/tests/tests/fixtures/user.fixture/authHelpers/utilities.ts | Switch user email generation to runtime-aware helper. |
| web/tests/tests/fixtures/user.fixture/authHelpers/index.ts | Make OTP UI automation more resilient and add flow logging. |
| web/tests/tests/fixtures/session.fixture/index.ts | Use runtime Chromium launch options (allowed ports). |
| web/tests/tests/fixtures/base.fixture/providerHelpers/index.ts | Read project metadata via runtime path helper. |
| web/tests/tests/fixtures/base.fixture/apiHelpers/index.ts | Read project metadata via runtime path helper; minor formatting. |
| web/tests/playwright/scripts/run-tests.ts | Formatting-only change for dimension flag regex. |
| web/tests/playwright/global-teardown.ts | Use runtime paths; rename destructive teardown env var; update messaging. |
| web/tests/playwright/config/testTags.ts | Collapse re-export type formatting. |
| web/tests/playwright/config/runtime.ts | New centralized runtime path + Chromium launch option helpers. |
| web/tests/playwright.config.ts | Route report/output/storageState/launchOptions through runtime helpers. |
| web/tests/README.md | Update auth + teardown env var guidance; document email format. |
| web/tests/.gitignore | Ignore new results/ and reports/ directories. |
| web/packages/eslint.config.mjs | Allow disabling prettier rule/plugin via DISABLE_PRETTIER. |
| web/oss/tests/playwright/acceptance/testsset/index.ts | Formatting-only change. |
| web/oss/tests/playwright/acceptance/smoke.spec.ts | Formatting-only change. |
| web/oss/tests/playwright/acceptance/prompt-registry/index.ts | Formatting-only change. |
| web/oss/tests/playwright/acceptance/playground/tests.ts | Disable networkidle wait (commented) and formatting tweaks. |
| web/oss/tests/playwright/acceptance/app/test.ts | Simplify response predicate formatting. |
| web/oss/tests/playwright/acceptance/.gitkeep | Add placeholder file. |
| web/oss/tests/manual/cell-renderers/test-extract-chat-messages.ts | Formatting-only change. |
| web/oss/src/lib/helpers/auth/turnstile.ts | Add commented bypass test site key for Turnstile. |
| web/eslint.config.mjs | Allow disabling prettier rule/plugin via DISABLE_PRETTIER; minor quoting changes. |
| web/ee/tests/playwright/acceptance/.gitkeep | Add placeholder file. |
| sdk/oss/tests/pytest/unit/.gitkeep | Add placeholder file. |
| sdk/oss/tests/pytest/acceptance/integrations/test_vault_secrets.py | Add eventual-consistency polling helper for secrets list assertions. |
| sdk/oss/tests/pytest/acceptance/.gitkeep | Add placeholder file. |
| sdk/agenta/sdk/assets.py | Remove older model IDs from built-in model lists. |
| hosting/railway/oss/scripts/preview-resolve-env.sh | New shared env-resolution script for preview deploys. |
| hosting/railway/oss/scripts/preview-create-or-update.sh | Refactor to source shared env-resolution script; adjust variable usage. |
| hosting/railway/oss/scripts/lib.sh | Add compose-image resolution helpers (service + redis). |
| hosting/railway/oss/scripts/deploy-from-images.sh | Resolve Redis image from compose; remove baked-in placeholder auth/crypt envs. |
| hosting/railway/oss/scripts/configure.sh | Replace placeholder key defaults with replace-me; refactor Postgres password resolution; allow optional Daytona key. |
| hosting/railway/oss/scripts/bootstrap.sh | Resolve infra images from compose baseline via new helpers. |
| hosting/railway/oss/README.md | Document compose-baseline image resolution and update workflow references. |
| hosting/docker-compose/oss/docker-compose.gh.yml | Bump Postgres image from 16 to 17. |
| hosting/docker-compose/oss/docker-compose.gh.ssl.yml | Bump Postgres image from 16 to 17. |
| hosting/docker-compose/oss/docker-compose.gh.local.yml | Bump Postgres image from 16 to 17. |
| hosting/docker-compose/oss/docker-compose.dev.yml | Bump Postgres image from 16 to 17. |
| hosting/docker-compose/ee/docker-compose.gh.local.yml | Bump Postgres image from 16 to 17. |
| hosting/docker-compose/ee/docker-compose.dev.yml | Bump Postgres image from 16 to 17. |
| docs/designs/web-tests/per-test-user-isolation.md | New design doc for worker/test-scoped user/project isolation strategy. |
| docs/designs/testing/testing.running.specs.md | Update workflow references for styling + unit checks. |
| docs/design/railway-preview-environments/status.md | Update workflow inventory for preview automation. |
| docs/design/railway-preview-environments/plan.md | Update plan to reference new workflow structure. |
| docs/design/playwright-oss-stabilization/status.md | Rename teardown safety env var in guidance. |
| docs/design/playwright-oss-stabilization/research.md | Rename teardown safety env var in guidance. |
| docs/design/playwright-oss-stabilization/qa.md | Update auth guidance and teardown safety env var. |
| docs/design/playwright-oss-stabilization/context.md | Rename teardown safety env var in guidance. |
| docs/design/playwright-oss-stabilization/backlog.md | Rename teardown safety env var in backlog item text. |
| docs/community-topics.md | Update Railway workflow references. |
| api/run-tests.py | Add AGENTA_LICENSE envvar support and ensure junit/html reports are generated when not provided. |
| api/pytest.ini | Remove default junit/html outputs from addopts (now handled by runner). |
| api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_security.py | New acceptance tests for embeds security/archived behavior (with TODO notes). |
| api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_retrieve_resolve.py | New acceptance tests for resolve=True on retrieve/query endpoints. |
| api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_legacy.py | New acceptance tests for legacy adapter embed resolution paths. |
| api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_errors.py | New acceptance tests covering embed resolution error policies and limits. |
| api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_cross_entity.py | New acceptance tests for cross-entity embeds (environments/workflows). |
| api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds.py | New baseline acceptance tests for embed resolution. |
| api/oss/tests/pytest/acceptance/.gitkeep | Add placeholder file. |
| api/oss/src/utils/env.py | Add commented Turnstile bypass site key for tests. |
| api/oss/src/utils/caching.py | Adjust cache TTL constants/comments (L1 shorter). |
| api/oss/src/routers/projects_router.py | Harden org lookup when fetching project auth context. |
| api/ee/tests/pytest/acceptance/.gitkeep | Add placeholder file. |
| .windsurf/workflows/record-and-refactor-e2e-2.md | Remove workflow doc. |
| .gitignore | Align ignored test artifact dirs; add web/tests/reports; remove some generic results ignores. |
| .github/workflows/45-railway-cleanup.yml | Rename workflow display name. |
| .github/workflows/44-railway-tests.yml | New reusable workflow to run API/SDK/web tests against Railway preview deploys. |
| .github/workflows/43-railway-deploy.yml | Convert to reusable deploy workflow with additional secrets and outputs. |
| .github/workflows/42-railway-build.yml | Convert to reusable build workflow; support “skip build” when tag provided; remove direct deploy chaining. |
| .github/workflows/41-railway-setup.yml | New reusable workflow to bootstrap Railway preview project/env. |
| .github/workflows/40-railway.yml | Add grouping “header” workflow. |
| .github/workflows/33-update-api-docs.yml | Rename workflow display name. |
| .github/workflows/32-generate-demo-traces.yml | Rename workflow display name. |
| .github/workflows/31-sync-github-labels.yml | Change to scheduled/dispatch run; remove PR/push triggers; disable dry-run. |
| .github/workflows/30-crons.yml | Add grouping “header” workflow. |
| .github/workflows/14-check-pr-preview.yml | New orchestration workflow chaining build → setup → deploy → tests. |
| .github/workflows/12-check-unit-tests.yml | New workflow scaffolding for unit checks (SDK/API wired; web/services currently fail if tests exist). |
| .github/workflows/11-check-code-styling.yml | New unified Ruff/Prettier/ESLint styling workflow. |
| .github/workflows/10-playwright-oss-tests.yml | Remove old Playwright OSS workflow (replaced by new preview test workflow). |
| .github/workflows/10-checks.yml | Add grouping “header” workflow. |
| .github/workflows/04-check-frontend-linting.yml | Remove old frontend linting workflow (replaced by unified styling workflow). |
| .github/workflows/03-check-python-linting.yml | Remove old python lint workflow (replaced by unified styling workflow). |
| .github/workflows/02-check-python-formatting.yml | Remove old python format workflow (replaced by unified styling workflow). |
| .github/workflows/00-releases.yml | Add grouping “header” workflow. |

@junaway junaway marked this pull request as draft March 12, 2026 00:10
@junaway junaway marked this pull request as ready for review March 12, 2026 00:11
Copilot AI review requested due to automatic review settings March 12, 2026 00:11
@dosubot dosubot bot added the ci/cd label Mar 12, 2026
@junaway junaway marked this pull request as draft March 12, 2026 00:15
@junaway junaway marked this pull request as ready for review March 12, 2026 00:15
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 81 out of 92 changed files in this pull request and generated 4 comments.


        run: cd web && pnpm install --frozen-lockfile

      - name: Run Prettier formatting fix
        run: cd web && pnpm run format-fix

Copilot AI Mar 12, 2026

This workflow is labeled as a code styling check, but it runs pnpm run format-fix (Prettier --write). That can silently modify the workspace and still exit 0, so the job may pass even when the PR is not formatted. Consider switching to a check-only command (e.g. pnpm run format / prettier --check) so CI fails when formatting is needed, or commit any formatting changes within the workflow and fail if the working tree becomes dirty.

Suggested change
-        run: cd web && pnpm run format-fix
+        run: |
+          cd web
+          pnpm run format-fix
+          # Fail if formatting changes are needed
+          git diff --quiet || { echo 'Prettier formatting changes detected. Please run "pnpm run format-fix" in the web directory and commit the changes.'; exit 1; }

Copilot uses AI. Check for mistakes.
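
For readers following the suggestion above, here is a minimal TypeScript sketch of the same "run the fixer, then fail on a dirty tree" gate. It is an illustration only, not code from this PR: `changedFiles` and `formatAndVerify` are hypothetical names, and the sketch assumes `pnpm` and `git` are on the PATH and that `format-fix` is the script named in the review comment.

```typescript
import {spawnSync} from "node:child_process"

/** Splits the output of `git diff --name-only` into a list of changed paths. */
export function changedFiles(nameOnlyOutput: string): string[] {
    return nameOnlyOutput
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
}

/**
 * Hypothetical CI gate: runs the auto-fixer, then returns a non-zero exit code
 * if the fixer modified any tracked file (meaning the PR was not formatted).
 */
export function formatAndVerify(cwd: string): number {
    const format = spawnSync("pnpm", ["run", "format-fix"], {cwd, stdio: "inherit"})
    if (format.status !== 0) return format.status ?? 1

    // `git diff --name-only` prints one modified tracked path per line.
    const diff = spawnSync("git", ["diff", "--name-only"], {cwd, encoding: "utf8"})
    const dirty = changedFiles(diff.stdout ?? "")
    if (dirty.length > 0) {
        console.error(`Formatting changes detected in ${dirty.length} file(s):`)
        dirty.forEach((path) => console.error(`  ${path}`))
        return 1
    }
    return 0
}
```

A script entry point could call `process.exit(formatAndVerify("web"))`. A check-only formatter command, as the comment recommends, avoids the write step entirely and is usually the simpler gate.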
Comment on lines +95 to +96
      - name: Run ESLint fix
        run: cd web && pnpm run lint-fix

Copilot AI Mar 12, 2026

This job runs pnpm run lint-fix (ESLint/Next lint with --fix), which can allow lint issues to be auto-fixed without failing the workflow or reflecting changes in the PR. For a CI check, prefer a non-fixing lint command so PRs fail when they introduce lint violations.

Suggested change
-      - name: Run ESLint fix
-        run: cd web && pnpm run lint-fix
+      - name: Run ESLint check
+        run: cd web && pnpm run lint

Comment on lines 23 to 27
 export function createInitialUserState(project: Partial<WorkerInfo["project"]>): UserState {
     const testmail = getTestmailClient()

     // Create email with structured tag
-    const email = testmail.generateTestEmail({
+    const email = generateRuntimeTestEmail({
         scope: project.name,
         branch: process.env.BRANCH_NAME,
     })

Copilot AI Mar 12, 2026

generateRuntimeTestEmail() can generate either a Testmail inbox address or a fallback @test.agenta.ai address depending on env. The docstring example above still shows the old Testmail format (...@namespace.testmail.app), which no longer matches the actual output; updating the example will avoid confusion when debugging auth flows.
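
To make the two output shapes concrete, here is a rough sketch of an env-dependent email helper. It is not the PR's `generateRuntimeTestEmail`: the function name `sketchRuntimeTestEmail`, the `TESTMAIL_NAMESPACE` variable, the tag layout, and the `<namespace>.<tag>@inbox.testmail.app` inbox shape are all assumptions made for illustration. Only the `@test.agenta.ai` fallback domain comes from the comment above.

```typescript
interface EmailOptions {
    scope?: string
    branch?: string
}

/** Lowercases a value and collapses anything outside [a-z0-9] into single dashes. */
function slug(value: string | undefined, fallback: string): string {
    const cleaned = (value ?? "")
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-")
        .replace(/^-+|-+$/g, "")
    return cleaned.length > 0 ? cleaned : fallback
}

/**
 * Hypothetical helper: returns a Testmail-style inbox address when a namespace
 * is configured, otherwise a fallback address on test.agenta.ai.
 */
export function sketchRuntimeTestEmail(
    options: EmailOptions,
    env: Record<string, string | undefined>,
    now: number = Date.now(),
): string {
    const tag = `${slug(options.scope, "scope")}-${slug(options.branch, "local")}-${now}`
    const namespace = env.TESTMAIL_NAMESPACE // assumed variable name
    if (namespace) {
        // Assumed inbox shape: <namespace>.<tag>@inbox.testmail.app
        return `${namespace}.${tag}@inbox.testmail.app`
    }
    // Fallback domain mentioned in the review comment.
    return `${tag}@test.agenta.ai`
}
```

Whatever the real helper does, a docstring example that lists both possible shapes makes it easier to tell from a log line which branch produced a given address.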

Comment on lines +125 to +129
const verifyEmailText = page.getByText("Verify your email")
const continueWithOtpButton = page.getByRole("button", {
    name: "Continue with OTP",
})
const resendOtpLink = page.getByText("Resend one-time password")

Copilot AI Mar 12, 2026

There is a block-scoped const continueWithOtpButton declared earlier in this function, and then another const continueWithOtpButton declared again inside the OTP branch. The shadowing makes it easy to accidentally reference the wrong locator when editing this flow; consider renaming the inner locator (or reusing the outer one) to avoid shadowing.
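
The shadowing pattern described above can be reproduced in isolation. The sketch below uses a stand-in `Locator` type and made-up names (`locate`, `continueButton`) rather than Playwright or the PR's actual helper. It shows that an inner `const` with the same name is legal TypeScript and silently hides the outer binding, and how renaming removes the ambiguity.

```typescript
// Stand-in for a Playwright locator; only the name matters for this illustration.
type Locator = {name: string}
const locate = (name: string): Locator => ({name})

/** Shadowed version: the inner const hides the outer one inside the `if` block. */
export function shadowed(otpFlow: boolean): string {
    const continueButton = locate("Continue")
    if (otpFlow) {
        // Same identifier, different locator. Edits here can easily target the wrong one.
        const continueButton = locate("Continue with OTP")
        return continueButton.name
    }
    return continueButton.name
}

/** Renamed version: identical behavior, but each locator has a distinct name. */
export function renamed(otpFlow: boolean): string {
    const continueButton = locate("Continue")
    if (otpFlow) {
        const continueWithOtpButton = locate("Continue with OTP")
        return continueWithOtpButton.name
    }
    return continueButton.name
}
```

Both functions return the same values, which is why the compiler does not complain. A lint rule such as ESLint's `no-shadow` is one way to surface this automatically.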

@junaway junaway marked this pull request as draft March 12, 2026 01:04
@junaway junaway marked this pull request as ready for review March 12, 2026 01:12
Copilot AI review requested due to automatic review settings March 12, 2026 01:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 81 out of 92 changed files in this pull request and generated 2 comments.


Comment on lines 95 to 96
const timestamp = Date.now()
await uiHelpers.typeWithDelay('input[type="email"]', email)

Copilot AI Mar 12, 2026

const timestamp = Date.now() is now unused in this helper. This will likely trigger lint/TS unused-variable checks; remove it or use it consistently (e.g., for timestamp_from).
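
If the timestamp is kept rather than removed, one plausible use is the `timestamp_from` idea mentioned above: capture the time before the email is triggered, then ask the inbox only for messages received after it. The sketch below builds such a query URL. `buildInboxQueryUrl` and its parameter object are hypothetical, and the endpoint and query parameter names reflect my understanding of the Testmail JSON API rather than anything shown in this PR.

```typescript
interface InboxQuery {
    apiKey: string
    namespace: string
    tag: string
    /** Milliseconds since epoch, captured before the OTP email is requested. */
    timestampFrom: number
}

/** Builds a query URL that only matches emails received at or after `timestampFrom`. */
export function buildInboxQueryUrl(query: InboxQuery): string {
    // Assumed endpoint and parameter names for the Testmail JSON API.
    const url = new URL("https://api.testmail.app/api/json")
    url.searchParams.set("apikey", query.apiKey)
    url.searchParams.set("namespace", query.namespace)
    url.searchParams.set("tag", query.tag)
    url.searchParams.set("timestamp_from", String(query.timestampFrom))
    return url.toString()
}
```

Filtering by a pre-request timestamp keeps a stale OTP email from an earlier run from being picked up when the same tag is reused.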

Comment on lines +68 to +72
      - name: Install dependencies
        run: cd web && pnpm install --frozen-lockfile

      - name: Run Prettier formatting fix
        run: cd web && pnpm run format-fix

Copilot AI Mar 12, 2026

This workflow runs format-fix, which will auto-modify files and still exit 0. That makes the job ineffective as a PR gate (formatting problems won’t fail CI). Prefer a check-only command (e.g., prettier --check / pnpm run format-check) and fail if formatting is off.


Labels

Bug Report (Something isn't working), bug (Something isn't working), ci/cd, size:XXL (This PR changes 1000+ lines, ignoring generated files.)

5 participants