CI: ensure artifacts can be fetched successfully by leofang · Pull Request #1725 · NVIDIA/cuda-python

leofang · 2026-03-05T03:00:40Z

xref: #1714 (comment)

copy-pr-bot · 2026-03-05T03:00:45Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang · 2026-03-05T15:59:06Z

/ok to test a403f90

github-actions · 2026-03-06T16:23:15Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

rwgk · 2026-03-06T19:08:15Z

I took a quick look at this PR, purely out of curiosity. After Cursor explained what this PR does, I asked:

Hm, wouldn't it be safer to fail if the last run on the backport branch was not successful?

See below for the response triggered by that one question. Does the response make sense?

Initial Question

Asked about PR 1725 (commit cfbda9fd0c), which changed -s completed to -s success in the GitHub Actions workflow files.

What the Original Change Did

The PR changed the status filter (-s) passed to the gh run list command from -s completed to -s success:

-s completed → includes all finished runs (success, failure, cancelled, etc.)
-s success → only includes runs that completed successfully

This was done in three workflow files:

.github/workflows/build-wheel.yml
.github/workflows/test-wheel-linux.yml
.github/workflows/test-wheel-windows.yml

The purpose was to ensure backport CI only downloads artifacts from successful runs, not failed ones.
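The difference between the two filters can be sketched offline, without calling gh. The snippet below is only an illustration: the run history, the run IDs, and the helper name pick_latest are made up, and it assumes the filter semantics described above (completed accepts any finished run, success accepts only successful ones).

```shell
#!/usr/bin/env bash
# Hypothetical run history for a backport branch, newest first.
# Each entry is "<run id>:<conclusion>"; the IDs are invented for illustration.
RUNS=("103:failure" "102:success" "101:success")

# Print the ID of the newest run accepted by the given filter:
#   completed -> any finished run, whatever its conclusion
#   success   -> only runs whose conclusion is "success"
pick_latest() {
  local filter="$1" entry id conclusion
  for entry in "${RUNS[@]}"; do
    id="${entry%%:*}"          # text before the colon
    conclusion="${entry##*:}"  # text after the colon
    if [[ "$filter" == "completed" || "$conclusion" == "$filter" ]]; then
      echo "$id"
      return 0
    fi
  done
  return 1  # no run matched the filter
}

echo "completed -> run $(pick_latest completed)"
echo "success   -> run $(pick_latest success)"
```

With this made-up history, the completed filter selects the failed run 103, while the success filter skips it and selects the older run 102.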

Follow-Up Discussion

Question: "Wouldn't it be safer to fail if the last run on the backport branch was not successful?"

Rationale: The current approach (-s success) would silently skip over failed runs and use artifacts from an older successful run. This could mask problems on the backport branch. It would be better to explicitly check the most recent run and fail if it's not successful.

Solution Implemented

Updated all three workflow files to:

Get the most recent run regardless of status (removed the -s success filter)
Check if the run is completed (not still in progress)
Check if the conclusion is "success"
Fail with clear error messages if either check fails

Benefits

Fails fast: If the backport branch has failing CI, backport CI fails immediately with a clear message
Prevents masking: Avoids silently using artifacts from an older successful run when the latest run failed
Better error messages: Explains what went wrong and what needs to be fixed
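One way to make these checks easy to exercise without network access is to separate the decision from the gh call. The following is a minimal sketch under that assumption, not the code from the PR: the function name check_latest_run and the run IDs are hypothetical, and the three arguments stand for the databaseId, status, and conclusion fields that gh run list --json returns.

```shell
#!/usr/bin/env bash
# Decide whether the newest run may be used as an artifact source.
# Arguments: run ID, status, conclusion.
# Returns 0 if the run is usable; otherwise prints the reason to stderr
# and returns 1.
check_latest_run() {
  local run_id="$1" status="$2" conclusion="$3"
  if [[ -z "$run_id" ]]; then
    echo "No workflow runs found." >&2
    return 1
  fi
  if [[ "$status" != "completed" ]]; then
    echo "Run ${run_id} is still ${status}; wait for it to finish." >&2
    return 1
  fi
  if [[ "$conclusion" != "success" ]]; then
    echo "Run ${run_id} did not succeed (conclusion: ${conclusion})." >&2
    return 1
  fi
  return 0
}

# Offline examples with made-up run IDs:
check_latest_run 102 completed success && echo "102: usable"
check_latest_run 103 completed failure || echo "103: rejected"
check_latest_run 104 in_progress ""    || echo "104: rejected"
```

In a workflow step, the three values would come from the JSON output of gh run list, and a non-zero return would be followed by exit 1, as in the diffs below.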

Changes Made

Diff: `.github/workflows/build-wheel.yml`

--- a/.github/workflows/build-wheel.yml
+++ b/.github/workflows/build-wheel.yml
@@ -369,9 +369,26 @@ jobs:
 
           OLD_BRANCH=$(yq '.backport_branch' ci/versions.yml)
           OLD_BASENAME="cuda-bindings-python${PYTHON_VERSION_FORMATTED}-cuda*-${{ inputs.host-platform }}*"
-          LATEST_PRIOR_RUN_ID=$(gh run list -b ${OLD_BRANCH} -L 1 -w "ci.yml" -s success -R NVIDIA/cuda-python --json databaseId | jq '.[]| .databaseId')
+          # Get the most recent run (regardless of status) and verify it's successful
+          LATEST_RUN_DATA=$(gh run list -b ${OLD_BRANCH} -L 1 -w "ci.yml" -R NVIDIA/cuda-python --json databaseId,conclusion,status)
+          LATEST_PRIOR_RUN_ID=$(echo "${LATEST_RUN_DATA}" | jq -r '.[0].databaseId // empty')
+          LATEST_RUN_CONCLUSION=$(echo "${LATEST_RUN_DATA}" | jq -r '.[0].conclusion // empty')
+          LATEST_RUN_STATUS=$(echo "${LATEST_RUN_DATA}" | jq -r '.[0].status // empty')
+          
           if [[ "$LATEST_PRIOR_RUN_ID" == "" ]]; then
-            echo "LATEST_PRIOR_RUN_ID not found!"
+            echo "No workflow runs found on branch ${OLD_BRANCH}!"
+            exit 1
+          fi
+          
+          if [[ "$LATEST_RUN_STATUS" != "completed" ]]; then
+            echo "The most recent CI run on branch ${OLD_BRANCH} is still ${LATEST_RUN_STATUS} (not completed)."
+            echo "Please wait for the CI to complete before running backport CI."
+            exit 1
+          fi
+          
+          if [[ "$LATEST_RUN_CONCLUSION" != "success" ]]; then
+            echo "ERROR: The most recent CI run on branch ${OLD_BRANCH} did not succeed (conclusion: ${LATEST_RUN_CONCLUSION})."
+            echo "Backport CI requires artifacts from a successful run. Please fix the CI failures on ${OLD_BRANCH} first."
             exit 1
           fi

Diff: `.github/workflows/test-wheel-linux.yml`

--- a/.github/workflows/test-wheel-linux.yml
+++ b/.github/workflows/test-wheel-linux.yml
@@ -151,9 +151,26 @@ jobs:
 
           OLD_BRANCH=${{ needs.compute-matrix.outputs.OLD_BRANCH }}
           OLD_BASENAME="cuda-bindings-python${PYTHON_VERSION_FORMATTED}-cuda*-${{ inputs.host-platform }}*"
-          LATEST_PRIOR_RUN_ID=$(gh run list -b ${OLD_BRANCH} -L 1 -w "ci.yml" -s success -R NVIDIA/cuda-python --json databaseId | jq '.[]| .databaseId')
+          # Get the most recent run (regardless of status) and verify it's successful
+          LATEST_RUN_DATA=$(gh run list -b ${OLD_BRANCH} -L 1 -w "ci.yml" -R NVIDIA/cuda-python --json databaseId,conclusion,status)
+          LATEST_PRIOR_RUN_ID=$(echo "${LATEST_RUN_DATA}" | jq -r '.[0].databaseId // empty')
+          LATEST_RUN_CONCLUSION=$(echo "${LATEST_RUN_DATA}" | jq -r '.[0].conclusion // empty')
+          LATEST_RUN_STATUS=$(echo "${LATEST_RUN_DATA}" | jq -r '.[0].status // empty')
+          
           if [[ "$LATEST_PRIOR_RUN_ID" == "" ]]; then
-            echo "LATEST_PRIOR_RUN_ID not found!"
+            echo "No workflow runs found on branch ${OLD_BRANCH}!"
+            exit 1
+          fi
+          
+          if [[ "$LATEST_RUN_STATUS" != "completed" ]]; then
+            echo "The most recent CI run on branch ${OLD_BRANCH} is still ${LATEST_RUN_STATUS} (not completed)."
+            echo "Please wait for the CI to complete before running backport CI."
+            exit 1
+          fi
+          
+          if [[ "$LATEST_RUN_CONCLUSION" != "success" ]]; then
+            echo "ERROR: The most recent CI run on branch ${OLD_BRANCH} did not succeed (conclusion: ${LATEST_RUN_CONCLUSION})."
+            echo "Backport CI requires artifacts from a successful run. Please fix the CI failures on ${OLD_BRANCH} first."
             exit 1
           fi

Diff: `.github/workflows/test-wheel-windows.yml`

--- a/.github/workflows/test-wheel-windows.yml
+++ b/.github/workflows/test-wheel-windows.yml
@@ -137,12 +137,28 @@ jobs:
         run: |
           $OLD_BRANCH = yq '.backport_branch' ci/versions.yml
           $OLD_BASENAME = "cuda-bindings-python${env:PYTHON_VERSION_FORMATTED}-cuda*-${{ inputs.host-platform }}*"
-          $runData = gh run list -b $OLD_BRANCH -L 1 -w "ci.yml" -s success -R NVIDIA/cuda-python --json databaseId | ConvertFrom-Json
+          # Get the most recent run (regardless of status) and verify it's successful
+          $runData = gh run list -b $OLD_BRANCH -L 1 -w "ci.yml" -R NVIDIA/cuda-python --json databaseId,conclusion,status | ConvertFrom-Json
           if (-not $runData -or $runData.Length -eq 0 -or -not $runData[0].databaseId -or [string]::IsNullOrEmpty($runData[0].databaseId)) {
-              Write-Host "LATEST_PRIOR_RUN_ID not found!"
+              Write-Host "No workflow runs found on branch $OLD_BRANCH!"
               exit 1
           }
+          
           $LATEST_PRIOR_RUN_ID = $runData[0].databaseId
+          $LATEST_RUN_STATUS = $runData[0].status
+          $LATEST_RUN_CONCLUSION = $runData[0].conclusion
+          
+          if ($LATEST_RUN_STATUS -ne "completed") {
+              Write-Host "The most recent CI run on branch $OLD_BRANCH is still $LATEST_RUN_STATUS (not completed)."
+              Write-Host "Please wait for the CI to complete before running backport CI."
+              exit 1
+          }
+          
+          if ($LATEST_RUN_CONCLUSION -ne "success") {
+              Write-Host "ERROR: The most recent CI run on branch $OLD_BRANCH did not succeed (conclusion: $LATEST_RUN_CONCLUSION)."
+              Write-Host "Backport CI requires artifacts from a successful run. Please fix the CI failures on $OLD_BRANCH first."
+              exit 1
+          }
 
           gh run download $LATEST_PRIOR_RUN_ID -p $OLD_BASENAME -R NVIDIA/cuda-python

Summary

The improvement ensures that backport CI will fail fast and clearly if the backport branch has failing CI, rather than silently using artifacts from an older successful run. This prevents problems from being masked and provides better visibility into CI health on backport branches.

ensure the backport CI can fetch artifacts successfully

a403f90

leofang requested a review from mdboom March 5, 2026 03:00

leofang self-assigned this Mar 5, 2026

leofang added the bug (Something isn't working) and CI/CD (CI/CD infrastructure) labels Mar 5, 2026

leofang added the to-be-backported (Trigger the bot to raise a backport PR upon merge) label Mar 5, 2026

leofang mentioned this pull request Mar 5, 2026

[backport] build(deps): bump the actions-monthly group with 12 updates #1714

Merged

leofang changed the title ~~CI: ensure the backport CI can fetch artifacts successfully~~ CI: ensure artifacts can be fetched successfully Mar 5, 2026

leofang mentioned this pull request Mar 5, 2026

[backport] CI: ensure artifacts can be fetched successfully #1726

Merged

leofang removed the to-be-backported (Trigger the bot to raise a backport PR upon merge) label Mar 5, 2026

cpcloud approved these changes Mar 6, 2026

cpcloud merged commit cfbda9f into NVIDIA:main Mar 6, 2026
170 of 175 checks passed

leofang deleted the fix_fetching_prev branch March 7, 2026 05:53
