Merged
69 changes: 0 additions & 69 deletions .github/workflows/Benchmarks.yml

This file was deleted.

10 changes: 9 additions & 1 deletion .github/workflows/CreateRelease.yml
@@ -72,8 +72,16 @@ jobs:

benchmarks:
needs: [build-guests]
uses: ./.github/workflows/Benchmarks.yml
strategy:
fail-fast: true
matrix:
hypervisor: [hyperv, 'hyperv-ws2025', mshv3, kvm]
cpu: [amd, intel]
uses: ./.github/workflows/dep_benchmarks.yml
secrets: inherit
with:
hypervisor: ${{ matrix.hypervisor }}
cpu: ${{ matrix.cpu }}
permissions:
contents: read

71 changes: 71 additions & 0 deletions .github/workflows/DailyBenchmarks.yml
@@ -0,0 +1,71 @@
# yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json

name: Daily Benchmarks

on:
schedule:
- cron: '0 0 * * *' # Runs at 00:00 UTC every day
workflow_dispatch: # Allow manual triggering

permissions:
contents: read
actions: read

jobs:
# Find the most recent successful run of this workflow so we can download
# its benchmark artifacts as a baseline for day-over-day comparison.
find-baseline:
runs-on: ubuntu-latest
outputs:
run-id: ${{ steps.find-run.outputs.run_id }}
steps:
- name: Find latest successful run
id: find-run
# gh run list returns runs sorted by creation date descending (implicit).
# On the first-ever run, this outputs empty and dep_benchmarks.yml
# will skip the baseline download (continue-on-error).
run: |
run_id=$(gh run list --repo "${{ github.repository }}" --workflow DailyBenchmarks.yml --status success --limit 1 --json databaseId --jq '.[0].databaseId // empty')
echo "run_id=$run_id" >> "$GITHUB_OUTPUT"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

# Build release guest binaries needed by the benchmark suite.
build-guests:
uses: ./.github/workflows/dep_build_guests.yml
secrets: inherit
with:
config: release

# Run benchmarks across all hypervisor/cpu combos, comparing against
# the previous day's results. Artifacts are retained for 90 days.
benchmarks:
needs: [build-guests, find-baseline]
strategy:
fail-fast: true
matrix:
hypervisor: [hyperv, 'hyperv-ws2025', mshv3, kvm]
cpu: [amd, intel]
uses: ./.github/workflows/dep_benchmarks.yml
secrets: inherit
with:
hypervisor: ${{ matrix.hypervisor }}
cpu: ${{ matrix.cpu }}
baseline_run_id: ${{ needs.find-baseline.outputs.run-id }}
retention_days: 90

# File a GitHub issue if any job fails.
notify-failure:
runs-on: ubuntu-latest
needs: [build-guests, benchmarks]
if: always() && (needs.build-guests.result == 'failure' || needs.benchmarks.result == 'failure')
permissions:
issues: write
steps:
- name: Checkout code
uses: actions/checkout@v6

- name: Notify Benchmark Failure
run: ./dev/notify-ci-failure.sh --title="Benchmark Failure - ${{ github.run_number }}" --labels="area/benchmarks,area/testing,lifecycle/needs-review,release-blocker"
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
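The `find-baseline` job above relies on jq's `// empty` fallback so that on the first-ever run the step emits an empty `run_id` rather than the literal string `null`. A minimal Python sketch of that behavior (function name and sample data are illustrative, not from the PR):

```python
import json

def latest_successful_run_id(gh_json: str) -> str:
    """Mimic `--jq '.[0].databaseId // empty'`: newest run's ID, or '' if none exist."""
    runs = json.loads(gh_json)
    if runs and runs[0].get("databaseId") is not None:
        return str(runs[0]["databaseId"])
    return ""

# With at least one successful run, the ID comes back:
print(latest_successful_run_id('[{"databaseId": 123456}]'))  # 123456
# On the first-ever run the list is empty, so run_id stays empty:
print(latest_successful_run_id('[]'))
```

An empty `run_id` then flows through to `dep_benchmarks.yml`, where the baseline download is skipped via `continue-on-error`.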
22 changes: 0 additions & 22 deletions .github/workflows/ValidatePullRequest.yml
@@ -125,27 +125,6 @@ jobs:
cpu: ${{ matrix.cpu }}
config: ${{ matrix.config }}

# Run benchmarks - release only, needs guest artifacts, runs in parallel with build-test
benchmarks:
needs:
- docs-pr
- build-guests
# Required because update-guest-locks is skipped on non-dependabot PRs,
# and a skipped dependency transitively skips all downstream jobs.
# See: https://github.com/actions/runner/issues/2205
if: ${{ !cancelled() && !failure() }}
strategy:
fail-fast: true
matrix:
hypervisor: [hyperv, 'hyperv-ws2025', mshv3, kvm]
cpu: [amd, intel]
uses: ./.github/workflows/dep_benchmarks.yml
secrets: inherit
with:
docs_only: ${{ needs.docs-pr.outputs.docs-only }}
hypervisor: ${{ matrix.hypervisor }}
cpu: ${{ matrix.cpu }}

fuzzing:
needs:
- docs-pr
@@ -187,7 +166,6 @@ jobs:
- code-checks
- build-test
- run-examples
- benchmarks
- fuzzing
- spelling
- license-headers
60 changes: 58 additions & 2 deletions .github/workflows/dep_benchmarks.yml
@@ -1,5 +1,28 @@
# yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json

# Reusable workflow to run benchmarks on a single hypervisor/cpu combination.
#
# Baseline comparison:
# The workflow supports two mutually exclusive ways to load a baseline for
# Criterion to compare against:
#
# 1. baseline_run_id — Downloads benchmark artifacts from a previous workflow
# run (by run ID). Used by DailyBenchmarks.yml for day-over-day comparison.
#
# 2. baseline_tag — Downloads benchmark tarballs from a GitHub Release (by tag).
# If empty (the default), `gh release download` fetches from the latest
# stable release. Used by CreateRelease.yml.
#
# If baseline_run_id is set, baseline_tag is ignored.
# If neither is set, the latest stable release is used.
# Both downloads use continue-on-error so the first-ever run (no baseline
# available) succeeds without comparison.
#
# Artifact upload:
# Benchmark results are always uploaded as workflow artifacts named
# benchmarks_<OS>_<hypervisor>_<cpu>. The retention_days input controls
# how long they are kept (default: 5 days).

name: Run Benchmarks

on:
@@ -18,6 +41,21 @@ on:
description: CPU architecture for the build (passed from caller matrix)
required: true
type: string
baseline_tag:
description: Release tag to download baseline benchmarks from (e.g. dev-latest). Ignored if baseline_run_id is set. If empty, downloads from the latest stable release.
required: false
type: string
default: ""
baseline_run_id:
description: Workflow run ID to download baseline benchmark artifacts from. Takes precedence over baseline_tag.
required: false
type: string
default: ""
retention_days:
description: Number of days to retain benchmark artifacts
required: false
type: number
default: 5

env:
CARGO_TERM_COLOR: always
@@ -74,11 +112,29 @@ jobs:
- name: Build
run: just build release

- name: Download benchmarks from "latest"
run: just bench-download ${{ runner.os }} ${{ inputs.hypervisor }} ${{ inputs.cpu }} dev-latest # compare to prerelease
- name: Download baseline from previous run
if: ${{ inputs.baseline_run_id != '' }}
uses: actions/download-artifact@v8
with:
name: benchmarks_${{ runner.os }}_${{ inputs.hypervisor }}_${{ inputs.cpu }}
path: ./target/criterion/
run-id: ${{ inputs.baseline_run_id }}
github-token: ${{ secrets.GITHUB_TOKEN }}
continue-on-error: true

- name: Download baseline from release
if: ${{ inputs.baseline_run_id == '' }}
run: just bench-download ${{ runner.os }} ${{ inputs.hypervisor }} ${{ inputs.cpu }} ${{ inputs.baseline_tag }}
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
continue-on-error: true

- name: Run benchmarks
run: just bench-ci main

- uses: actions/upload-artifact@v7
with:
name: benchmarks_${{ runner.os }}_${{ inputs.hypervisor }}_${{ inputs.cpu }}
path: ./target/criterion/
if-no-files-found: error
retention-days: ${{ inputs.retention_days }}
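The header comment in `dep_benchmarks.yml` spells out a three-way precedence for choosing a baseline: `baseline_run_id` wins, then `baseline_tag`, then the latest stable release. A small Python sketch of those rules (the function and its return strings are illustrative only):

```python
def resolve_baseline(baseline_run_id: str = "", baseline_tag: str = "") -> str:
    # baseline_run_id takes precedence over baseline_tag; an empty tag means
    # `gh release download` falls back to the latest stable release.
    if baseline_run_id:
        return f"workflow-run:{baseline_run_id}"
    if baseline_tag:
        return f"release-tag:{baseline_tag}"
    return "release-tag:latest-stable"
```

This matches the two callers in the PR: `DailyBenchmarks.yml` passes a run ID for day-over-day comparison, while `CreateRelease.yml` relies on the tag/latest-release path.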
10 changes: 5 additions & 5 deletions docs/benchmarking-hyperlight.md
@@ -2,10 +2,10 @@

Hyperlight uses the [Criterion](https://bheisler.github.io/criterion.rs/book/index.html) framework to run and analyze benchmarks. A benefit to this framework is that it doesn't require the nightly toolchain.

## When Benchmarks are ran
## When Benchmarks are run

1. Every time a branch gets a push
- Compares the current branch benchmarking results to the "dev-latest" release (which is the most recent push to "main" branch). This is done as part of `dep_rust.yml`, which is invoked by `ValidatePullRequest.yml`. These benchmarks are for the developer to compare their branch to main, and the results can only be seen in the GitHub action logs, and nothing is saved.
1. Daily (scheduled)
- Benchmarks run daily via `DailyBenchmarks.yml`, comparing results against the previous day's run. Results are stored as workflow artifacts with 90-day retention.

```
sandboxes/create_sandbox
@@ -15,9 +15,9 @@ Hyperlight uses the [Criterion](https://bheisler.github.io/criterion.rs/book/ind
```

2. For each release
- For each release, benchmarks are ran as part of the release pipeline in `CreateRelease.yml`, which invokes `Benchmarks.yml`. These benchmark results are compared to the previous release, and are uploaded as port of the "Release assets" on the GitHub release page.
- For each release, benchmarks are run as part of the release pipeline in `CreateRelease.yml`, which invokes `dep_benchmarks.yml`. These benchmark results are compared to the previous release, and are uploaded as part of the "Release assets" on the GitHub release page.

Currently, benchmarks are ran on windows, linux-kvm (ubuntu), and linux-hyperv (mariner). Only release builds are benchmarked, not debug.
Currently, benchmarks are run on windows, linux-kvm (ubuntu), and linux-hyperv (mariner). Only release builds are benchmarked, not debug.
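Outside CI, the same compare-against-a-baseline flow can be reproduced locally with Criterion's baseline flags (a sketch; the repo's `just` recipes may wrap these differently):

```shell
# Record current results under a named baseline, e.g. "main":
cargo bench -- --save-baseline main

# ...switch branches or make changes, then compare against it:
cargo bench -- --baseline main
```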

## Criterion artifacts
