[Evaluation] Fix red team status tracking, cache key mismatch, and evaluation error handling #45517
Open
slister1001 wants to merge 2 commits into main
…r handling

Bug 1 - Status tracking: _determine_run_status now treats 'pending' and 'running' entries as 'failed' instead of 'in_progress'. By the time this method runs, the scan is finished, so leftover 'pending' entries (from skipped risk categories or Foundry execution failures) indicate failure, not ongoing work.

Bug 2 - Cache key mismatch: _execute_attacks_with_foundry now uses get_attack_objective_from_risk_category() to build the cache lookup key, matching the caching logic in _get_attack_objectives. Previously, ungrounded_attributes objectives were cached under 'isa' but looked up under 'ungrounded_attributes', causing them to be silently skipped.

Bug 3 - Evaluation error handling: RAIServiceScorer now detects when the RAI evaluation service returns an error response (properties.outcome == 'error', e.g. ServiceInvocationException) and raises RuntimeError. PyRIT then treats the score as UNDETERMINED. Previously, the erroneous passed=False from the error response marked the attack as successful, which inflated ASR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR fixes three bugs found during the red team SDK bug bash:
- Run status stuck at `in_progress`: Treats leftover `pending` and `running` statuses as `failed` since the scan has already finished.
- `ungrounded_attributes` silently skipped: Fixes a cache key mismatch by using `get_attack_objective_from_risk_category()` instead of the raw risk value for the baseline cache lookup key.
- `ServiceInvocationException` inflating ASR: Detects error responses from the RAI evaluation service and raises `RuntimeError` so scores are marked as UNDETERMINED rather than being incorrectly treated as attack success.
Changes:
- Updated `_determine_run_status()` to collapse `pending`/`running` into the failure set
- Fixed cache key construction in `_execute_attacks_with_foundry()` to match the caching logic
- Added error-outcome detection in `RAIServiceScorer._score_piece_async()` to prevent false attack-success counts
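The first change can be illustrated with a minimal sketch. This is not the SDK's implementation: `determine_run_status` is a hypothetical stand-in for `_determine_run_status()`, the input is assumed to be a flat list of per-entry status strings, and the `completed` return value is an assumption. Only the treatment of `pending` and `running` as failures comes from this PR.

```python
from typing import Iterable

# Statuses counted as failure once the scan has finished.
# "pending" and "running" are included per this PR; the exact contents of
# the SDK's real failure set are not shown here and are assumed.
FAILURE_STATES = {"failed", "pending", "running"}


def determine_run_status(statuses: Iterable[str]) -> str:
    """Hypothetical stand-in for _determine_run_status().

    By the time this runs the scan is over, so an entry that never left
    "pending" or "running" cannot still be in progress.
    """
    if any(status in FAILURE_STATES for status in statuses):
        return "failed"
    return "completed"  # assumed name for the success status


# A skipped risk category leaves a "pending" entry behind.
status_with_leftover = determine_run_status(["completed", "pending"])
status_all_done = determine_run_status(["completed", "completed"])
```

With this rule a run can no longer report `in_progress` after the scan has ended, which is the symptom described above.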
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `_result_processor.py` | Treats `pending`/`running` as terminal failures in `_determine_run_status()` |
| `_red_team.py` | Uses `get_attack_objective_from_risk_category()` for consistent cache key lookup |
| `_rai_scorer.py` | Detects `properties.outcome == "error"` and raises `RuntimeError` for undetermined scoring |
| `CHANGELOG.md` | Documents the three bug fixes |
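The `_red_team.py` fix comes down to building the cache key the same way on write and on read. The sketch below is illustrative only: `attack_objective_from_risk_category` is a hypothetical stand-in for `get_attack_objective_from_risk_category()`, the `(name, strategy)` tuple key shape is an assumption, and the only mapping taken from this PR is `ungrounded_attributes` to `isa`.

```python
from typing import Dict, List, Tuple

# Assumed mapping table; this PR only states that ungrounded_attributes
# objectives are cached under "isa".
_OBJECTIVE_NAMES: Dict[str, str] = {"ungrounded_attributes": "isa"}


def attack_objective_from_risk_category(risk_category: str) -> str:
    """Hypothetical stand-in for get_attack_objective_from_risk_category()."""
    return _OBJECTIVE_NAMES.get(risk_category, risk_category)


def cache_key(risk_category: str, strategy: str) -> Tuple[str, str]:
    # Single helper used for both writes and reads, so the two cannot drift.
    return (attack_objective_from_risk_category(risk_category), strategy)


cache: Dict[Tuple[str, str], List[str]] = {}

# Write path: objectives are stored under the mapped name ("isa").
cache[cache_key("ungrounded_attributes", "baseline")] = ["objective-1", "objective-2"]

# Old read path: the raw risk value misses, so the category looks empty.
old_lookup = cache.get(("ungrounded_attributes", "baseline"), [])

# Fixed read path: same helper as the write path, so the entry is found.
new_lookup = cache.get(cache_key("ungrounded_attributes", "baseline"), [])
```

Routing both paths through one helper is a design choice of this sketch; the PR itself only states that the lookup now calls the same mapping function as the caching logic.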
Fixes three bugs discovered during the red team SDK bug bash:
Bug 1 - Run status stuck at `in_progress`:
`_determine_run_status()` now treats leftover `pending` and `running` entries as `failed` instead of `in_progress`. By the time this method runs, the scan is finished, so `pending` entries (from skipped risk categories or Foundry execution failures) indicate failure, not ongoing work.

Bug 2 - `ungrounded_attributes` silently skipped:
`_execute_attacks_with_foundry()` now uses `get_attack_objective_from_risk_category()` to build the cache lookup key, matching the caching logic in `_get_attack_objectives()`. Previously, objectives were cached under `isa` but looked up under `ungrounded_attributes`, causing the category to appear to have 0 objectives despite the API returning 100.

Bug 3 - `ServiceInvocationException` inflating ASR:
`RAIServiceScorer` now detects when the RAI evaluation service returns an error response (`properties.outcome == 'error'`) and raises `RuntimeError`, causing PyRIT to treat the score as UNDETERMINED. Previously, the erroneous `passed=False` from error responses was incorrectly treated as attack success, inflating the `protected_material` ASR from 0% to 50%.
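The error-outcome check in Bug 3 can be sketched as a guard that runs before `passed` is read. This is a rough illustration, not the code in `_rai_scorer.py`: `ensure_not_error_outcome` is a hypothetical helper, and the response is assumed to be a plain dict with a nested `properties` dict. Only the `properties.outcome == 'error'` condition and the `RuntimeError` come from this PR.

```python
from typing import Any, Dict


def ensure_not_error_outcome(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical guard: reject error responses before reading `passed`.

    An error response can still carry passed=False, which would otherwise
    be read as "the defense failed", i.e. a successful attack.
    """
    properties = response.get("properties") or {}
    if properties.get("outcome") == "error":
        # Raising lets the caller record the score as undetermined instead
        # of counting the error as an attack success.
        raise RuntimeError("RAI evaluation service returned an error response")
    return response


# Assumed response shapes, for illustration only.
normal_response = {"passed": True, "properties": {"outcome": "pass"}}
error_response = {"passed": False, "properties": {"outcome": "error"}}

checked = ensure_not_error_outcome(normal_response)

error_was_raised = False
try:
    ensure_not_error_outcome(error_response)
except RuntimeError:
    error_was_raised = True
```

Checking the outcome before interpreting `passed` keeps service failures out of the ASR numerator, which is the inflation described above.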