[Evaluation] Recover partial red team results when Foundry execution raises by slister1001 · Pull Request #45541 · Azure/azure-sdk-for-python

slister1001 · 2026-03-05T22:30:25Z

When orchestrator.execute() raises (e.g., ConnectTimeout on 1 of 50 objectives), attempt to recover partial results from the orchestrator before falling back to the empty-result error path.

Previously, any single objective failure caused the entire risk category's results to be discarded (data_file set to an empty string, 0 results returned). Now, completed objectives are processed through the normal FoundryResultProcessor pipeline and included in the final output.

The error is demoted from ERROR to WARNING when partial results are available, since it is not a total failure. The original full-failure path is preserved when get_attack_results() returns empty.
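The recovery flow described above can be sketched as follows. This is a minimal, self-contained illustration, not the code in `_execution_manager.py`. `FlakyOrchestrator`, `run_risk_category`, the returned dict shape and the full-failure log message are hypothetical. The built-in `TimeoutError` stands in for the `ConnectTimeout` mentioned above, and the final `return` stands in for the `FoundryResultProcessor` pipeline.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("redteam.sketch")


class FlakyOrchestrator:
    """Hypothetical stand-in for the Foundry orchestrator.

    Records each completed objective, then optionally raises partway
    through execute(), the way a single timed-out objective would.
    """

    def __init__(self, completed: List[str], fail: bool = True) -> None:
        self._completed = list(completed)
        self._fail = fail
        self._results: List[str] = []

    def execute(self) -> None:
        for result in self._completed:
            self._results.append(result)
        if self._fail:
            # Built-in TimeoutError stands in for ConnectTimeout.
            raise TimeoutError("simulated timeout on one objective")

    def get_attack_results(self) -> List[str]:
        return list(self._results)


def run_risk_category(orchestrator: FlakyOrchestrator, risk_value: str) -> Dict[str, Any]:
    try:
        orchestrator.execute()
        results = orchestrator.get_attack_results()
    except Exception as e:
        # Attempt to recover partial results before giving up.
        partial_results: List[str] = []
        try:
            partial_results = orchestrator.get_attack_results()
        except Exception:
            logger.debug(
                "Failed to recover partial attack results for %s", risk_value, exc_info=True
            )
        if not partial_results:
            # Full-failure path (hypothetical message): empty data_file, 0 results.
            logger.error("Error executing attacks for %s: %s", risk_value, e)
            return {"data_file": "", "results": []}
        # Not a total failure, so log at WARNING and keep what completed.
        logger.warning(
            "Partial failure executing attacks for %s: %s. Recovered %d partial results.",
            risk_value, e, len(partial_results),
        )
        results = partial_results
    # Stand-in for FoundryResultProcessor: success and partial success both land here.
    return {"data_file": f"{risk_value}.jsonl", "results": results}
```

With two completed objectives and a simulated timeout, the call keeps both results instead of returning an empty `data_file`.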


All SDK Contribution checklist:

The pull request does not introduce breaking changes
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

…raises

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Improve resilience of red team evaluation runs by attempting to recover and process partial Foundry attack results when orchestrator.execute() raises, instead of discarding the entire risk category.

Changes:

On orchestrator.execute() exception, tries orchestrator.get_attack_results() to recover partial results.
Downgrades logging from ERROR to WARNING when partial results are recovered.
Preserves existing empty-result fallback behavior when no results can be recovered.

Copilot · 2026-03-05T22:35:39Z

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_execution_manager.py

+                    # Attempt to recover partial results before giving up
+                    partial_results = []
+                    try:
+                        partial_results = orchestrator.get_attack_results()
+                    except Exception:
+                        pass
+
+                    if partial_results:
+                        self.logger.warning(
+                            f"Partial failure executing attacks for {risk_value}: {e}. "
+                            f"Recovered {len(partial_results)} partial results."
+                        )


partial_results is computed but not used to drive downstream processing. If FoundryResultProcessor later calls orchestrator.get_attack_results() again (or expects state set by a successful execute()), the recovered results may be lost and the post-exception path could still behave like a full failure. Consider wiring the recovered partial_results into the processing path (e.g., pass into the processor or stash on the orchestrator) so the recovery is deterministic.
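One way to make the recovery deterministic, as the comment suggests, is to fetch the partial results once and pass that same list to the processor. The sketch below assumes a hypothetical processor with an explicit `process(results)` method and a hypothetical orchestrator whose `get_attack_results()` is not idempotent (it drains its buffer), which is the kind of hidden state that makes re-fetching unsafe. Neither class reflects the real Foundry orchestrator or `FoundryResultProcessor` API.

```python
from typing import List, Sequence


class DrainingOrchestrator:
    """Hypothetical orchestrator whose get_attack_results() is not idempotent:
    it hands out buffered results once and returns an empty list afterwards."""

    def __init__(self, results: List[str]) -> None:
        self._pending = list(results)
        self.calls = 0

    def get_attack_results(self) -> List[str]:
        self.calls += 1
        drained, self._pending = self._pending, []
        return drained


class ExplicitProcessor:
    """Hypothetical processor that takes results as an argument instead of
    reading them back from orchestrator state."""

    def __init__(self) -> None:
        self.processed: List[str] = []

    def process(self, results: Sequence[str]) -> int:
        self.processed.extend(results)
        return len(results)


def process_recovered(orchestrator: DrainingOrchestrator, processor: ExplicitProcessor) -> int:
    # Fetch exactly once after the exception...
    partial_results = orchestrator.get_attack_results()
    if not partial_results:
        return 0  # caller falls back to the full-failure path
    # ...and hand that same list to the processor, so nothing is re-read.
    return processor.process(partial_results)
```

Because the processor never calls back into the orchestrator, a second `get_attack_results()` returning nothing cannot turn a partial success into an apparent full failure.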

Copilot · 2026-03-05T22:35:40Z

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_execution_manager.py

+                    try:
+                        partial_results = orchestrator.get_attack_results()
+                    except Exception:
+                        pass


Swallowing all exceptions from get_attack_results() makes diagnosing recovery failures difficult. Consider at least logging at DEBUG level (optionally with exc_info=True) when partial-results recovery fails, so operators can distinguish 'no partial results' from 'recovery call failed'.

Suggested change

-                        pass
+                        self.logger.debug(
+                            "Failed to recover partial attack results for %s",
+                            risk_value,
+                            exc_info=True,
+                        )
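The distinction the comment asks for (no partial results versus a failed recovery call) can also be made explicit by returning an outcome alongside the results. This is a minimal sketch: `recover_partial_results` and its outcome strings are hypothetical names. The `logger.debug(..., exc_info=True)` call mirrors the suggested change and uses only the standard `logging` module.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("redteam.recovery")


def recover_partial_results(
    fetch: Callable[[], List[str]], risk_value: str
) -> Tuple[str, List[str]]:
    """Call fetch() and classify the outcome.

    Returns ("recovered", results), ("empty", []) or ("recovery_failed", []).
    """
    try:
        results = fetch()
    except Exception:
        # exc_info=True attaches the active exception and traceback to the record.
        logger.debug(
            "Failed to recover partial attack results for %s", risk_value, exc_info=True
        )
        return "recovery_failed", []
    if not results:
        return "empty", []
    return "recovered", results
```

Keeping the log at DEBUG avoids noise in normal runs while still letting an operator who enables debug logging see the traceback of the failed recovery call.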

Copilot · 2026-03-05T22:35:40Z

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_execution_manager.py

+                        self.logger.warning(
+                            f"Partial failure executing attacks for {risk_value}: {e}. "
+                            f"Recovered {len(partial_results)} partial results."
+                        )


In the partial-results path, the failure is only surfaced via logs and not recorded in the structured red_team_info data (unlike the full-failure fallback). If callers rely on structured output rather than logs, consider persisting a 'partial failure' indicator and the exception message alongside the recovered results so the outcome is observable without log access.

Suggested change

-                        )
+                        )
+                        # Surface partial failure in structured red_team_info
+                        if "Foundry" not in red_team_info:
+                            red_team_info["Foundry"] = {}
+                        existing_info = red_team_info["Foundry"].get(risk_value, {})
+                        # Do not discard any existing structured data for this risk value
+                        existing_info.setdefault("status", "partial_failure")
+                        existing_info["partial_failure"] = True
+                        existing_info["error"] = str(e)
+                        red_team_info["Foundry"][risk_value] = existing_info
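The suggested bookkeeping could be factored into a small helper so it can be unit-tested on its own. This sketch assumes `red_team_info` is a plain nested dict keyed by `"Foundry"` and then by risk value, as the suggestion implies; `mark_partial_failure` is a hypothetical name. It uses `dict.setdefault` at each level, which has the same effect as the suggestion's explicit membership check.

```python
from typing import Any, Dict


def mark_partial_failure(
    red_team_info: Dict[str, Dict[str, Any]], risk_value: str, error: BaseException
) -> Dict[str, Any]:
    """Record a partial failure for risk_value without discarding existing data."""
    foundry = red_team_info.setdefault("Foundry", {})
    entry = foundry.setdefault(risk_value, {})
    # Like the suggestion, only set status if none is present yet.
    entry.setdefault("status", "partial_failure")
    entry["partial_failure"] = True
    entry["error"] = str(error)
    return entry
```

As in the suggestion, `setdefault("status", ...)` leaves an existing `status` untouched, so an entry that already carries another status keeps it while still gaining the `partial_failure` flag and the `error` message.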

Copilot AI review requested due to automatic review settings March 5, 2026 22:30

slister1001 requested a review from a team as a code owner March 5, 2026 22:30

Copilot AI reviewed Mar 5, 2026


github-actions bot added the Evaluation label (Issues related to the client library for Azure AI Evaluation) Mar 5, 2026

Copilot started reviewing on behalf of slister1001 March 5, 2026 23:25

slister1001 wants to merge 1 commit into Azure:main from slister1001:fix/redteam-partial-result-recovery

slister1001 commented Mar 5, 2026


