Skip to content

Preserve multimodal media in saved eval results#1015

Open
d42me wants to merge 2 commits intoPrimeIntellect-ai:mainfrom
d42me:fix/save-multimodal-base64-results
Open

Preserve multimodal media in saved eval results#1015
d42me wants to merge 2 commits intoPrimeIntellect-ai:mainfrom
d42me:fix/save-multimodal-base64-results

Conversation

@d42me
Copy link
Collaborator

@d42me d42me commented Mar 13, 2026

Summary

  • preserve multimodal images and audio in saved eval results instead of collapsing them to placeholders
  • normalize image data URLs into base64-backed input_image records and preserve structured input_audio payloads when valid
  • add regression coverage for data URL parsing, audio alias handling, typed message serialization, and prompt/completion save paths

Note

Medium Risk
Changes how prompts/completions are serialized when saving eval results, which can affect downstream consumers expecting prior placeholder text formats. New parsing/normalization logic for multimodal parts could drop or transform malformed media payloads differently than before.

Overview
Saved eval outputs now preserve structured multimodal content in prompt/completion (including image_url and input_audio) by switching states_to_outputs to use new serialize_messages_for_output rather than messages_to_printable placeholders.

message_utils adds multimodal-safe serialization helpers (including audio alias support and whitespace-compacting normalization) plus _parse_data_url validation utilities, and tests are expanded/renamed to cover data-url parsing, typed-message serialization, audio fallback behavior, and save-path regression for multimodal prompts/completions. A small reliability tweak also updates RLM sandbox cleanup to use shutil.rmtree(..., ignore_errors=True).

Written by Cursor Bugbot for commit f4dfaf1. This will update automatically on new commits. Configure here.

@d42me d42me requested a review from hallerite March 13, 2026 02:36
@d42me d42me requested review from mikasenghaas and snimu March 13, 2026 02:36
@d42me d42me force-pushed the fix/save-multimodal-base64-results branch from 2a656f0 to ce3efb6 Compare March 13, 2026 02:36
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

if not _BASE64_DATA_RE.fullmatch(compact_data):
return None

return media_type.lower(), compact_data
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused production function with supporting regex constants

Low Severity

_parse_data_url and its three module-level compiled regexes (_DATA_URL_RE, _IMAGE_MEDIA_TYPE_RE, _BASE64_DATA_RE) are defined in production code but never called from any production code path. The function is only imported and invoked in the test file test_message_utils_multimodal.py. Neither _extract_image_part_for_output nor serialize_message_for_output calls it. Additionally, verifiers/clients/anthropic_messages_client.py already contains a parse_data_url function that appears to serve the same purpose, making this a potential duplication as well.

Additional Locations (1)
Fix in Cursor Fix in Web

Copy link
Member

@hallerite hallerite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments, but lgtm! pre-approved.


await asyncio.to_thread(shutil.rmtree, session.local_rollout_dir, True)
await asyncio.to_thread(
lambda: shutil.rmtree(session.local_rollout_dir, ignore_errors=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the lambda

return media_type.lower(), compact_data


def _extract_image_part_for_output(part: Mapping[str, Any]) -> dict[str, Any] | None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all this dict matching is annoying and it would be better to have an image type that we can use internally for better readability (which I have a PR for), but I think it's fine temporarily

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants