Preserve multimodal media in saved eval results #1015
d42me wants to merge 2 commits into PrimeIntellect-ai:main
Force-pushed: 2a656f0 to ce3efb6
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    if not _BASE64_DATA_RE.fullmatch(compact_data):
        return None

    return media_type.lower(), compact_data
Unused production function with supporting regex constants
Low Severity
_parse_data_url and its three module-level compiled regexes (_DATA_URL_RE, _IMAGE_MEDIA_TYPE_RE, _BASE64_DATA_RE) are defined in production code but never called from any production code path. The function is only imported and invoked in the test file test_message_utils_multimodal.py. Neither _extract_image_part_for_output nor serialize_message_for_output calls it. Additionally, verifiers/clients/anthropic_messages_client.py already contains a parse_data_url function that appears to serve the same purpose, making this a potential duplication as well.
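For context, a data-URL validator of the kind discussed here might look like the sketch below. It is not the PR's code: the patterns, the function name `parse_image_data_url`, and the exact acceptance rules are all assumptions for illustration. Only the final two checks mirror the lines quoted in the diff above.

```python
import re
from typing import Optional

# Hypothetical patterns; the PR's actual regexes are not shown in this thread.
DATA_URL_RE = re.compile(
    r"data:(?P<media_type>[^;,]+);base64,(?P<data>.*)",
    re.DOTALL | re.IGNORECASE,
)
IMAGE_MEDIA_TYPE_RE = re.compile(r"image/[a-z0-9.+-]+", re.IGNORECASE)
BASE64_DATA_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def parse_image_data_url(url: str) -> Optional[tuple[str, str]]:
    """Return (media_type, base64_data) for a base64 image data URL, else None."""
    match = DATA_URL_RE.fullmatch(url.strip())
    if match is None:
        return None

    media_type = match.group("media_type").strip()
    if not IMAGE_MEDIA_TYPE_RE.fullmatch(media_type):
        return None

    # Remove all whitespace (line wraps, spaces) before validating the payload.
    compact_data = "".join(match.group("data").split())
    if not BASE64_DATA_RE.fullmatch(compact_data):
        return None

    return media_type.lower(), compact_data
```

If a helper like this already exists elsewhere in the codebase, as the review suggests for `parse_data_url`, reusing it would avoid maintaining two sets of patterns.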
hallerite left a comment:
left some comments, but lgtm! pre-approved.
    - await asyncio.to_thread(shutil.rmtree, session.local_rollout_dir, True)
    + await asyncio.to_thread(
    +     lambda: shutil.rmtree(session.local_rollout_dir, ignore_errors=True)
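A standalone sketch of the cleanup pattern in this hunk, assuming a throwaway temporary directory in place of `session.local_rollout_dir`. The helper names `remove_dir` and `main` are hypothetical. In the standard library signature, the second positional parameter of `shutil.rmtree` is `ignore_errors`, so the keyword form mainly makes the intent visible at the call site.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def remove_dir(path: Path) -> None:
    # Run the blocking delete in a worker thread so the event loop stays free.
    # ignore_errors=True means a missing or partially deleted tree is not fatal.
    await asyncio.to_thread(lambda: shutil.rmtree(path, ignore_errors=True))


async def main() -> bool:
    scratch = Path(tempfile.mkdtemp())
    (scratch / "rollout.txt").write_text("data")
    await remove_dir(scratch)
    await remove_dir(scratch)  # second call does nothing: the directory is gone
    return scratch.exists()
```

`asyncio.to_thread` also forwards keyword arguments, so `asyncio.to_thread(shutil.rmtree, path, ignore_errors=True)` would work without the lambda.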
    return media_type.lower(), compact_data


def _extract_image_part_for_output(part: Mapping[str, Any]) -> dict[str, Any] | None:
all this dict matching is annoying and it would be better to have an image type that we can use internally for better readability (which I have a PR for), but I think it's fine temporarily
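One way to picture the internal image type mentioned here is a small dataclass that absorbs the dict matching in a single place. This is only a sketch: the `ImagePart` name, its fields, and the OpenAI-style `image_url` part shape are assumptions, not the design from the referenced PR.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ImagePart:
    """Hypothetical typed view of an image content part."""

    url: str
    detail: Optional[str] = None

    @classmethod
    def from_mapping(cls, part: Mapping[str, Any]) -> Optional["ImagePart"]:
        # Accept {"type": "image_url", "image_url": "<url>"} or
        # {"type": "image_url", "image_url": {"url": "<url>", "detail": ...}}.
        if part.get("type") != "image_url":
            return None
        image_url = part.get("image_url")
        if isinstance(image_url, str):
            return cls(url=image_url)
        if isinstance(image_url, Mapping) and isinstance(image_url.get("url"), str):
            detail = image_url.get("detail")
            return cls(
                url=image_url["url"],
                detail=detail if isinstance(detail, str) else None,
            )
        return None

    def to_output(self) -> dict[str, Any]:
        # Emit the normalized dict form for saving.
        image_url: dict[str, Any] = {"url": self.url}
        if self.detail is not None:
            image_url["detail"] = self.detail
        return {"type": "image_url", "image_url": image_url}
```

With a type like this, callers would work with `ImagePart` attributes and leave the shape checks to `from_mapping`.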


Summary
Note
Medium Risk
Changes how prompts/completions are serialized when saving eval results, which can affect downstream consumers expecting prior placeholder text formats. New parsing/normalization logic for multimodal parts could drop or transform malformed media payloads differently than before.
Overview
Saved eval outputs now preserve structured multimodal content in prompt/completion (including image_url and input_audio) by switching states_to_outputs to use the new serialize_messages_for_output rather than messages_to_printable placeholders.
message_utils adds multimodal-safe serialization helpers (including audio alias support and whitespace-compacting normalization) plus _parse_data_url validation utilities, and tests are expanded/renamed to cover data-url parsing, typed-message serialization, audio fallback behavior, and save-path regression for multimodal prompts/completions. A small reliability tweak also updates RLM sandbox cleanup to use shutil.rmtree(..., ignore_errors=True).
Written by Cursor Bugbot for commit f4dfaf1.
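The difference between placeholder output and structured output can be illustrated with a toy example. The function names `to_placeholder` and `to_structured`, the `[image]`/`[audio]` markers, and the part shapes below are assumptions for illustration, not the behavior of messages_to_printable or serialize_messages_for_output.

```python
from typing import Any


def to_placeholder(content: Any) -> str:
    """Lossy form: media parts collapse into text markers."""
    if isinstance(content, str):
        return content
    chunks = []
    for part in content:
        kind = part.get("type")
        if kind == "text":
            chunks.append(part.get("text", ""))
        elif kind == "image_url":
            chunks.append("[image]")
        elif kind == "input_audio":
            chunks.append("[audio]")
    return "\n".join(chunks)


def to_structured(content: Any) -> Any:
    """Structure-preserving form: known part types are kept as dicts."""
    if isinstance(content, str):
        return content
    kept = []
    for part in content:
        if part.get("type") in {"text", "image_url", "input_audio"}:
            kept.append(dict(part))  # shallow copy; the input is not mutated
    return kept
```

Downstream consumers that previously parsed placeholder strings would need to handle a list of parts instead, which is the compatibility risk the note above describes.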