feat(eval): decouple evaluation execution with remote eval support #1317
Draft
…rter

Move the service prefix into _get_base_url() so that localhost URLs use /llmops_ while all other URLs use /llmopstenant_. This allows local development to route to the correct service endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…porter

When UIPATH_TRACE_BASE_URL is set, use it directly as the base URL instead of deriving it from UIPATH_URL. This allows full control over the trace endpoint without relying on the localhost/llmops_ heuristic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Simplify _get_base_url to only two paths: use UIPATH_TRACE_BASE_URL verbatim if set, otherwise derive from UIPATH_URL with llmopstenant_ appended. The localhost/llmops_ heuristic is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
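
The two-path lookup described in the last commit message can be sketched roughly as follows. This is an illustrative stand-in, not the code from the PR: the function name `get_base_url`, the `env` parameter, and the trailing-slash handling are assumptions made for the example. Only the two environment variable names and the `llmopstenant_` suffix come from the commit text.

```python
import os
from typing import Mapping, Optional


def get_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical sketch of the two-path base URL lookup."""
    env = os.environ if env is None else env

    # Path 1: an explicit override is used verbatim, with no suffix added.
    override = env.get("UIPATH_TRACE_BASE_URL")
    if override:
        return override

    # Path 2: derive from UIPATH_URL and append the service prefix.
    # Stripping a trailing slash is an assumption of this sketch.
    uipath_url = env.get("UIPATH_URL", "")
    return f"{uipath_url.rstrip('/')}/llmopstenant_"
```

Passing the environment as a mapping keeps the sketch easy to exercise without mutating `os.environ`.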
… eval support

Add strategy pattern to support running evaluators either locally (default) or on a remote C# Agents backend via --remote-eval flag / UIPATH_REMOTE_EVAL env var. When remote, the CLI serializes traces and agent output, POSTs to /api/evaluate, polls for results, and skips duplicate Studio Web reporting.

New files:
- SerializableSpan/ReconstructedSpan models for trace serialization
- RemoteEvaluationClient for backend communication
- EvaluationStrategy with Local and Remote implementations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
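
A minimal sketch of the strategy split described in this commit, for readers who want to see the shape of the pattern. The class names mirror those in the commit message, but every signature here is an assumption: the `evaluate` method, the payload keys, the `select_strategy` helper, and the set of env var values treated as "on" are hypothetical, and `submit_and_poll` is a plain callable standing in for RemoteEvaluationClient.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Mapping

# Hypothetical evaluator signature: (agent_output, spans) -> score.
Evaluator = Callable[[Any, List[dict]], float]


class EvaluationStrategy(ABC):
    """Common interface so the CLI does not care where evaluators run."""

    @abstractmethod
    def evaluate(self, agent_output: Any, spans: List[dict]) -> Dict[str, Any]:
        ...


class LocalEvaluationStrategy(EvaluationStrategy):
    """Runs each evaluator in the current process (the default)."""

    def __init__(self, evaluators: Mapping[str, Evaluator]) -> None:
        self._evaluators = evaluators

    def evaluate(self, agent_output: Any, spans: List[dict]) -> Dict[str, Any]:
        return {name: fn(agent_output, spans) for name, fn in self._evaluators.items()}


class RemoteEvaluationStrategy(EvaluationStrategy):
    """Delegates to a backend; submit_and_poll stands in for the real client."""

    def __init__(self, submit_and_poll: Callable[[dict], Dict[str, Any]]) -> None:
        self._submit_and_poll = submit_and_poll

    def evaluate(self, agent_output: Any, spans: List[dict]) -> Dict[str, Any]:
        # Payload keys are assumptions of this sketch.
        payload = {"agentOutput": agent_output, "spans": spans}
        return self._submit_and_poll(payload)


def select_strategy(
    remote_flag: bool,
    env: Mapping[str, str],
    evaluators: Mapping[str, Evaluator],
    submit_and_poll: Callable[[dict], Dict[str, Any]],
) -> EvaluationStrategy:
    """Pick remote when the flag or UIPATH_REMOTE_EVAL asks for it."""
    env_value = env.get("UIPATH_REMOTE_EVAL", "").strip().lower()
    # Which values count as "on" is an assumption of this sketch.
    if remote_flag or env_value in {"1", "true", "yes"}:
        return RemoteEvaluationStrategy(submit_and_poll)
    return LocalEvaluationStrategy(evaluators)
```

The caller holds only an `EvaluationStrategy`, so the choice between local and remote execution is made once, at startup, rather than being threaded through the evaluation code.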
Summary
- SerializableSpan and ReconstructedSpan models for serializing/deserializing OpenTelemetry trace spans to JSON, enabling trace data to be sent to the backend for remote evaluation
- LocalEvaluationStrategy and RemoteEvaluationStrategy via a strategy pattern, decoupling evaluator execution from the CLI process
- RemoteEvaluationClient that submits evaluation payloads to the C# Agents backend (POST /evaluate) and polls for results (GET /evaluate/status/{id}) with exponential backoff
- --remote-eval CLI flag and UIPATH_REMOTE_EVAL env var to opt into remote evaluation
- skip_studio_web_reporting flag to avoid duplicate Studio Web reporting when the backend handles it

Test plan
- uipath eval agent.json (without --remote-eval) works identically to current behavior
- uipath eval agent.json --remote-eval submits to backend, polls, and displays results
- SerializableSpan round-trip: ReadableSpan → serialize → deserialize → ReconstructedSpan
- skip_studio_web_reporting prevents duplicate API calls

🤖 Generated with Claude Code
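
The "polls for results with exponential backoff" behavior mentioned in the summary could look roughly like the sketch below. It is not taken from the PR: the function name `poll_until_done`, the `status` field and its values, and all delay parameters are assumptions. `fetch_status` stands in for whatever issues the GET /evaluate/status/{id} request.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    initial_delay: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until a terminal status is seen, backing off between attempts."""
    delay = initial_delay
    for _ in range(max_attempts):
        result = fetch_status()
        # Terminal status names are assumptions of this sketch.
        if result.get("status") in ("completed", "failed"):
            return result
        sleep(delay)
        # Grow the wait geometrically, capped so it never exceeds max_delay.
        delay = min(delay * factor, max_delay)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")
```

Injecting `sleep` lets a test record the delays instead of actually waiting, which is one way the round-trip and polling items in the test plan could be checked without a live backend.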