Hyperfleet-542: Add OpenTelemetry instrumentation to Sentinel #51

tirthct wants to merge 31 commits into openshift-hyperfleet:main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files.
Walkthrough

Adds OpenTelemetry tracing and observability across the service. Introduces pkg/telemetry with InitTraceProvider, Shutdown, StartSpan, and SetTraceContext; wires TracerProvider init/shutdown in cmd/sentinel driven by TRACING_ENABLED/OTEL_SERVICE_NAME; instruments the HTTP client with otelhttp; records sentinel.poll, sentinel.evaluate, and per-publish spans with attributes, errors, and CloudEvent traceparent attachment; extends LogConfig with OTel and SentinelConfig with MessagingSystem; and adds unit and integration tests validating tracer init, spans, propagation, and hierarchy.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as Application
    participant TP as TracerProvider
    participant Exporter as SpanExporter
    participant Sentinel as Sentinel
    participant Evaluator as Evaluator
    participant Publisher as Publisher
    participant Broker as Broker
    App->>TP: InitTraceProvider(serviceName)
    TP->>Exporter: configure exporter & sampler
    TP-->>App: return TracerProvider
    loop Poll cycle
        Sentinel->>Sentinel: Start "sentinel.poll" span
        Sentinel->>Evaluator: iterate resources
        Evaluator->>Evaluator: Start "sentinel.evaluate" span (resource attrs)
        Evaluator->>Publisher: create publish child span
        Publisher->>Broker: Publish(event, publishCtx)
        Publisher->>Exporter: Export spans
        Publisher-->>Evaluator: publish result
        Evaluator->>Evaluator: End "sentinel.evaluate" span
        Sentinel->>Sentinel: End "sentinel.poll" span
    end
    App->>TP: Shutdown(ctx)
    TP->>Exporter: Flush & shutdown
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/sentinel/sentinel.go (1)
113-187: ⚠️ Potential issue | 🔴 Critical

Span leak: `evalSpan.End()` and `publishSpan.End()` are skipped on `continue` paths.

When `event.SetData` fails (Line 141) or `Publish` fails (Line 164), the `continue` statements skip `evalSpan.End()` at Line 187 and `publishSpan.End()` at Line 167 respectively. Leaked spans will accumulate in memory and produce incomplete traces. The safest fix is to end spans via `defer` or explicit `End()` calls before each `continue`.

Proposed fix

```diff
 // span: sentinel.evaluate
 evalCtx, evalSpan := telemetry.StartSpan(ctx, "sentinel.evaluate",
 	attribute.String("hyperfleet.resource_type", resource.Kind),
 	attribute.String("hyperfleet.resource_id", resource.ID),
 )
 decision := s.decisionEngine.Evaluate(resource, now)
 if decision.ShouldPublish {
 	pending++
 	// Add decision reason to context for structured logging
 	eventCtx := logger.WithDecisionReason(evalCtx, decision.Reason)
 	// Create CloudEvent
 	event := cloudevents.NewEvent()
 	event.SetSpecVersion(cloudevents.VersionV1)
 	event.SetType(fmt.Sprintf("com.redhat.hyperfleet.%s.reconcile", resource.Kind))
 	event.SetSource("hyperfleet-sentinel")
 	event.SetID(uuid.New().String())
 	if err := event.SetData(cloudevents.ApplicationJSON, map[string]interface{}{
 		"kind":       resource.Kind,
 		"id":         resource.ID,
 		"generation": resource.Generation,
 		"href":       resource.Href,
 		"reason":     decision.Reason,
 	}); err != nil {
 		s.logger.Errorf(eventCtx, "Failed to set event data resource_id=%s error=%v", resource.ID, err)
+		evalSpan.End()
 		continue
 	}
 	// span: publish (child of sentinel.evaluate)
 	publishCtx, publishSpan := telemetry.StartSpan(eventCtx, fmt.Sprintf("%s publish", topic),
 		attribute.String("hyperfleet.resource_type", resource.Kind),
 		attribute.String("hyperfleet.resource_id", resource.ID),
 		attribute.String("hyperfleet.decision_reason", decision.Reason),
 		attribute.String("messaging.system", "gcp_pubsub"),
 		attribute.String("messaging.operation.type", "publish"),
 		attribute.String("messaging.destination.name", topic),
 		attribute.String("messaging.message.id", event.ID()),
 	)
 	if publishSpan.SpanContext().IsValid() {
 		telemetry.SetTraceContext(&event, publishSpan)
 	}
 	// Publish to broker using configured topic
 	if err := s.publisher.Publish(publishCtx, topic, &event); err != nil {
 		// Record broker error
 		metrics.UpdateBrokerErrorsMetric(resourceType, resourceSelector, "publish_error")
 		s.logger.Errorf(eventCtx, "Failed to publish event resource_id=%s error=%v", resource.ID, err)
+		publishSpan.End()
+		evalSpan.End()
 		continue
 	}
 	publishSpan.End()
```

Alternatively, a cleaner approach would be to use a closure or restructure to use defer-like patterns within the loop body.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel.go` around lines 113 - 187, The spans started by telemetry.StartSpan (evalSpan and publishSpan) are leaked on early continue paths (failures in event.SetData and s.publisher.Publish); fix by ensuring each span is ended on all control paths—either call defer evalSpan.End() immediately after creating evalSpan and defer publishSpan.End() right after creating publishSpan (so they run even on continue), or explicitly call publishSpan.End() and evalSpan.End() just before every continue in the error branches (references: telemetry.StartSpan, evalSpan, publishSpan, event.SetData, s.publisher.Publish).
🧹 Nitpick comments (4)
internal/sentinel/sentinel_test.go (1)
444-448: Global tracer provider is set but never restored after test.

`otel.SetTracerProvider(tp)` modifies the global state. While tests in the same package run sequentially by default, this could affect subsequent tests that rely on the global provider state (or lack thereof). Consider saving and restoring the previous provider.

Suggested pattern

```diff
 otel.SetTracerProvider(tp)
+// Optionally restore previous provider on cleanup
+// prevTP := otel.GetTracerProvider()
+// defer otel.SetTracerProvider(prevTP)
```

This is optional since tests currently run sequentially and the deferred `tp.Shutdown` will clean up the provider.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel_test.go` around lines 444 - 448, The test sets a global provider via trace.NewTracerProvider(...) and otel.SetTracerProvider(tp) but doesn't restore the previous global provider; capture the current provider with otel.GetTracerProvider() before calling otel.SetTracerProvider(tp), and defer otel.SetTracerProvider(previousProvider) so global state is restored; keep the existing deferred tp.Shutdown() to shut down the new provider.

internal/client/client_test.go (1)
796-797: Ignored `w.Write` error return value.

Line 797 ignores the error from `w.Write(...)`, unlike the rest of the test file which consistently handles write errors.

Suggested fix

```diff
-	w.Write([]byte(`{"error": "internal server error"}`))
+	if _, err := w.Write([]byte(`{"error": "internal server error"}`)); err != nil {
+		t.Logf("Error writing response: %v", err)
+	}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/client/client_test.go` around lines 796 - 797, The test HTTP handler ignores the error returned from w.Write(...) at the end of the handler; update the handler to check the write error and fail the test if it occurs. Specifically, replace the bare w.Write(...) call following w.WriteHeader(http.StatusInternalServerError) with a checked write (capture the return error from w.Write and call t.Fatalf or t.Errorf with the error), using the same testing.T variable used elsewhere in the file so failures are reported consistently.

test/integration/integration_test.go (1)
459-464: `InitTraceProvider` with stdout exporter will write span output to stdout during test runs.

Since no `OTEL_EXPORTER_OTLP_ENDPOINT` env var is set in the test, `InitTraceProvider` falls back to the stdout exporter, which will print formatted span data to stdout during test execution. This is noisy but functional. If this is intentional for debugging, it's fine; otherwise consider using an in-memory exporter as done in `TestIntegration_EndToEndSpanHierarchy`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/integration_test.go` around lines 459 - 464, The test currently calls telemetry.InitTraceProvider(ctx, "sentinel", "test", 1.0) which falls back to the stdout exporter and prints spans to stdout; change the test to use the in-memory exporter instead of stdout (or set OTEL_EXPORTER_OTLP_ENDPOINT) by invoking the InitTraceProvider variant or option the project uses for in-memory tracing (match the pattern used in TestIntegration_EndToEndSpanHierarchy), and keep the deferred telemetry.Shutdown(tp) call; update the call site in integration_test.go to use that in-memory configuration so spans are not emitted to stdout during test runs.

pkg/logger/logger.go (1)
46-50: Consider validating `SamplingRate` range.

`SamplingRate` is a `float64` with no validation at the config level. While `InitTraceProvider` in `pkg/telemetry/otel.go` handles edge cases (≥1.0 → AlwaysSample, ≤0.0 → NeverSample), a negative or >1.0 value could be confusing to operators. Consider documenting the valid range or adding a validation method on `OTelConfig`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/logger/logger.go` around lines 46 - 50, Add explicit validation for OTelConfig.SamplingRate by implementing a validation method (e.g., func (c *OTelConfig) Validate() error) that ensures SamplingRate is within the inclusive range 0.0–1.0 and returns a clear error if out of range; reference the OTelConfig struct and call this Validate method where configuration is loaded or before InitTraceProvider is invoked so invalid values error early (include a helpful message indicating the valid range).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sentinel/main.go`:
- Around line 257-262: The telemetry shutdown is using the same shutdownCtx as
metricsServer.Shutdown and the log call uses the already-cancelled ctx; create a
separate context with its own timeout for telemetry (e.g., teleCtx via
context.WithTimeout) and call telemetry.Shutdown(teleCtx, tp) so it has its own
budget independent of metricsServer.Shutdown, ensure you cancel the teleCtx
after use, and change the error log invocation (log.Extra(...).Error) to use the
telemetry context (teleCtx) or the appropriate non-cancelled shutdown context so
the log is not emitted with a canceled ctx; reference telemetry.Shutdown,
metricsServer.Shutdown, shutdownCtx, tp, and log.Extra to locate and update the
code.
- Around line 153-169: The OTel settings never get set because initLogging uses
DefaultConfig() which leaves LogConfig.OTel.Enabled false; update initLogging to
populate logCfg.OTel.Enabled and logCfg.OTel.SamplingRate from configuration
sources (either the loaded SentinelConfig/YAML or environment variables) so the
check in main.go that inspects logCfg.OTel.Enabled can actually enable tracing;
specifically, within initLogging (and where config.LoadConfig/SentinelConfig is
processed) read an OTEL_ENABLED (boolean) env var or map the YAML fields into
LogConfig.OTel (including SamplingRate) and assign them to logCfg before main.go
calls telemetry.InitTraceProvider (referencing symbols: initLogging, LogConfig,
DefaultConfig, logCfg.OTel.Enabled, logCfg.OTel.SamplingRate,
config.LoadConfig).
In `@internal/sentinel/sentinel.go`:
- Around line 149-150: The attribute.String("messaging.system", "gcp_pubsub") is
hardcoded but should come from configuration; change the code that builds the
span/attributes (the attribute.String calls around
messaging.operation.type/publish in sentinel.go) to read the messaging system
from a configuration value or environment variable (e.g., cfg.MessagingSystem or
os.Getenv("MESSAGING_SYSTEM")) and use that value when constructing
attribute.String("messaging.system", <configuredValue>), with a sensible default
(e.g., "rabbitmq") if unset.
In `@pkg/telemetry/otel_test.go`:
- Around line 308-326: TestSetTraceContext_InvalidSpan currently ends a real
span which doesn't make SpanContext invalid; replace creation of the span with a
no-op/non-recording span (e.g., obtain span via
trace.SpanFromContext(context.Background())) so span.SpanContext().IsValid() is
false and the !IsValid() branch in SetTraceContext is exercised, then call
SetTraceContext(&event, span) and assert that no "traceparent" extension was
added to event.Extensions(); update imports accordingly to reference
go.opentelemetry.io/otel/trace if needed.
- Around line 42-77: TestInitTraceProvider_OTLPExporter currently points
OTEL_EXPORTER_OTLP_ENDPOINT at a non-existent collector and then calls Shutdown
which blocks/fails; fix by making the test use a local httptest server that
responds 200 to POST /v1/traces and set OTEL_EXPORTER_OTLP_ENDPOINT to that
server's URL (server.URL + "/v1/traces"), so InitTraceProvider, tracer use, and
the deferred Shutdown call complete reliably; update the test to start
httptest.NewServer before setting the env var and close the server at the end,
referencing TestInitTraceProvider_OTLPExporter, InitTraceProvider, and Shutdown
in your changes.
- Around line 65-70: Deferred cleanup closures calling Shutdown currently use
t.Fatal which is unsafe inside defers; change those deferred anonymous functions
(the ones invoking Shutdown(ctx, tp) and similar cleanup calls) to call t.Errorf
with the error when Shutdown returns non-nil instead of t.Fatal/t.Fatalf, so the
test reports the cleanup failure without invoking Fatal after the test body has
returned. Locate the three other deferred closures in this file that call
Shutdown or similar cleanup (same pattern as the closure around Shutdown at
lines noted in the review) and replace their t.Fatal/t.Fatalf calls with
t.Errorf while keeping the rest of the defer logic unchanged.
In `@pkg/telemetry/otel.go`:
- Line 117: Fix the typo in the doc comment for StartSpan: change "conext" to
"context" in the comment that reads "// StartSpan starts a span and enriches
conext with trace/span IDs for logging" so it correctly reads "// StartSpan
starts a span and enriches context with trace/span IDs for logging"; locate the
comment above the StartSpan function in otel.go and update the word only.
- Around line 22-29: The listed package-level string literals
(SAMPLER_ALWAYS_ON, SAMPLER_ALWAYS_OFF, SAMPLER_TRACE_ID_RATIO,
ENV_OTEL_TRACES_SAMPLER, ENV_OTEL_TRACES_SAMPLER_ARG,
ENV_OTEL_EXPORTER_OTLP_ENDPOINT) should be declared as constants instead of
variables; replace the var (...) block in pkg/telemetry/otel.go with a const
(...) block so these identifiers are immutable and cannot be reassigned
elsewhere in the package.
- Around line 139-145: The traceparent being set in SetTraceContext incorrectly
hardcodes the trace-flags byte to "01"; update SetTraceContext to read the
actual flags from span.SpanContext().TraceFlags() and format that byte as two
hex characters when building traceParent (keep the existing TraceID and SpanID
usage and the IsValid() guard). Use the span context's TraceFlags value
(converted to a uint8 and formatted with "%02x") instead of the fixed "-01" so
the traceparent accurately reflects whether the span was sampled.
- Around line 41-46: The code incorrectly passes a full URL from
ENV_OTEL_EXPORTER_OTLP_ENDPOINT into otlptracehttp.WithEndpoint (which expects
host:port), causing mis-parsing; fix by either removing the manual env var
handling and simply calling otlptracehttp.New(ctx) so the SDK reads
OTEL_EXPORTER_OTLP_ENDPOINT itself, or if you must pass it explicitly, replace
otlptracehttp.WithEndpoint(otlpEndpoint) with
otlptracehttp.WithEndpointURL(parsedURL) (use the URL form) and ensure
otlpEndpoint is a properly parsed *url.URL; update error logging in the
otlptracehttp.New/creation branch (log.Errorf) accordingly and remove the
redundant env read of ENV_OTEL_EXPORTER_OTLP_ENDPOINT.
In `@test/integration/integration_test.go`:
- Around line 816-817: The loop using pollSpans[:2] can panic if fewer than 2
spans were captured; update the iteration to safely bound the slice by computing
n := len(pollSpans) and if n > 2 set n = 2, then iterate over pollSpans[:n] (or
otherwise iterate up to min(2, len(pollSpans))). Apply this change around the
existing pollSpans variable and the for _, pollSpan := range pollSpans[:2] loop
so the test never slices beyond the available elements.
---
Outside diff comments:
In `@internal/sentinel/sentinel.go`:
- Around line 113-187: The spans started by telemetry.StartSpan (evalSpan and
publishSpan) are leaked on early continue paths (failures in event.SetData and
s.publisher.Publish); fix by ensuring each span is ended on all control
paths—either call defer evalSpan.End() immediately after creating evalSpan and
defer publishSpan.End() right after creating publishSpan (so they run even on
continue), or explicitly call publishSpan.End() and evalSpan.End() just before
every continue in the error branches (references: telemetry.StartSpan, evalSpan,
publishSpan, event.SetData, s.publisher.Publish).
---
Nitpick comments:
In `@internal/client/client_test.go`:
- Around line 796-797: The test HTTP handler ignores the error returned from
w.Write(...) at the end of the handler; update the handler to check the write
error and fail the test if it occurs. Specifically, replace the bare
w.Write(...) call following w.WriteHeader(http.StatusInternalServerError) with a
checked write (capture the return error from w.Write and call t.Fatalf or
t.Errorf with the error), using the same testing.T variable used elsewhere in
the file so failures are reported consistently.
In `@internal/sentinel/sentinel_test.go`:
- Around line 444-448: The test sets a global provider via
trace.NewTracerProvider(...) and otel.SetTracerProvider(tp) but doesn't restore
the previous global provider; capture the current provider with
otel.GetTracerProvider() before calling otel.SetTracerProvider(tp), and defer
otel.SetTracerProvider(previousProvider) so global state is restored; keep the
existing deferred tp.Shutdown() to shut down the new provider.
In `@pkg/logger/logger.go`:
- Around line 46-50: Add explicit validation for OTelConfig.SamplingRate by
implementing a validation method (e.g., func (c *OTelConfig) Validate() error)
that ensures SamplingRate is within the inclusive range 0.0–1.0 and returns a
clear error if out of range; reference the OTelConfig struct and call this
Validate method where configuration is loaded or before InitTraceProvider is
invoked so invalid values error early (include a helpful message indicating the
valid range).
In `@test/integration/integration_test.go`:
- Around line 459-464: The test currently calls telemetry.InitTraceProvider(ctx,
"sentinel", "test", 1.0) which falls back to the stdout exporter and prints
spans to stdout; change the test to use the in-memory exporter instead of stdout
(or set OTEL_EXPORTER_OTLP_ENDPOINT) by invoking the InitTraceProvider variant
or option the project uses for in-memory tracing (match the pattern used in
TestIntegration_EndToEndSpanHierarchy), and keep the deferred
telemetry.Shutdown(tp) call; update the call site in integration_test.go to use
that in-memory configuration so spans are not emitted to stdout during test
runs.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
cmd/sentinel/main.go, go.mod, internal/client/client.go, internal/client/client_test.go, internal/sentinel/sentinel.go, internal/sentinel/sentinel_test.go, pkg/logger/logger.go, pkg/telemetry/otel.go, pkg/telemetry/otel_test.go, test/integration/integration_test.go
♻️ Duplicate comments (2)
cmd/sentinel/main.go (2)
160-176: ⚠️ Potential issue | 🔴 Critical

`logCfg.OTel.Enabled` is always `false`: OTel initialization is unreachable dead code.

`initLogging` constructs `LogConfig` via `logger.DefaultConfig()` (which sets `OTel.Enabled: false`) and only updates `Level`, `Format`, and `Output`. Neither `OTEL_ENABLED` nor `OTEL_SAMPLING_RATE` are ever read and applied to `logCfg.OTel`. As a result, the branch at Line 166 is never taken and tracing is never activated.

This was flagged in a previous review; the fix (reading `OTEL_ENABLED`/`OTEL_SAMPLING_RATE` from env vars in `initLogging`) has not been applied yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 160 - 176, The OTel branch is never reached because initLogging uses logger.DefaultConfig() which leaves logCfg.OTel.Enabled false and never reads OTEL_ENABLED/OTEL_SAMPLING_RATE; update initLogging to read the OTEL_ENABLED and OTEL_SAMPLING_RATE environment variables and apply them to the returned LogConfig (i.e., set logCfg.OTel.Enabled based on OTEL_ENABLED parsing and set logCfg.OTel.SamplingRate from OTEL_SAMPLING_RATE parsing), ensuring any parsing errors are handled sensibly so the downstream check of logCfg.OTel.Enabled in main.go will correctly enable telemetry; reference initLogging, logger.DefaultConfig(), LogConfig and logCfg.OTel.Enabled/SamplingRate in your changes.
290-295: ⚠️ Potential issue | 🟠 Major

OTel shutdown reuses the shared `shutdownCtx`, and the error log uses an already-cancelled `ctx`.

Two pre-existing issues remain unaddressed:
- `telemetry.Shutdown` shares the same `shutdownCtx` (20 s) already partially consumed by `healthServer.Shutdown` + `metricsServer.Shutdown`. If those are slow, the OTel flush budget is eroded.
- Line 293 passes the root `ctx` (cancelled at Line 279) to `log.Extra(...).Error(...)`, so the log call is issued with a cancelled context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 290 - 295, The OTel shutdown call reuses the shared shutdownCtx and logs using the already-cancelled root ctx; change telemetry.Shutdown(shutdownCtx, tp) to use its own fresh context and ensure the log call uses a live context: create a new context with timeout (e.g., otelCtx, otelCancel := context.WithTimeout(context.Background(), 20*time.Second)) and defer otelCancel(), call telemetry.Shutdown(otelCtx, tp), and pass a non-cancelled context (e.g., otelCtx or context.Background()) to log.Extra(...).Error(...) so the shutdown has its own budget and the error is logged with an active context.
🧹 Nitpick comments (1)
cmd/sentinel/main.go (1)
30-33: `serviceName` is a runtime-configurable value mixed into the build-info `var` block.

`version`, `commit`, and `date` are set by `ldflags` at build time and are effectively constants, while `serviceName` is mutated at runtime inside `runServe` (Line 162). Keeping them in the same block is misleading. Consider making `serviceName` a local variable in `runServe` or, if it needs to be the default value, a separate `const`.

♻️ Suggested refactor

```diff
 var (
 	version = "dev"
 	commit  = "none"
 	date    = "unknown"
-	serviceName = "sentinel"
 )
```

Then in `runServe`:

```diff
+	serviceName := "sentinel"
 	if envServiceName := os.Getenv("OTEL_SERVICE_NAME"); envServiceName != "" {
 		serviceName = envServiceName
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 30 - 33, The serviceName variable should not live in the same build-info var block as the ldflags-populated constants (version, commit, date); move serviceName out of that var block and make it either a local variable inside runServe (referencing runServe) if it is mutated at runtime, or make it a separate const/var with a default value distinct from version/commit/date if you need a package-level default. Update any references to serviceName accordingly (e.g., where runServe mutates it) so you no longer mix runtime-mutable state with the ldflags-injected build constants.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@cmd/sentinel/main.go`:
- Around line 160-176: The OTel branch is never reached because initLogging uses
logger.DefaultConfig() which leaves logCfg.OTel.Enabled false and never reads
OTEL_ENABLED/OTEL_SAMPLING_RATE; update initLogging to read the OTEL_ENABLED and
OTEL_SAMPLING_RATE environment variables and apply them to the returned
LogConfig (i.e., set logCfg.OTel.Enabled based on OTEL_ENABLED parsing and set
logCfg.OTel.SamplingRate from OTEL_SAMPLING_RATE parsing), ensuring any parsing
errors are handled sensibly so the downstream check of logCfg.OTel.Enabled in
main.go will correctly enable telemetry; reference initLogging,
logger.DefaultConfig(), LogConfig and logCfg.OTel.Enabled/SamplingRate in your
changes.
- Around line 290-295: The OTel shutdown call reuses the shared shutdownCtx and
logs using the already-cancelled root ctx; change
telemetry.Shutdown(shutdownCtx, tp) to use its own fresh context and ensure the
log call uses a live context: create a new context with timeout (e.g., otelCtx,
otelCancel := context.WithTimeout(context.Background(), 20*time.Second)) and
defer otelCancel(), call telemetry.Shutdown(otelCtx, tp), and pass a
non-cancelled context (e.g., otelCtx or context.Background()) to
log.Extra(...).Error(...) so the shutdown has its own budget and the error is
logged with an active context.
---
Nitpick comments:
In `@cmd/sentinel/main.go`:
- Around line 30-33: The serviceName variable should not live in the same
build-info var block as the ldflags-populated constants (version, commit, date);
move serviceName out of that var block and make it either a local variable
inside runServe (referencing runServe) if it is mutated at runtime, or make it a
separate const/var with a default value distinct from version/commit/date if you
need a package-level default. Update any references to serviceName accordingly
(e.g., where runServe mutates it) so you no longer mix runtime-mutable state
with the ldflags-injected build constants.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
cmd/sentinel/main.go, go.mod
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/client/client_test.go`:
- Line 804: The test currently ignores the error returned by
NewHyperFleetClient; update the test to capture both return values (client, err
:= NewHyperFleetClient(...)) and assert/fail if err != nil (e.g., t.Fatalf or
require.NoError) before using client so any constructor failures surface
immediately; ensure you also handle the client variable in subsequent
assertions.
- Around line 795-800: In the httptest server handler (the anonymous
http.HandlerFunc passed to httptest.NewServer) replace the unsafe t.Fatalf call
with t.Errorf and then return to avoid invoking FailNow from a handler
goroutine; specifically change the error handling after w.Write to call
t.Errorf("Failed to write response: %v", err) followed by return instead of
t.Fatalf.
- Around line 670-676: Capture the previous global tracer provider before
calling otel.SetTracerProvider(tp) in both
TestNewHyperFleetClient_HTTPInstrumentation and
TestNewHyperFleetClient_HTTPInstrumentation_ErrorCase (e.g., prevTp :=
otel.GetTracerProvider()), then in the defer restore it (e.g., defer
otel.SetTracerProvider(prevTp)) and still shutdown tp as before; ensure you
reference the local tp shutdown logic (tp.Shutdown(ctx)) but add restoring the
previous provider to avoid test-state leakage.
…OTel traceparent function and tests
Actionable comments posted: 5
♻️ Duplicate comments (3)
pkg/telemetry/otel.go (1)
93: ⚠️ Potential issue | 🟠 Major

Prefer parent-based ratio sampling to preserve incoming trace decisions.

Using `trace.TraceIDRatioBased` directly can resample child spans independently. Wrap it with `trace.ParentBased(...)` for distributed trace continuity.

💡 Proposed fix

```diff
-	sampler = trace.TraceIDRatioBased(samplingRate)
+	sampler = trace.ParentBased(trace.TraceIDRatioBased(samplingRate))
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel.go` at line 93, Replace the direct TraceIDRatioBased sampler assignment so child spans respect remote parent sampling: instead of assigning sampler = trace.TraceIDRatioBased(samplingRate) wrap it with trace.ParentBased(trace.TraceIDRatioBased(samplingRate)) (use the same samplingRate variable) so parent-based sampling is preserved for incoming trace contexts.

cmd/sentinel/main.go (2)
34: ⚠️ Potential issue | 🟡 Minor

Set the default OTel service name to `hyperfleet-sentinel`.

Keeping the default as `sentinel` can fragment service identity in tracing backends unless explicitly overridden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` at line 34, The default OTel serviceName value currently set as serviceName = "sentinel" should be changed to "hyperfleet-sentinel" to standardize tracing identity; update the assignment of the serviceName variable in cmd/sentinel/main.go (the serviceName symbol) to use "hyperfleet-sentinel" as the default so traces are attributed consistently unless explicitly overridden.
301-305: ⚠️ Potential issue | 🟡 Minor

Use the telemetry shutdown context when logging shutdown errors.

The error log still uses `ctx` instead of `otelShutdownCtx`, which may already be canceled during shutdown.

💡 Proposed fix

```diff
 if err := telemetry.Shutdown(otelShutdownCtx, tp); err != nil {
-	log.Extra("error", err).Error(ctx, "Failed to shutdown OpenTelemetry")
+	log.Extra("error", err).Error(otelShutdownCtx, "Failed to shutdown OpenTelemetry")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 301 - 305, The shutdown logging currently uses the outer request context variable ctx which may be canceled; change the log call to use the telemetry shutdown context instead: when calling telemetry.Shutdown(otelShutdownCtx, tp) and it returns an error, pass otelShutdownCtx into log.Extra(...).Error(...) (keeping the existing log.Extra("error", err) and message) so the failure is logged with the shutdown-specific context created by context.WithTimeout (otelShutdownCtx/otelShutdownCancel).
🧹 Nitpick comments (1)
test/integration/integration_test.go (1)
902-909: Trace continuity assertion is too broad and can become flaky.

This currently counts trace IDs from all captured spans. If unrelated root spans are emitted, the check can fail even when poll/evaluate/publish hierarchy is correct. Prefer deriving continuity from poll traces (or poll descendants) only.

Suggested tightening

```diff
-	// Verify trace continuity - spans should form coherent traces
-	traceIDs := make(map[string]bool)
-	for _, span := range spans {
-		traceIDs[span.SpanContext.TraceID().String()] = true
-	}
-
-	if len(traceIDs) > len(pollSpans) {
-		t.Errorf("Expected spans to belong to %d traces (one per poll), but found %d trace IDs", len(pollSpans), len(traceIDs))
-	}
+	// Verify trace continuity for poll roots specifically (one trace per poll span)
+	pollTraceIDs := make(map[string]struct{})
+	for _, pollSpan := range pollSpans {
+		pollTraceIDs[pollSpan.SpanContext.TraceID().String()] = struct{}{}
+	}
+	if len(pollTraceIDs) != len(pollSpans) {
+		t.Errorf("Expected one unique trace per poll span, got %d unique traces for %d poll spans", len(pollTraceIDs), len(pollSpans))
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/integration/integration_test.go` around lines 902 - 909, The current check counts trace IDs from all captured spans which can include unrelated roots and become flaky; change it to derive continuity only from the poll traces by first building a set of expectedTraceIDs from pollSpans (use each pollSpan.SpanContext.TraceID().String()) and then only collect/count trace IDs from spans whose TraceID is in that expectedTraceIDs set (i.e., filter spans by membership in the poll trace set) before asserting the number of distinct poll-related traces matches len(pollSpans).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/config/config.go`:
- Around line 110-112: The code currently assigns MESSAGING_SYSTEM even when the
environment variable is present but empty; update the block that reads
os.LookupEnv("MESSAGING_SYSTEM") so it trims and checks the value is non-empty
before assigning to cfg.MessagingSystem (e.g., use strings.TrimSpace on the
looked-up value and only set cfg.MessagingSystem when the trimmed value != "");
this avoids clearing defaults or emitting empty messaging.system attributes.
In `@internal/sentinel/sentinel.go`:
- Around line 114-118: The telemetry spans created via telemetry.StartSpan
(e.g., evalSpan and publishSpan in the sentinel.evaluate flow) are not ended on
error paths that call continue; update each error branch that returns/continues
after calling resource.SetData or publisher.Publish to call the corresponding
span.End() (evalSpan.End() and, if publishSpan was created, publishSpan.End())
before continuing so all spans are closed; search for usages of SetData and
Publish within the sentinel evaluation loop (the branches around evalSpan and
publishSpan creation) and add the missing End() calls on every error path.
In `@pkg/telemetry/otel_test.go`:
- Around line 84-104: The test calls Shutdown twice (once in the deferred
anonymous func and once explicitly), causing flakiness; update
TestInitTraceProvider_OTLPExporter to call Shutdown only once by removing the
redundant call: either eliminate the explicit err = Shutdown(shutdownCtx, tp)
block (keeping the defer) or remove the defer and keep the explicit shutdown,
ensuring only a single call to Shutdown(ctx, tp) on the trace provider
(reference Shutdown and the deferred anonymous func that wraps Shutdown).
In `@test/integration/integration_test.go`:
- Around line 717-718: The test currently ignores the error returned by
client.NewHyperFleetClient; modify the setup to capture the error (e.g.,
hyperfleetClient, err := client.NewHyperFleetClient(...)) and fail fast if it is
non-nil by calling t.Fatalf or using the test assertion helper (e.g.,
require.NoError) so setup failures are explicit; update the code around
NewHyperFleetClient and any test setup that depends on hyperfleetClient to use
the checked variable.
- Around line 675-682: The test sets a new global OTEL tracer provider with
otel.SetTracerProvider(tp) but never restores the previous global provider,
leaking state to other tests; capture the existing provider via
otel.GetTracerProvider() before calling otel.SetTracerProvider(tp) and in the
deferred cleanup restore it by calling otel.SetTracerProvider(prevProvider) (and
then perform tp.Shutdown(cleanupCtx) as currently implemented), referencing
otel.SetTracerProvider, otel.GetTracerProvider, tp, and tp.Shutdown to locate
where to add the save/restore logic.
---
Duplicate comments:
In `@cmd/sentinel/main.go`:
- Line 34: The default OTel serviceName value currently set as serviceName =
"sentinel" should be changed to "hyperfleet-sentinel" to standardize tracing
identity; update the assignment of the serviceName variable in
cmd/sentinel/main.go (the serviceName symbol) to use "hyperfleet-sentinel" as
the default so traces are attributed consistently unless explicitly overridden.
- Around line 301-305: The shutdown logging currently uses the outer request
context variable ctx which may be canceled; change the log call to use the
telemetry shutdown context instead: when calling
telemetry.Shutdown(otelShutdownCtx, tp) and it returns an error, pass
otelShutdownCtx into log.Extra(...).Error(...) (keeping the existing
log.Extra("error", err) and message) so the failure is logged with the
shutdown-specific context created by context.WithTimeout
(otelShutdownCtx/otelShutdownCancel).
In `@pkg/telemetry/otel.go`:
- Line 93: Replace the direct TraceIDRatioBased sampler assignment so child
spans respect remote parent sampling: instead of assigning sampler =
trace.TraceIDRatioBased(samplingRate) wrap it with
trace.ParentBased(trace.TraceIDRatioBased(samplingRate)) (use the same
samplingRate variable) so parent-based sampling is preserved for incoming trace
contexts.
---
Nitpick comments:
In `@test/integration/integration_test.go`:
- Around line 902-909: The current check counts trace IDs from all captured
spans which can include unrelated roots and become flaky; change it to derive
continuity only from the poll traces by first building a set of expectedTraceIDs
from pollSpans (use each pollSpan.SpanContext.TraceID().String()) and then only
collect/count trace IDs from spans whose TraceID is in that expectedTraceIDs set
(i.e., filter spans by membership in the poll trace set) before asserting the
number of distinct poll-related traces matches len(pollSpans).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cmd/sentinel/main.go
- configs/dev-example.yaml
- configs/gcp-pubsub-example.yaml
- configs/rabbitmq-example.yaml
- internal/client/client_test.go
- internal/config/config.go
- internal/sentinel/sentinel.go
- pkg/telemetry/otel.go
- pkg/telemetry/otel_test.go
- test/integration/integration_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/client/client_test.go
… tracing standardization
Actionable comments posted: 2
♻️ Duplicate comments (2)
cmd/sentinel/main.go (1)
301-305: ⚠️ Potential issue | 🟡 Minor

Use `otelShutdownCtx` for the shutdown error log context. Line 304 logs with `ctx` even though telemetry shutdown is executed under `otelShutdownCtx`.

💡 Proposed fix
```diff
 if tp != nil {
     otelShutdownCtx, otelShutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
     defer otelShutdownCancel()
     if err := telemetry.Shutdown(otelShutdownCtx, tp); err != nil {
-        log.Extra("error", err).Error(ctx, "Failed to shutdown OpenTelemetry")
+        log.Extra("error", err).Error(otelShutdownCtx, "Failed to shutdown OpenTelemetry")
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/sentinel/main.go` around lines 301 - 305, The shutdown error log is using the wrong context; change the log call to use otelShutdownCtx (created by context.WithTimeout) instead of ctx so the telemetry.Shutdown error is logged with the same context it runs under: locate otelShutdownCtx and otelShutdownCancel and update the log.Extra(...).Error(...) invocation that follows telemetry.Shutdown(otelShutdownCtx, tp) to pass otelShutdownCtx as the context argument.

internal/sentinel/sentinel.go (1)
134-143: ⚠️ Potential issue | 🔴 Critical

End and mark `evalSpan` on `SetData` failure before `continue`. On Line 142, `continue` skips Line 192, so `evalSpan` leaks on this error path and the span is not marked failed.

🐛 Proposed fix
```diff
 if err := event.SetData(cloudevents.ApplicationJSON, map[string]interface{}{
     "kind":       resource.Kind,
     "id":         resource.ID,
     "generation": resource.Generation,
     "href":       resource.Href,
     "reason":     decision.Reason,
 }); err != nil {
+    evalSpan.RecordError(err)
+    evalSpan.SetStatus(codes.Error, "set event data failed")
     s.logger.Errorf(eventCtx, "Failed to set event data resource_id=%s error=%v", resource.ID, err)
+    evalSpan.End()
     continue
 }
```

Also applies to: 192-192
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel.go` around lines 134 - 143, When event.SetData(...) fails in the block around event.SetData, ensure you mark and end the `evalSpan` before returning/continuing: after logging the error in the SetData error branch, record the error on `evalSpan` (e.g., RecordError / SetStatus to error with the err message) and call `evalSpan.End()` prior to the `continue` so the span is not leaked; apply the same fix to the other identical error path near the later SetData call (the occurrence referenced at line ~192).
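The span-leak pattern flagged above can be reproduced with a toy span type. `fakeSpan` and `processAll` below are illustrative stand-ins, not the Sentinel code; the point is that every `continue` branch must end the span it opened:

```go
package main

import "fmt"

// fakeSpan is a hypothetical stand-in for an OTel span; it only tracks
// whether End was called, so a leak is observable.
type fakeSpan struct{ ended bool }

func (s *fakeSpan) End() { s.ended = true }

// processAll mirrors the loop shape from the review: error branches that
// continue must end the span first, or the span leaks.
func processAll(items []string) []*fakeSpan {
	var spans []*fakeSpan
	for _, item := range items {
		span := &fakeSpan{}
		spans = append(spans, span)
		if item == "bad" {
			// Error path: end the span *before* continue, or it leaks.
			span.End()
			continue
		}
		span.End() // happy path
	}
	return spans
}

func main() {
	for _, s := range processAll([]string{"ok", "bad", "ok"}) {
		fmt.Println(s.ended)
	}
}
```

In the real code a `defer span.End()` placed right after span creation would make all paths (including `continue`) safe automatically, at the cost of the span covering the whole iteration.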
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/sentinel/main.go`:
- Around line 151-155: The code currently ignores parse errors from
strconv.ParseBool for TRACING_ENABLED; modify the block around
otelEnabled/strconv.ParseBool so that when err != nil you either log a clear
error and exit (fail fast) or at minimum emit a warning to stderr/logger
indicating the invalid value and that tracing will remain at its default, e.g.
using log.Printf or fmt.Fprintf(os.Stderr, ...), referencing the TRACING_ENABLED
env var, the strconv.ParseBool call, and cfg.OTel.Enabled to show
current/default state; ensure successful ParseBool still sets cfg.OTel.Enabled
as before.
In `@pkg/telemetry/otel.go`:
- Around line 77-80: The code silently ignores strconv.ParseFloat errors when
reading ENV_OTEL_TRACES_SAMPLER_ARG; update the parsing branch (the block
referencing ENV_OTEL_TRACES_SAMPLER_ARG and samplingRate) to detect a parse
error and handle it explicitly by logging a clear warning or returning an error
that includes the environment variable name and the invalid value, and only
assign samplingRate on successful parse; ensure the log/returned message
references ENV_OTEL_TRACES_SAMPLER_ARG and the raw arg so misconfiguration is
visible.
---
Duplicate comments:
In `@cmd/sentinel/main.go`:
- Around line 301-305: The shutdown error log is using the wrong context; change
the log call to use otelShutdownCtx (created by context.WithTimeout) instead of
ctx so the telemetry.Shutdown error is logged with the same context it runs
under: locate otelShutdownCtx and otelShutdownCancel and update the
log.Extra(...).Error(...) invocation that follows
telemetry.Shutdown(otelShutdownCtx, tp) to pass otelShutdownCtx as the context
argument.
In `@internal/sentinel/sentinel.go`:
- Around line 134-143: When event.SetData(...) fails in the block around
event.SetData, ensure you mark and end the `evalSpan` before
returning/continuing: after logging the error in the SetData error branch,
record the error on `evalSpan` (e.g., RecordError / SetStatus to error with the
err message) and call `evalSpan.End()` prior to the `continue` so the span is
not leaked; apply the same fix to the other identical error path near the later
SetData call (the occurrence referenced at line ~192).
/retest
Actionable comments posted: 3
🧹 Nitpick comments (2)
pkg/telemetry/otel.go (1)
87-96: `ParentBased` not applied for `AlwaysSample` and `NeverSample` cases.

When `samplingRate >= 1.0` or `<= 0.0`, the sampler is set directly to `AlwaysSample()` or `NeverSample()` without wrapping in `ParentBased()`. This means parent sampling decisions are ignored in these cases, potentially breaking trace continuity across service boundaries.

For consistency with the architecture standard that specifies `parentbased_traceidratio` as the default strategy, consider wrapping all samplers:

♻️ Proposed fix
```diff
 var sampler trace.Sampler
 switch {
 case samplingRate >= 1.0:
-    sampler = trace.AlwaysSample() // Sample all
+    sampler = trace.ParentBased(trace.AlwaysSample())
 case samplingRate <= 0.0:
-    sampler = trace.NeverSample() // Sample none
+    sampler = trace.ParentBased(trace.NeverSample())
 default:
     sampler = trace.ParentBased(trace.TraceIDRatioBased(samplingRate))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel.go` around lines 87 - 96, The sampler logic sets sampler directly to trace.AlwaysSample() or trace.NeverSample() for samplingRate >= 1.0 or <= 0.0, which bypasses parent-based behavior; change those branches to wrap the base samplers with trace.ParentBased(...) so all cases—including trace.AlwaysSample(), trace.NeverSample() and trace.TraceIDRatioBased(samplingRate)—are applied via trace.ParentBased, keeping the existing switch and the sampler variable but replacing direct calls with trace.ParentBased(trace.AlwaysSample()), trace.ParentBased(trace.NeverSample()), and trace.ParentBased(trace.TraceIDRatioBased(samplingRate)).

pkg/telemetry/otel_test.go (1)
181-185: Test assertion for `expectedSample=false` cases may be misleading.

For `always_off` (line 117-121), the test expects `expectedSample: false` but only checks `IsValid()` when `expectedSample` is true. The span context remains valid even when not sampled: `NeverSample` still produces a valid `SpanContext`, it just won't be exported.

To properly verify sampling behavior, check `span.SpanContext().IsSampled()` instead:

💡 Suggested enhancement
```diff
-    if tt.expectedSample {
-        if !span.SpanContext().IsValid() {
-            t.Error("Expected valid span context for sampling=true")
-        }
+    if span.SpanContext().IsSampled() != tt.expectedSample {
+        t.Errorf("Expected IsSampled()=%v, got %v", tt.expectedSample, span.SpanContext().IsSampled())
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 181 - 185, The test currently only checks span.SpanContext().IsValid() when tt.expectedSample is true, which doesn't verify sampling; update the assertion to use span.SpanContext().IsSampled() (and assert true/false according to tt.expectedSample) so cases like the "always_off" sampler correctly validate sampling behavior; locate the table-driven test and replace the IsValid-based conditional with an assertion on SpanContext().IsSampled() for the test case keyed by tt.expectedSample in the test function in pkg/telemetry/otel_test.go.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 449-454: The deferred shutdown of the tracer provider uses
t.Fatalf which runs after the test body and can mislead test results; change the
deferred call in the shutdown closure (the anonymous func that calls
tp.Shutdown(ctx) on the trace.TracerProvider instance `tp` using `ctx`) to use
t.Errorf to report errors instead of t.Fatalf, so cleanup failures are reported
without aborting after test completion.
- Line 467: The test currently ignores the error returned by
NewHyperFleetClient; change the setup to capture the error (e.g.,
hyperfleetClient, err := client.NewHyperFleetClient(server.URL, 10*time.Second))
and fail fast on error (e.g., t.Fatalf or require.NoError(t, err)) so setup
failures are reported immediately instead of surfacing as secondary errors later
when hyperfleetClient is used.
In `@pkg/telemetry/otel.go`:
- Around line 77-83: The current logic checks ENV_OTEL_TRACES_SAMPLER_ARG and
logs "Invalid" in the else branch when the env var is empty, but it fails to
warn on strconv.ParseFloat errors; update the logic in the block that reads
ENV_OTEL_TRACES_SAMPLER_ARG so that: if arg == "" do nothing (keep default
samplingRate), if arg != "" attempt strconv.ParseFloat and on err != nil call
log.Warnf with a clear message including ENV_OTEL_TRACES_SAMPLER_ARG and the bad
arg value and that the default samplingRate will be used, and only set
samplingRate when ParseFloat succeeds; adjust references to
ENV_OTEL_TRACES_SAMPLER_ARG, arg, strconv.ParseFloat, samplingRate, and
log.Warnf accordingly.
---
Nitpick comments:
In `@pkg/telemetry/otel_test.go`:
- Around line 181-185: The test currently only checks
span.SpanContext().IsValid() when tt.expectedSample is true, which doesn't
verify sampling; update the assertion to use span.SpanContext().IsSampled() (and
assert true/false according to tt.expectedSample) so cases like the "always_off"
sampler correctly validate sampling behavior; locate the table-driven test and
replace the IsValid-based conditional with an assertion on
SpanContext().IsSampled() for the test case keyed by tt.expectedSample in the
test function in pkg/telemetry/otel_test.go.
In `@pkg/telemetry/otel.go`:
- Around line 87-96: The sampler logic sets sampler directly to
trace.AlwaysSample() or trace.NeverSample() for samplingRate >= 1.0 or <= 0.0,
which bypasses parent-based behavior; change those branches to wrap the base
samplers with trace.ParentBased(...) so all cases—including
trace.AlwaysSample(), trace.NeverSample() and
trace.TraceIDRatioBased(samplingRate)—are applied via trace.ParentBased, keeping
the existing switch and the sampler variable but replacing direct calls with
trace.ParentBased(trace.AlwaysSample()), trace.ParentBased(trace.NeverSample()),
and trace.ParentBased(trace.TraceIDRatioBased(samplingRate)).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- cmd/sentinel/main.go
- internal/client/client_test.go
- internal/config/config.go
- internal/sentinel/sentinel.go
- internal/sentinel/sentinel_test.go
- pkg/telemetry/otel.go
- pkg/telemetry/otel_test.go
- test/integration/integration_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/client/client_test.go
- internal/config/config.go
…r check for new hyperfleet client
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/telemetry/otel.go`:
- Around line 142-149: Add nil/zero guards in SetTraceContext: return early if
event is nil or span is nil/invalid before calling span.SpanContext(); check
span != nil (or equivalent) and use span.SpanContext().IsValid() only after
those checks to avoid panics from dereferencing nil spans or nil events, and
keep the existing traceparent formatting logic (traceParent := fmt.Sprintf(...)
and event.SetExtension("traceparent", traceParent)) inside the guarded block so
they only run when both inputs are valid.
- Around line 41-43: The current code calls
otlptracehttp.WithEndpointURL(otlpEndpoint) (inside otlptracehttp.New), which
treats the value as a full URL and prevents the SDK from appending the
/v1/traces path; remove that call and either let the SDK read
OTEL_EXPORTER_OTLP_ENDPOINT itself (omit the WithEndpointURL option) or, if
specifying programmatically, replace WithEndpointURL with
otlptracehttp.WithEndpoint(...) using a host:port value (and keep the call site
in otlptracehttp.New).
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- internal/sentinel/sentinel.go
- internal/sentinel/sentinel_test.go
- pkg/telemetry/otel.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/sentinel/sentinel_test.go
- internal/sentinel/sentinel.go
… check in SetTraceContext()
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/telemetry/otel.go`:
- Around line 41-42: The exporter selection currently only checks
ENV_OTEL_EXPORTER_OTLP_ENDPOINT; update the gating logic to also consider
ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and prefer the traces-specific variable
when present so otlptracehttp.New is attempted for traces; modify the check
around otlptracehttp.New (references: ENV_OTEL_EXPORTER_OTLP_ENDPOINT,
ENV_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, otlptracehttp.New, exporter, err) to
look for the traces endpoint first and fall back to the generic OTLP endpoint
before defaulting to stdout.
test/integration/integration_test.go
Outdated
```go
// Mock server that returns clusters requiring reconciliation
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	callCount++
```
Priority: Bug
callCount at line 726 is a plain int incremented from the HTTP handler
goroutine (line 731) without synchronization — httptest.NewServer runs each
handler in its own goroutine, so this is a data race. Running with -race would
flag it.
You already use atomic.Int32 for the same pattern in otel_test.go:72, so just
apply it here too:
```diff
-    callCount++
+    var callCount atomic.Int32
```
Ack, added similar implementation as the otel_test
test/integration/integration_test.go
Outdated
```go
// Mock server that returns clusters requiring reconciliation
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	callCount++
	if callCount > 2 {
```
```diff
-    if callCount > 2 {
+    if callCount.Add(1) > 2 {
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/sentinel/sentinel.go (1)
205-207: ⚠️ Potential issue | 🟡 Minor

Use `evalCtx` for skipped-resource logs too. Line 206 rebuilds `skipCtx` from `ctx`, so skipped-resource logs lose the `span_id` that `telemetry.StartSpan` attached to `evalCtx`. Build `skipCtx` from `evalCtx` here, the same way the publish path uses `eventCtx`, so the `sentinel.evaluate` span stays correlated with its logs.

♻️ Proposed fix
```diff
-    skipCtx := logger.WithDecisionReason(ctx, decision.Reason)
+    skipCtx := logger.WithDecisionReason(evalCtx, decision.Reason)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel.go` around lines 205 - 207, The skipped-resource log is built from ctx which drops the telemetry span; change the context used when creating skipCtx to evalCtx so the span started by telemetry.StartSpan remains attached—i.e., where you call logger.WithDecisionReason(ctx, decision.Reason) create skipCtx from evalCtx instead (same pattern used for eventCtx in the publish path) so sentinel.evaluate span stays correlated with skipped-resource logs.
♻️ Duplicate comments (3)
pkg/telemetry/otel_test.go (2)
325-334: ⚠️ Potential issue | 🟡 Minor

Guard the `traceparent` assertions before slicing. If the length check fails, `traceParentStr[:3]` and `traceParentStr[len(traceParentStr)-3:]` can still panic and hide the real assertion failure. Use `strings.HasPrefix`/`HasSuffix` or return immediately after the length mismatch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 325 - 334, The test currently slices traceParentStr after an unchecked length assertion which can panic; change the checks so the length guard prevents further slicing—either call t.Fatalf or return immediately if len(traceParentStr) != 55, or replace the prefix/suffix slices with strings.HasPrefix(traceParentStr, "00-") and strings.HasSuffix(traceParentStr, "-01") to avoid slicing altogether; update the assertions that reference traceParentStr[:3] and traceParentStr[len(traceParentStr)-3:] accordingly.
21-38: ⚠️ Potential issue | 🟠 Major

Restore the previous global OTel state in cleanup. `InitTraceProvider` sets the process-wide tracer provider and propagator, but this cleanup only shuts `tp` down. That leaves later tests pointing at mutated global state, which makes failures order-dependent. Capture the previous provider/propagator before initialization and restore both in cleanup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 21 - 38, TestInitTraceProvider_StdoutExporter leaves global OTel state mutated; before calling InitTraceProvider, capture the existing global tracer provider and text map propagator (e.g. via otel.GetTracerProvider() and otel.GetTextMapPropagator()), then after initialization ensure the deferred cleanup not only calls Shutdown(ctx, tp) but also restores the captured global provider and propagator (set them back with otel.SetTracerProvider(...) and otel.SetTextMapPropagator(...)) so later tests aren't affected.

internal/sentinel/sentinel_test.go (1)
646-656: ⚠️ Potential issue | 🟠 Major

Restore the previous global tracer provider after the test. `otel.SetTracerProvider(tp)` mutates process-wide state, but the defer only shuts `tp` down. Later tests can inherit a closed provider and become order-dependent. Capture `otel.GetTracerProvider()` before the set and restore it in the cleanup closure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel_test.go` around lines 646 - 656, Capture the existing global tracer provider before calling otel.SetTracerProvider(tp) (e.g. oldTP := otel.GetTracerProvider()), then update the deferred cleanup to first restore the global provider with otel.SetTracerProvider(oldTP) and then shut down tp using the existing shutdown logic (use the same ctx), so the test does not leave a closed provider as the global one; reference otel.SetTracerProvider, otel.GetTracerProvider, tp and the deferred shutdown closure when making the change.
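The save/restore pattern these prompts describe reduces to a small sketch. Here `globalProvider` is a hypothetical stand-in for the process-wide provider set via `otel.SetTracerProvider`:

```go
package main

import "fmt"

// globalProvider stands in for the process-wide tracer provider.
var globalProvider = "default"

// withTestProvider mirrors the pattern the review asks for: capture the
// previous global, install the test one, and restore it in cleanup so no
// state leaks into other tests.
func withTestProvider(name string, body func()) {
	prev := globalProvider
	globalProvider = name
	defer func() { globalProvider = prev }()
	body()
}

func main() {
	withTestProvider("in-memory", func() {
		fmt.Println(globalProvider) // in-memory
	})
	fmt.Println(globalProvider) // default
}
```

In Go tests the same shape is usually written with `t.Cleanup(func() { otel.SetTracerProvider(prev) })` so restoration runs even if the test fails early.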
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/telemetry/otel.go`:
- Around line 81-105: The sampler construction must select the OTEL sampler
based on samplerType instead of always wrapping in trace.ParentBased; update the
logic that currently uses samplingRate and trace.ParentBased(...) to a switch on
samplerType: use trace.AlwaysSample() for "always_on", trace.NeverSample() for
"always_off", trace.TraceIDRatioBased(samplingRate) for "traceidratio" (ensuring
samplingRate is parsed and in [0,1]), and
trace.ParentBased(trace.TraceIDRatioBased(samplingRate)) for
"parentbased_traceidratio" (also validating samplingRate); keep the existing env
parsing for samplingRate but apply it only to the types that need a rate, and
replace occurrences of trace.ParentBased(trace.AlwaysSample()) /
trace.ParentBased(trace.NeverSample()) /
trace.ParentBased(trace.TraceIDRatioBased(...)) with the correct direct samplers
referenced by samplerType, samplerType, samplingRate, trace.AlwaysSample,
trace.NeverSample, and trace.TraceIDRatioBased.
---
Outside diff comments:
In `@internal/sentinel/sentinel.go`:
- Around line 205-207: The skipped-resource log is built from ctx which drops
the telemetry span; change the context used when creating skipCtx to evalCtx so
the span started by telemetry.StartSpan remains attached—i.e., where you call
logger.WithDecisionReason(ctx, decision.Reason) create skipCtx from evalCtx
instead (same pattern used for eventCtx in the publish path) so
sentinel.evaluate span stays correlated with skipped-resource logs.
---
Duplicate comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 646-656: Capture the existing global tracer provider before
calling otel.SetTracerProvider(tp) (e.g. oldTP := otel.GetTracerProvider()),
then update the deferred cleanup to first restore the global provider with
otel.SetTracerProvider(oldTP) and then shut down tp using the existing shutdown
logic (use the same ctx), so the test does not leave a closed provider as the
global one; reference otel.SetTracerProvider, otel.GetTracerProvider, tp and the
deferred shutdown closure when making the change.
In `@pkg/telemetry/otel_test.go`:
- Around line 325-334: The test currently slices traceParentStr after an
unchecked length assertion which can panic; change the checks so the length
guard prevents further slicing—either call t.Fatalf or return immediately if
len(traceParentStr) != 55, or replace the prefix/suffix slices with
strings.HasPrefix(traceParentStr, "00-") and strings.HasSuffix(traceParentStr,
"-01") to avoid slicing altogether; update the assertions that reference
traceParentStr[:3] and traceParentStr[len(traceParentStr)-3:] accordingly.
- Around line 21-38: TestInitTraceProvider_StdoutExporter leaves global OTel
state mutated; before calling InitTraceProvider, capture the existing global
tracer provider and text map propagator (e.g. via otel.GetTracerProvider() and
otel.GetTextMapPropagator()), then after initialization ensure the deferred
cleanup not only calls Shutdown(ctx, tp) but also restores the captured global
provider and propagator (set them back with otel.SetTracerProvider(...) and
otel.SetTextMapPropagator(...)) so later tests aren’t affected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6f58494c-e397-4fff-bb86-a3b3ccc4244f
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (14)
- cmd/sentinel/main.go
- configs/dev-example.yaml
- configs/gcp-pubsub-example.yaml
- configs/rabbitmq-example.yaml
- go.mod
- internal/client/client.go
- internal/client/client_test.go
- internal/config/config.go
- internal/health/health.go
- internal/sentinel/sentinel.go
- internal/sentinel/sentinel_test.go
- pkg/telemetry/otel.go
- pkg/telemetry/otel_test.go
- test/integration/integration_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/health/health.go
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/client/client.go
- configs/gcp-pubsub-example.yaml
- configs/dev-example.yaml
- test/integration/integration_test.go
- configs/rabbitmq-example.yaml
- internal/config/config.go
```go
var samplingRate = defaultSamplingRate

log := logger.NewHyperFleetLogger()
```
The tracing standard specifies OTLP gRPC as the default export protocol
(OTEL_EXPORTER_OTLP_PROTOCOL defaults to grpc, code samples use otlptracegrpc,
Kubernetes examples use port 4317). This PR uses otlptracehttp instead (line
42), which defaults to port 4318.
If the Collector is configured per the standard
(otel-collector.observability.svc:4317), trace export will fail silently
because the HTTP client will try to talk HTTP to a gRPC port.
The JIRA ticket says to follow the hyperfleet-api reference implementation —
worth checking which exporter it uses so the two components stay consistent.
If HTTP was chosen intentionally (e.g., for simpler TLS setup), it might be
worth documenting the deviation and also handling OTEL_EXPORTER_OTLP_PROTOCOL
to support both.
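The HTTP-vs-gRPC port split behind this comment can be made concrete. The helper below is hypothetical (not part of the OTel SDK), but the 4317/4318 defaults come from the OTLP specification:

```go
package main

import (
	"fmt"
	"strings"
)

// defaultOTLPPort maps OTEL_EXPORTER_OTLP_PROTOCOL values to the default
// collector ports from the OTLP spec: gRPC listens on 4317, HTTP on 4318.
// An HTTP exporter pointed at 4317 speaks the wrong protocol and exports
// fail, which is the silent-failure mode flagged in review.
func defaultOTLPPort(protocol string) int {
	switch strings.ToLower(protocol) {
	case "grpc", "":
		// The tracing standard discussed above defaults to gRPC.
		return 4317
	case "http/protobuf", "http/json":
		return 4318
	default:
		return -1 // unknown protocol
	}
}

func main() {
	fmt.Println(defaultOTLPPort("grpc"))          // 4317
	fmt.Println(defaultOTLPPort("http/protobuf")) // 4318
}
```

Honoring `OTEL_EXPORTER_OTLP_PROTOCOL` and choosing the exporter (and port) from it, as the comment suggests, would let a deployment switch protocols without code changes.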
Hyperfleet-api does not have any implementation of these tracing standards to reference. Used the suggested gRPC exporter here.
… TODO to set OTel enable to true in default config
Actionable comments posted: 4
♻️ Duplicate comments (2)
pkg/telemetry/otel_test.go (2)
296-305: ⚠️ Potential issue | 🟡 Minor

Avoid fixed-index slicing after a non-fatal length check. If `traceParentStr` is shorter than expected, the `t.Errorf` on Line 297 does not stop execution and the next slices can panic, hiding the real failure.

💡 Proposed fix
```diff
+import "strings"
@@
-    if traceParentStr[:3] != "00-" {
-        t.Errorf("Expected traceparent to start with '00-', got %s", traceParentStr[:3])
+    if !strings.HasPrefix(traceParentStr, "00-") {
+        t.Errorf("Expected traceparent to start with '00-', got %s", traceParentStr)
     }
-    if traceParentStr[len(traceParentStr)-3:] != "-01" {
-        t.Errorf("Expected traceparent to end with '-01', got %s", traceParentStr[len(traceParentStr)-3:])
+    if !strings.HasSuffix(traceParentStr, "-01") {
+        t.Errorf("Expected traceparent to end with '-01', got %s", traceParentStr)
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 296 - 305, The test currently uses t.Errorf for the traceParentStr length check then immediately slices traceParentStr which can panic if shorter; change the length assertion to a fatal assertion (use t.Fatalf) or return after reporting the error so execution stops, or alternatively perform guarded checks (e.g., verify len(traceParentStr) >= 55 before doing traceParentStr[:3] and traceParentStr[len-3:]) when validating traceParentStr in the otel_test.go test where traceParentStr is inspected for the "00-" prefix and "-01" suffix.
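A prefix/suffix-based check like the one proposed can be written so malformed input never panics. This is a stdlib-only sketch; `validTraceparent` is an illustrative helper, not code from the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// validTraceparent checks the W3C traceparent shape the tests assert:
// version-traceid-spanid-flags, e.g. "00-<32 hex>-<16 hex>-01" (55 chars).
// Unlike fixed-index slicing, prefix/suffix and length checks cannot
// panic on short input.
func validTraceparent(tp string) bool {
	parts := strings.Split(tp, "-")
	if len(parts) != 4 {
		return false
	}
	return len(tp) == 55 &&
		strings.HasPrefix(tp, "00-") &&
		len(parts[1]) == 32 &&
		len(parts[2]) == 16 &&
		len(parts[3]) == 2
}

func main() {
	ok := "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
	fmt.Println(validTraceparent(ok))   // true
	fmt.Println(validTraceparent("00")) // false: too short, but no panic
}
```

(A full validator would also check that each segment is lowercase hex and that the trace/span IDs are nonzero; the sketch only covers the structural checks the test makes.)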
43-77: ⚠️ Potential issue | 🟠 Major: Use a real local OTLP endpoint in this test.
This regresses to the same fake-collector setup that makes `Shutdown` environment-dependent. With a fake host, the exporter can block or fail while flushing, so the test is no longer hermetic.
💡 Proposed fix
+ server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+     w.WriteHeader(http.StatusOK)
+ }))
+ defer server.Close()
+
- err := os.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://fake-otel-collector:4317")
+ err := os.Setenv("OTEL_EXPORTER_OTLP_ENDPOINT", server.URL)
  if err != nil {
      t.Fatalf("Failed to set OTEL_EXPORTER_OTLP_ENDPOINT: %v", err)
  }
@@
- err = Shutdown(ctx, tp)
+ shutdownCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
+ defer cancel()
+ err = Shutdown(shutdownCtx, tp)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 43 - 77, The test sets OTEL_EXPORTER_OTLP_ENDPOINT to a fake host which can make Shutdown hang; update TestInitTraceProvider_OTLPExporter to use a real local OTLP endpoint (e.g. "http://localhost:4317") instead of "http://fake-otel-collector:4317" and add a quick connectivity pre-check (dial or HTTP GET) that skips the test if the local collector is not reachable so the test remains hermetic; modify references in the test that call InitTraceProvider and Shutdown to rely on the new endpoint and skip early when the pre-check fails.
🧹 Nitpick comments (2)
pkg/telemetry/otel_test.go (1)
180-186: Restore the previous global tracer provider in these tests. Both tests replace the package-global provider with `otel.SetTracerProvider(tp)` and only shut `tp` down. Later tests then inherit this in-memory provider and its sampling/export settings.
💡 Proposed fix
+ previousProvider := otel.GetTracerProvider()
  otel.SetTracerProvider(tp)
  defer func(ctx context.Context, tp *trace.TracerProvider) {
      err := Shutdown(ctx, tp)
      if err != nil {
          t.Errorf("Failed to shutdown trace provider : %v", err)
      }
+     otel.SetTracerProvider(previousProvider)
  }(ctx, tp)

Also applies to: 261-267
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel_test.go` around lines 180 - 186, The tests replace the global tracer provider with otel.SetTracerProvider(tp) but never restore the previous provider; capture the prior provider before calling otel.SetTracerProvider(tp) (e.g., prev := otel.GetTracerProvider()), and in the defer that currently calls Shutdown(ctx, tp) add a step to restore the global provider by calling otel.SetTracerProvider(prev) after shutting down tp; ensure the same change is applied to both occurrences (around otel.SetTracerProvider(tp) / Shutdown(ctx, tp) blocks, including the second instance at lines noted).

go.mod (1)
17-21: Align all OTel Go modules to the same release version. The current pins mix `v1.42.0` for `otel`/`sdk`/`trace` with `v1.40.0` for the exporters. OpenTelemetry Go's versioning standards require these modules to be kept synchronized; they're released together as a unit and should not be intentionally pinned to different versions. This prevents incompatibilities and makes future upgrades clearer.
♻️ Suggested alignment
- go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.40.0
- go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.40.0
+ go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.42.0
+ go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.42.0

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go.mod` around lines 17 - 21, The go.mod pins mix OpenTelemetry module versions (go.opentelemetry.io/otel, go.opentelemetry.io/otel/sdk, go.opentelemetry.io/otel/trace at v1.42.0 while go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp and go.opentelemetry.io/otel/exporters/stdout/stdouttrace are at v1.40.0); update the exporter module versions to match the others (e.g., change otlptrace/otlptracehttp and stdouttrace to v1.42.0) so all otel modules are aligned, then run module tidy/get (go mod tidy or go get) to ensure dependency graph is consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/config/config.go`:
- Around line 117-122: The YAML-loaded file-backed value for messaging_system
isn't being normalized, so a blank or whitespace value can overwrite the
default; ensure the code trims whitespace and ignores empty values before
assigning to cfg.MessagingSystem (same logic used for the MESSAGING_SYSTEM env
branch). After unmarshalling (or wherever file values are applied), call
strings.TrimSpace on the file-provided messagingSystem and only set
cfg.MessagingSystem if the trimmed value != "".
In `@internal/sentinel/sentinel_test.go`:
- Around line 650-661: There are two defer blocks both calling tp.Shutdown(ctx);
remove the duplicate and consolidate into a single defer that restores the
previous provider and calls tp.Shutdown(ctx) exactly once with proper error
handling. Specifically, keep the previousProvider capture (from
otel.GetTracerProvider()), call otel.SetTracerProvider(tp) as before, then
replace the two defers with one defer func() that calls tp.Shutdown(ctx), checks
the returned error and reports it via t.Errorf if non-nil, and finally calls
otel.SetTracerProvider(previousProvider); ensure you reference tp, ctx,
previousProvider, otel.SetTracerProvider and tp.Shutdown in the combined defer
so Shutdown is only invoked once.
In `@internal/sentinel/sentinel.go`:
- Around line 139-145: The skip-branch currently builds skipCtx from the parent
ctx so skipped-resource logs are correlated to sentinel.poll; instead use the
per-resource evaluation context (evalCtx) created by
telemetry.StartSpan("sentinel.evaluate") so skip logs are children of that
span—update the logic that constructs skipCtx / any logging calls in the skip
branch to use evalCtx (and keep setting attributes on evalSpan as before),
ensuring you still cancel/finish the span appropriately after logging; locate
this change around the decisionEngine.Evaluate call, the evalCtx/evalSpan
variables, and the code that previously created skipCtx.
In `@pkg/logger/logger.go`:
- Around line 46-49: LogConfig.OTel.Enabled is declared but unused; either wire
it into the trace/span enrichment or remove it. Update buildEntry (the function
that currently adds trace_id/span_id from context.Context) to check
cfg.OTel.Enabled (OTelConfig.Enabled) before extracting/injecting
trace_id/span_id so toggling the flag disables those fields, or remove
OTelConfig.Enabled from LogConfig and any related code paths (and tests) if you
prefer to keep behavior unchanged. Ensure references to OTelConfig, LogConfig,
and buildEntry are updated consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1725aa91-c23e-4ef8-afc0-215bd3ed0f3a
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
- go.mod
- internal/config/config.go
- internal/sentinel/sentinel.go
- internal/sentinel/sentinel_test.go
- pkg/logger/logger.go
- pkg/telemetry/otel.go
- pkg/telemetry/otel_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/telemetry/otel.go
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/sentinel/sentinel.go (1)
185-202: Use `publishCtx` for publish-path logs. Once the child publish span is started, these log lines should use `publishCtx`; otherwise they keep the parent `sentinel.evaluate` span ID and lose direct log-to-span correlation for the publish operation.
♻️ Proposed fix
- s.logger.Errorf(eventCtx, "Failed to publish event resource_id=%s error=%v", resource.ID, err)
+ s.logger.Errorf(publishCtx, "Failed to publish event resource_id=%s error=%v", resource.ID, err)
@@
- s.logger.Infof(eventCtx, "Published event resource_id=%s ready=%t",
+ s.logger.Infof(publishCtx, "Published event resource_id=%s ready=%t",
      resource.ID, resource.Status.Ready)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel.go` around lines 185 - 202, The publish-path logs are using eventCtx (parent evaluate span) instead of the child publishCtx, breaking log-to-span correlation; update the two logging calls inside the publish block—specifically the s.logger.Errorf call in the Publish error branch and the s.logger.Infof success call after Publish—to pass publishCtx (not eventCtx), keeping publishSpan and evalSpan End() behavior unchanged so the publish logs correlate to the publishSpan.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 707-748: The test only checks span names/counts; update it to
assert trace hierarchy and the HTTP client span: use exporter.GetSpans() to
ensure all returned spans share the same TraceID, find the span with Name
"sentinel.poll" and assert that spans "sentinel.evaluate" and
"hyperfleet-clusters publish" have ParentSpanID equal to
sentinel.poll.SpanContext.SpanID, and assert there is an otelhttp client span
named "HTTP GET" present and tied to the same TraceID (and, if applicable, that
it is a child of "sentinel.evaluate"); also keep the existing published
CloudEvent traceparent check against mockPublisher.publishedEvents to ensure
propagation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4e41bf4e-8399-4df4-90a5-8e88e44a786e
📒 Files selected for processing (3)
- internal/sentinel/sentinel.go
- internal/sentinel/sentinel_test.go
- pkg/logger/logger.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/logger/logger.go
… add support for grpc and http both
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/sentinel/sentinel.go (1)
170-202: ⚠️ Potential issue | 🟡 Minor: Use `publishCtx` for the publish-path logs.
`telemetry.StartSpan` already returns a context enriched with the child publish span IDs. Logging with `eventCtx` keeps these messages attached to `sentinel.evaluate`, so the new publish span does not show up in structured logs.
🛠️ Suggested fix
  if err := s.publisher.Publish(publishCtx, topic, &event); err != nil {
      publishSpan.RecordError(err)
      publishSpan.SetStatus(codes.Error, "publish failed")

      // Record broker error
      metrics.UpdateBrokerErrorsMetric(resourceType, resourceSelector, "publish_error")
-     s.logger.Errorf(eventCtx, "Failed to publish event resource_id=%s error=%v", resource.ID, err)
+     s.logger.Errorf(publishCtx, "Failed to publish event resource_id=%s error=%v", resource.ID, err)
      publishSpan.End()
      evalSpan.End()
      continue
  }

- publishSpan.End()
-
  // Record successful event publication
  metrics.UpdateEventsPublishedMetric(resourceType, resourceSelector, decision.Reason)
- s.logger.Infof(eventCtx, "Published event resource_id=%s ready=%t",
+ s.logger.Infof(publishCtx, "Published event resource_id=%s ready=%t",
      resource.ID, resource.Status.Ready)
+ publishSpan.End()

  published++

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/sentinel/sentinel.go` around lines 170 - 202, The publish-path logs are using eventCtx instead of the child context returned by telemetry.StartSpan (publishCtx), so the logs remain associated with the parent span; change the logging calls around s.publisher.Publish to use publishCtx (not eventCtx) — specifically replace s.logger.Errorf(eventCtx, ...) and s.logger.Infof(eventCtx, ...) with s.logger.Errorf(publishCtx, ...) and s.logger.Infof(publishCtx, ...) respectively, keeping publishSpan.RecordError/SetStatus, publishSpan.End(), evalSpan.End(), and metrics.Update* calls unchanged so the publish span trace context is attached to those log entries.
♻️ Duplicate comments (1)
pkg/telemetry/otel.go (1)
77-102: ⚠️ Potential issue | 🟠 Major: `parentbased_traceidratio` still ignores the configured rate, and the other branches dropped `ParentBased(...)`.
`OTEL_TRACES_SAMPLER_ARG` is only parsed inside the `traceidratio` branch, so the default/`parentbased_traceidratio` path always stays at `1.0`. At the same time, `always_on`, `always_off`, and `traceidratio` now return standalone samplers, which breaks the parent-based behavior Sentinel relies on for trace continuity.
🛠️ Suggested fix
- var sampler trace.Sampler
- samplerType := strings.ToLower(os.Getenv(envOtelTracesSampler))
+ if arg := os.Getenv(envOtelTracesSamplerArg); arg != "" {
+     if parsedRate, err := strconv.ParseFloat(arg, 64); err == nil && parsedRate >= 0.0 && parsedRate <= 1.0 {
+         samplingRate = parsedRate
+     } else {
+         log.Warnf(ctx, "Invalid %s value=%q, using default: %v", envOtelTracesSamplerArg, arg, samplingRate)
+     }
+ }
+
+ var sampler trace.Sampler
+ samplerType := strings.ToLower(os.Getenv(envOtelTracesSampler))
  switch samplerType {
  case samplerAlwaysOn:
-     sampler = trace.AlwaysSample()
+     sampler = trace.ParentBased(trace.AlwaysSample())
  case samplerAlwaysOff:
-     sampler = trace.NeverSample()
+     sampler = trace.ParentBased(trace.NeverSample())
  case samplerTraceIdRatio:
-     // Parse the sampling rate for traceidratio
-     rate := defaultSamplingRate // default 1.0
-     if arg := os.Getenv(envOtelTracesSamplerArg); arg != "" {
-         if parsedRate, err := strconv.ParseFloat(arg, 64); err == nil && parsedRate >= 0.0 && parsedRate <= 1.0 {
-             rate = parsedRate
-         } else {
-             log.Warnf(ctx, "Invalid %s value=%q, using default: %v", envOtelTracesSamplerArg, arg, rate)
-         }
-     }
-     sampler = trace.TraceIDRatioBased(rate)
+     sampler = trace.ParentBased(trace.TraceIDRatioBased(samplingRate))
  case parentBasedTraceIdRatio, "": // Default per tracing standard
      sampler = trace.ParentBased(trace.TraceIDRatioBased(samplingRate))

Based on learnings, HyperFleet tracing standards require Sentinel sampler branches to remain wrapped in `trace.ParentBased(...)`.
Verify each finding against the current code and only fix it if needed. In `@pkg/telemetry/otel.go` around lines 77 - 102, The sampler selection currently parses OTEL_TRACES_SAMPLER_ARG only inside the traceidratio case and returns standalone samplers, which ignores configured rates for parentbased_traceidratio and breaks ParentBased behavior; move parsing of envOtelTracesSamplerArg (into a rate variable, defaulting to samplingRate or defaultSamplingRate) before the switch so all branches can use the same rate, and ensure every non-error branch wraps its choice with trace.ParentBased(...): use trace.ParentBased(trace.TraceIDRatioBased(rate)) for traceidratio and parentBasedTraceIdRatio/"" branches, and wrap trace.AlwaysSample()/trace.NeverSample() as trace.ParentBased(trace.AlwaysSample()) / trace.ParentBased(trace.NeverSample()) so ParentBased continuity (trace.ParentBased, trace.TraceIDRatioBased, trace.AlwaysSample, trace.NeverSample, samplerType, envOtelTracesSamplerArg, samplingRate) is preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 20c6dc44-18b4-482e-9ac7-ee70a0597c51
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
- go.mod
- internal/sentinel/sentinel.go
- pkg/logger/logger_test.go
- pkg/telemetry/otel.go
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/telemetry/otel.go`:
- Around line 58-61: Update the error log in the OTLP exporter creation so it
doesn't hardcode "gRPC exporter": in the function that creates the exporter (the
block where the exporter is constructed and where log.Errorf(ctx, "Failed to
create OTLP gRPC exporter: %v", err) is called), change the message to include
the actual exporter kind (e.g., use an exporterKind variable set to "gRPC" or
"HTTP") or use a generic message such as "Failed to create OTLP exporter" and
append the error; ensure you reference the same err variable and replace the
hardcoded string in the log.Errorf call accordingly.
- Around line 107-113: The parentBasedTraceIdRatio and default branches are
using an unparsed samplingRate (still defaultSamplingRate) and thus ignore
OTEL_TRACES_SAMPLER_ARG; update the logic that handles samplerType to parse
OTEL_TRACES_SAMPLER_ARG into samplingRate (e.g., parse/validate the value into
the samplingRate variable, handling errors and clamping to [0,1]) before
constructing the sampler, and then use that parsed samplingRate in
trace.ParentBased(trace.TraceIDRatioBased(samplingRate)) for the
parentBasedTraceIdRatio case and the default case (reference symbols:
samplerType, parentBasedTraceIdRatio, samplingRate, trace.ParentBased,
trace.TraceIDRatioBased); also emit a warning via log.Warnf if the arg is
invalid and fall back to defaultSamplingRate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aa2d26ec-cd5a-4874-b9e8-b7d4bec66ce9
📒 Files selected for processing (1)
pkg/telemetry/otel.go
    exporter, err = otlptracegrpc.New(ctx)
}
if err != nil {
    log.Errorf(ctx, "Failed to create OTLP gRPC exporter: %v", err)
Priority: Bug
The error message on line 58 always says "Failed to create OTLP gRPC exporter" even when the
HTTP exporter was the one that failed. The if err != nil block handles errors from all
branches of the protocol switch above, but the message is hardcoded to say "gRPC".
Suggested fix — either move the error check inside each case, or use a generic message:
- log.Errorf(ctx, "Failed to create OTLP gRPC exporter: %v", err)
+ log.Errorf(ctx, "Failed to create OTLP exporter (protocol=%s): %v", protocol, err)
github.com/testcontainers/testcontainers-go/modules/rabbitmq v0.40.0
go.opentelemetry.io/otel v1.42.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.42.0
go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.40.0
Priority: Pattern
go.opentelemetry.io/otel/exporters/stdout/stdouttrace is at v1.40.0 while the rest of the OTel
modules are at v1.42.0 (go.mod line 19). The OTel Go project recommends keeping all modules
in lockstep to avoid subtle behavior mismatches.
go get go.opentelemetry.io/otel/exporters/stdout/stdouttrace@v1.42.0
go mod tidy

registry := prometheus.NewRegistry()
metrics.NewSentinelMetrics(registry, "test")

cfg := &config.SentinelConfig{
Priority: Pattern
TestTrigger_CreatesRequiredSpans (line 683) creates a SentinelConfig without MessagingSystem,
which means the publish span will have messaging.system="". Since this test validates span
attributes, it should set the field explicitly (or use NewSentinelConfig()) so the test also
validates that messaging.system is correctly propagated.
cfg := &config.SentinelConfig{
ResourceType: "clusters",
Topic: "hyperfleet-clusters",
MaxAgeNotReady: 10 * time.Second,
MaxAgeReady: 30 * time.Minute,
MessagingSystem: "rabbitmq",
}
Summary
Implements comprehensive OpenTelemetry tracing instrumentation in Sentinel to enable distributed tracing across the HyperFleet orchestration pipeline.
Span hierarchy:
sentinel.poll (root)
└── sentinel.evaluate (one per resource)
    └── per-publish span (e.g. "hyperfleet-clusters publish")
Resolves HYPERFLEET-542
Changed
Tested