Skip to content

HYPERFLEET-536: Optimize Sentinel with server-side condition-based filtering#60

Draft
rafabene wants to merge 6 commits intoopenshift-hyperfleet:mainfrom
rafabene:HYPERFLEET-536
Draft

HYPERFLEET-536: Optimize Sentinel with server-side condition-based filtering#60
rafabene wants to merge 6 commits intoopenshift-hyperfleet:mainfrom
rafabene:HYPERFLEET-536

Conversation

@rafabene
Copy link
Contributor

@rafabene rafabene commented Mar 4, 2026

Summary

  • Replace single fetch-all query with two targeted TSL queries for server-side condition-based filtering, reducing network traffic and API load
  • Query 1 fetches not-ready resources (status.conditions.Ready='False'), Query 2 fetches stale ready resources (Ready='True' with last_updated_time older than max_age_ready)
  • Results are merged and deduplicated by resource ID; graceful degradation continues with not-ready results if stale query fails

Changes

File Description
internal/client/client.go Added additionalFilters ...string to FetchResources, new buildSearchString function
internal/sentinel/sentinel.go New fetchFilteredResources method (two queries), mergeResources dedup helper
internal/client/client_test.go 3 new tests: TestBuildSearchString, TestFetchResources_WithAdditionalFilters, TestFetchResources_WithConditionFilterOnly
internal/sentinel/sentinel_test.go New mockServerForConditionQueries helper, 3 new tests + 7 existing tests updated for two-query pattern
test/integration/integration_test.go All 4 integration tests updated for condition-based search pattern

Trade-off

Resources that recently became Ready may have generation-based reconciliation delayed up to max_age_ready (30min), since they are only fetched when stale. This is acceptable because Ready=True already indicates the adapter has processed the resource.

Test plan

  • Unit tests pass (make test)
  • Lint clean (make lint)
  • Integration tests (make test-integration) — requires Docker/Podman environment
  • Architecture docs update (sentinel.md, architecture-summary.md) — follow-up PR in openshift-hyperfleet/architecture

JIRA: HYPERFLEET-536

Summary by CodeRabbit

  • New Features

    • Resource fetching now accepts additional condition-based filters that are combined with label selectors into a single search query.
    • Two-query polling: fetches not-ready and stale-ready resources, merges results with deduplication (first-source precedence), and gracefully falls back to partial results if the stale query fails.
    • Event building preserves context across the new two-query workflow.
  • Tests

    • Added unit and integration tests for search-string composition, multi-filter queries, two-query behavior, merging/deduplication, poll-cycle handling, event payloads, and context propagation.

…ltering

Replace single fetch-all query with two targeted TSL queries to reduce
network traffic and API load:

1. Not-ready resources: status.conditions.Ready='False'
2. Stale ready resources: Ready='True' with last_updated_time older than max_age_ready

Results are merged and deduplicated by resource ID before evaluation.
Includes graceful degradation if the stale query fails.
@openshift-ci openshift-ci bot requested review from Mischulee and ciaranRoche March 4, 2026 18:35
@openshift-ci
Copy link

openshift-ci bot commented Mar 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mbrudnoy for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

FetchResources was extended to accept variadic additionalFilters and now builds a composite TSL search string via buildSearchString which merges label selectors and extra filters. fetchResourcesOnce signature was updated to receive additionalFilters and forwards the composed search to cluster/node-pool fetches. Sentinel was changed to perform two condition-based queries per poll (not-ready and stale-ready), merge results with first-slice precedence via mergeResources, and gracefully degrade if the stale query fails. Tests and integration harnesses were updated to cover composite search strings, two-query behavior, merging, and poll-cycle semantics.

Sequence Diagram(s)

sequenceDiagram
    participant Sentinel as Sentinel
    participant Client as HyperFleet Client
    participant Server as HyperFleet Server
    participant Merge as mergeResources
    participant Eval as Decision Evaluator
    participant Pub as Broker Publisher

    Sentinel->>Client: FetchResources(labelSelector, additionalFilters)
    Client->>Server: GET /resources?search=<labels AND Ready='False' [+ extra filters]>
    Server-->>Client: Not-ready resources (A)
    Client->>Server: GET /resources?search=<labels AND Ready='True' AND age filter [+ extra filters]>
    Server-->>Client: Stale-ready resources (B) or error
    Client->>Merge: mergeResources(A, B)
    Merge->>Merge: deduplicate (first-slice precedence)
    Merge-->>Sentinel: merged resources
    Sentinel->>Eval: evaluate decisions
    Eval->>Pub: publish events
    Pub-->>Sentinel: publish results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • ciaranRoche
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: optimizing Sentinel through server-side condition-based filtering using two targeted TSL queries instead of a single fetch-all approach.
Docstring Coverage ✅ Passed Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/sentinel/sentinel_test.go (1)

125-132: Make unexpected search queries fail fast in the test helper.

The default branch currently returns merged clusters, which can hide query-construction regressions. Prefer returning 400 (or explicit test failure) for unknown search patterns.

Suggested fix
 		switch {
 		case strings.Contains(search, "Ready='False'"):
 			clusters = notReadyClusters
 		case strings.Contains(search, "Ready='True'"):
 			clusters = staleClusters
 		default:
-			clusters = append(notReadyClusters, staleClusters...)
+			w.WriteHeader(http.StatusBadRequest)
+			if err := json.NewEncoder(w).Encode(map[string]string{
+				"error": fmt.Sprintf("unexpected search query: %s", search),
+			}); err != nil {
+				t.Logf("Error encoding bad request response: %v", err)
+			}
+			return
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/sentinel/sentinel_test.go` around lines 125 - 132, The test helper's
switch on the variable search currently falls through to merging clusters in the
default branch, hiding bad query changes; replace the default branch so it fails
fast by calling t.Fatalf("unexpected search query: %q", search) (or otherwise
return an explicit 400/error) instead of appending notReadyClusters and
staleClusters; update the switch in sentinel_test.go that references search,
clusters, notReadyClusters and staleClusters so unknown search patterns trigger
the test failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/sentinel/sentinel.go`:
- Around line 220-224: The code in trigger() treats any error from
s.client.FetchResources (staleResources) as a graceful success by returning
notReadyResources, nil; change this so context cancellation/timeouts are
propagated: after calling s.client.FetchResources(ctx, rt, labelSelector,
staleFilter) check if errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded) and if so return nil, err (propagate the context
error) instead of returning notReadyResources, nil; for non-context errors keep
the existing graceful-degradation behavior. Ensure you import the errors package
and reference s.client.FetchResources, staleResources and trigger() when making
the change.
- Around line 233-244: mergeResources currently dedupes solely on Resource.ID so
entries with empty IDs collapse; change the dedupe key logic in mergeResources
to treat empty IDs as distinct by using a fallback unique key (e.g., prefix +
the index or source) when resource.ID == "" (for example "__empty_a_{i}" for
items from slice a and "__empty_b_{i}" for items from slice b) so each empty-ID
resource is preserved; update the seen map population and existence check to use
this computed key (and add fmt import if you use fmt.Sprintf).

In `@test/integration/integration_test.go`:
- Around line 122-133: The test mutates pollCycleCount and receivedSearchParams
from HTTP handler goroutines without synchronization, causing data races; fix by
introducing synchronization (e.g., a sync.Mutex or sync.RWMutex or atomic
counters) and use it wherever these shared variables are written or read: wrap
increments to pollCycleCount (referenced in the server handler closure) with
atomic.AddInt32 or lock/unlock around the mutation, and protect accesses and
appends to receivedSearchParams (the slice/map populated in the handler and
later asserted in the test) with the same mutex (lock in handler when mutating,
lock or RLock in the test when reading/asserting). Ensure all other handler/test
sites noted in the comment (the other handler blocks that touch these variables)
use the same synchronization primitives.

---

Nitpick comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 125-132: The test helper's switch on the variable search currently
falls through to merging clusters in the default branch, hiding bad query
changes; replace the default branch so it fails fast by calling
t.Fatalf("unexpected search query: %q", search) (or otherwise return an explicit
400/error) instead of appending notReadyClusters and staleClusters; update the
switch in sentinel_test.go that references search, clusters, notReadyClusters
and staleClusters so unknown search patterns trigger the test failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c9cf8e0d-0369-4d07-9924-e89505ac244c

📥 Commits

Reviewing files that changed from the base of the PR and between 9b037da and 3602b50.

📒 Files selected for processing (5)
  • internal/client/client.go
  • internal/client/client_test.go
  • internal/sentinel/sentinel.go
  • internal/sentinel/sentinel_test.go
  • test/integration/integration_test.go

- Propagate context cancellation/timeout errors in fetchFilteredResources
  instead of swallowing them via graceful degradation
- Handle empty resource IDs in mergeResources so they are treated as
  distinct entries instead of collapsing
- Add synchronization for shared variables in integration tests
  (atomic counters for pollCycleCount/callCount, mutex for
  receivedSearchParams)
- Fail fast in mockServerForConditionQueries on unexpected search queries
  instead of silently merging results
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
test/integration/integration_test.go (1)

464-468: Fail fast on unexpected search queries in this mock handler.

The default branch currently returns valid clusters (Line 464-468), which can mask regressions to non-condition query behavior. Prefer returning an error and failing the test on unexpected query shapes.

Suggested fail-fast default branch
 		default:
-			clusters = []map[string]interface{}{
-				createMockCluster("cluster-old", 2, 2, true, now.Add(-35*time.Minute)),
-				createMockCluster("cluster-not-ready", 1, 1, false, now.Add(-15*time.Second)),
-			}
+			t.Errorf("unexpected search query: %q", search)
+			http.Error(w, "unexpected search query", http.StatusBadRequest)
+			return
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/integration_test.go` around lines 464 - 468, The default
branch of the mock search handler is returning valid clusters which can hide
unexpected query shapes; locate the switch/default in the mock handler that
assigns to clusters (it currently uses createMockCluster("cluster-old", ...) and
createMockCluster("cluster-not-ready", ...)), and change the default behavior to
fail fast instead of returning data — e.g., call t.Fatalf (or otherwise return
an error/HTTP 400) with a clear message including the unexpected query shape so
the test fails immediately on unrecognized queries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 808-866: The test tracks queryCount but never asserts it, so add
an assertion after calling s.trigger(ctx) to ensure both backend queries were
attempted (i.e., queryCount == 2); place this check immediately after the
existing publish assertion and fail the test with a clear message if queryCount
is not 2 so the test will catch cases where the stale-ready query is skipped.
Reference the queryCount variable and the call to s.trigger(ctx) when adding the
assertion.

In `@test/integration/integration_test.go`:
- Around line 226-247: The test TestIntegration_LabelSelectorFiltering's mock
server currently ignores validating the actual outbound search query; modify the
httptest server handler (the server variable that reads searchParam) to record
each incoming searchParam into a slice or channel and then assert in the test
that the captured search values contain the expected condition fragments (e.g.,
"labels.shard='1'" plus the appropriate "Ready='True'" and "Ready='False'"
clauses) to ensure query composition is exercised; use the existing
createMockClusterWithLabels helper and searchParam variable to locate the
handler and add the capture/assert logic.
- Around line 390-400: The test currently only checks prefixes and can silently
pass if no queries were captured; update the validation around
receivedSearchParams: first assert that len(receivedSearchParams) > 0 (or use
paramsCopy) to ensure at least one query was observed, then assert that among
paramsCopy there exists at least one query containing "Ready='False'" and at
least one containing "Ready='True'"; keep the existing prefix check
(labelPrefix) but add these presence assertions to fail the test when queries
are missing or when either Ready condition is absent.

---

Nitpick comments:
In `@test/integration/integration_test.go`:
- Around line 464-468: The default branch of the mock search handler is
returning valid clusters which can hide unexpected query shapes; locate the
switch/default in the mock handler that assigns to clusters (it currently uses
createMockCluster("cluster-old", ...) and createMockCluster("cluster-not-ready",
...)), and change the default behavior to fail fast instead of returning data —
e.g., call t.Fatalf (or otherwise return an error/HTTP 400) with a clear message
including the unexpected query shape so the test fails immediately on
unrecognized queries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 716baa0d-106a-473c-b48c-071c9923c566

📥 Commits

Reviewing files that changed from the base of the PR and between 3602b50 and 365acf5.

📒 Files selected for processing (3)
  • internal/sentinel/sentinel.go
  • internal/sentinel/sentinel_test.go
  • test/integration/integration_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/sentinel/sentinel.go

- Assert queryCount in TestTrigger_StaleQueryFailure to verify both
  not-ready and stale queries were attempted
- Capture and assert search params in TestIntegration_LabelSelectorFiltering
  to validate label selector and condition fragments in outbound queries
- Assert receivedSearchParams is non-empty and contains both Ready='False'
  and Ready='True' queries in TestIntegration_TSLSyntaxMultipleLabels
- Fail fast on unexpected search queries in BrokerLoggerContext mock handler
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/integration/integration_test.go (1)

193-195: Prefer condition-based waiting over fixed sleep to reduce CI flakiness.

Assertions at Line 193, Line 313, and Line 434 depend on earlier fixed time.Sleep(300ms). On slower runs this can produce false negatives even when behavior is correct.

Refactor pattern
-	time.Sleep(300 * time.Millisecond)
-	cancel()
+	deadline := time.Now().Add(2 * time.Second)
+	for time.Now().Before(deadline) {
+		if pollCycleCount.Load() >= 2 {
+			break
+		}
+		time.Sleep(25 * time.Millisecond)
+	}
+	cancel()

Apply the same “wait-until condition or timeout” pattern for captured-query assertions in the other two tests.

Also applies to: 313-315, 434-436

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/integration_test.go` around lines 193 - 195, Replace the
fixed 300ms sleeps with a condition-based wait: implement a short helper or
inline loop that polls the condition until a timeout (e.g., 5s) and fails the
test on timeout; specifically, for the pollCycleCount check replace the direct
assertion of pollCycleCount.Load() < 2 with a wait-until that polls
pollCycleCount.Load() >= 2 and then asserts, and apply the same pattern to the
captured-query assertions referenced at the other two locations so they wait-for
the expected captured queries (instead of relying on time.Sleep).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 724-729: The test mutates receivedSearchParams and queryCount from
the HTTP handler goroutines without synchronization; protect
receivedSearchParams with a sync.Mutex (e.g., add var receivedMu sync.Mutex and
wrap append/read of receivedSearchParams in receivedMu.Lock()/Unlock()) and
convert queryCount to an atomic.Int32 (or use atomic.AddInt32/atomic.LoadInt32)
so increments in the handler use atomic.Add and reads in the test use
atomic.Load, updating references to receivedSearchParams and queryCount in the
handler and the test assertions accordingly (look for the httptest.NewServer
handler closure that appends to receivedSearchParams and the places the test
reads receivedSearchParams/queryCount).

---

Nitpick comments:
In `@test/integration/integration_test.go`:
- Around line 193-195: Replace the fixed 300ms sleeps with a condition-based
wait: implement a short helper or inline loop that polls the condition until a
timeout (e.g., 5s) and fails the test on timeout; specifically, for the
pollCycleCount check replace the direct assertion of pollCycleCount.Load() < 2
with a wait-until that polls pollCycleCount.Load() >= 2 and then asserts, and
apply the same pattern to the captured-query assertions referenced at the other
two locations so they wait-for the expected captured queries (instead of relying
on time.Sleep).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2194d8f0-2df4-446c-81ef-b2b1bc194592

📥 Commits

Reviewing files that changed from the base of the PR and between 365acf5 and 3cb0633.

📒 Files selected for processing (2)
  • internal/sentinel/sentinel_test.go
  • test/integration/integration_test.go

- Protect receivedSearchParams with sync.Mutex in
  TestTrigger_ConditionBasedQueries (append in handler, read in test)
- Convert queryCount to atomic.Int32 in TestTrigger_StaleQueryFailure
  (increment in handler, read in test)
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 823-839: The test handler currently treats any non-Ready='False'
search as a stale-query failure; change the logic to explicitly check for
Ready='True' (e.g., replace the catch-all else with else if
strings.Contains(search, "Ready='True'") to return the internal server error
stub) and make the final else return a distinct failure for unknown/malformed
queries (for example write http.StatusBadRequest with an explanatory body or
call t.Fatalf) so unknown search strings cause the test to fail; apply the same
explicit Ready='True' check to the other occurrence referenced around
createMockCluster / createMockClusterList so only the intended stale-query input
triggers the 500 response.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d08d6965-4b27-4af9-8ef9-d848b95eebc9

📥 Commits

Reviewing files that changed from the base of the PR and between 3cb0633 and 8e5e44f.

📒 Files selected for processing (1)
  • internal/sentinel/sentinel_test.go

…test

Replace catch-all else with explicit Ready='True' check so only the
intended stale query triggers the 500 response. Unknown search patterns
now fail fast with 400 and t.Errorf.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/sentinel/sentinel_test.go (1)

729-752: Add a default case to fail on unexpected queries for consistency.

Unlike mockServerForConditionQueries (line 132-135) and TestTrigger_StaleQueryFailure (line 839-842), this handler silently returns an empty cluster list when neither Ready='False' nor Ready='True' is matched. This makes the test less robust — if the search string format changes unexpectedly, the test would pass with empty results rather than failing fast.

♻️ Suggested improvement
 		} else if strings.Contains(search, "Ready='True'") {
 			clusters = []map[string]interface{}{
 				createMockCluster("stale-1", 2, 2, true, now.Add(-31*time.Minute)),
 			}
+		} else {
+			t.Errorf("unexpected search query: %q", search)
+			w.WriteHeader(http.StatusBadRequest)
+			return
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/sentinel/sentinel_test.go` around lines 729 - 752, The test HTTP
handler created in the anonymous function for the httptest server silently
returns an empty cluster list when the query doesn't match "Ready='False'" or
"Ready='True'"; update the handler inside the server :=
httptest.NewServer(http.HandlerFunc(...)) so it has a default branch that fails
the test (e.g., t.Fatalf or t.Errorf) when an unexpected search string is
received, referencing the received search parameter (receivedSearchParams or
search) and the response-building helpers
(createMockCluster/createMockClusterList) to locate the code to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/sentinel/sentinel_test.go`:
- Around line 729-752: The test HTTP handler created in the anonymous function
for the httptest server silently returns an empty cluster list when the query
doesn't match "Ready='False'" or "Ready='True'"; update the handler inside the
server := httptest.NewServer(http.HandlerFunc(...)) so it has a default branch
that fails the test (e.g., t.Fatalf or t.Errorf) when an unexpected search
string is received, referencing the received search parameter
(receivedSearchParams or search) and the response-building helpers
(createMockCluster/createMockClusterList) to locate the code to change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: db226001-df7a-47ab-857c-916c74b82ada

📥 Commits

Reviewing files that changed from the base of the PR and between 8e5e44f and a9d8791.

📒 Files selected for processing (1)
  • internal/sentinel/sentinel_test.go

@rh-amarin
Copy link
Contributor

Trade-off
Resources that recently became Ready may have generation-based reconciliation delayed up to max_age_ready (30min), since they are only fetched when stale. This is acceptable because Ready=True already indicates the adapter has processed the resource.

When there is a generation change, the resource immediately gets Ready=False, so the sentinel will trigger because of this reason and will not take into account the most recent time it was Ready=True

So, in theory there should be no delay, or I didn't understood the use case

//
// Results are merged and deduplicated by resource ID. The DecisionEngine still
// evaluates the filtered set in memory (e.g., for generation-based checks).
func (s *Sentinel) fetchFilteredResources(ctx context.Context, labelSelector map[string]string) ([]client.Resource, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible optimization would be to fetch in parallel.
But we can leave that for later if necessary.

My concern would be the timeout for the requests, we have a timeout: 5s which means that both requests timing out will consume all the sentinel cycle.

But, checking the implementation, it seems that a cycle that takes longer will not mix with another sentinel execution, which will happen 5secs after the previous one has finished,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, parallelizing the queries is a valid optimization. In practice, the
gains would only be significant in worst-case scenarios (e.g., both queries
timing out — 10s sequential vs 5s parallel). For typical fast responses, the
difference is negligible. I'll leave this as a future optimization if we
observe slow query times in production.


// Query 1: Not-ready resources
notReadyResources, err := s.client.FetchResources(ctx, rt, labelSelector,
"status.conditions.Ready='False'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have these queries as constants to give them some name and be more easily identifiable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Extracted both filters as package-level constants: notReadyFilter and staleReadyFilterFmt.


// mergeResources combines two resource slices, deduplicating by resource ID.
// Resources from the first slice take precedence when duplicates are found.
// Resources with empty IDs are treated as distinct and always preserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we have a resource with empty ID?
That is basically what gets sent in the messages, shouldn't we discard these as errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, resources with empty IDs shouldn't be processed. Added an early
filter in the trigger loop that skips resources with empty ID and logs a
warning. This also allowed simplifying mergeResources — no more special
handling for empty IDs.

// Fetch resources from HyperFleet API
resources, err := s.client.FetchResources(ctx, client.ResourceType(s.config.ResourceType), labelSelector)
// Fetch resources using condition-based server-side filtering:
// Query 1: Not-ready resources (need frequent reconciliation)
Copy link
Contributor

@rh-amarin rh-amarin Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is a way to have a TSL query with OR conditions and have a single query?

But checking the API code, is not even possible now to search by status.conditions.Ready.last_updated_time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, TSL doesn't support OR conditions and last_updated_time filtering isn't
available in the API yet. If either becomes available in the future, we can
consolidate into a single query.

- Replace V(2) logging with Debug/Debugf for simplicity
- Extract filter query strings into named constants
- Skip resources with empty ID and log a warning
- Simplify mergeResources by removing empty ID special handling
@rafabene rafabene marked this pull request as draft March 6, 2026 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants