Skip to content

HYPERFLEET-536 - docs: update architecture docs for condition subfield queries#99

Open
rafabene wants to merge 6 commits intomainfrom
HYPERFLEET-536-docs
Open

HYPERFLEET-536 - docs: update architecture docs for condition subfield queries#99
rafabene wants to merge 6 commits intomainfrom
HYPERFLEET-536-docs

Conversation

@rafabene
Copy link
Contributor

@rafabene rafabene commented Mar 6, 2026

Summary

  • Update Sentinel component docs with selective querying strategy (two targeted API calls instead of fetching all resources)
  • Update Decision Engine section to reference condition-based status instead of deprecated status.phase
  • Add "Selective Querying (Implemented)" section to sentinel-pulses.md
  • Add "Query Ready/Not-Ready Clusters via API Search" pattern to status-guide.md

Files changed

  • hyperfleet/components/sentinel/sentinel.md — Resource Watcher and Decision Engine sections
  • hyperfleet/docs/sentinel-pulses.md — New selective querying section
  • hyperfleet/docs/status-guide.md — New query pattern with examples

Related

  • API implementation PR: will be linked after creation
  • JIRA: HYPERFLEET-536

Test plan

  • Verify documentation accuracy against implemented API behavior
  • Cross-reference doc examples with integration tests

Summary by CodeRabbit

  • Documentation

    • Reworked Sentinel docs to use condition-based Ready checks instead of phase checks; added selective-querying patterns for not-ready, stale-ready, and missing-ready resources, supported condition subfields (timestamps, observed_generation), examples, cutoff semantics, generation-first then freshness two-tier flow, invariants and fallback guidance.
  • New Features

    • Added CloudEvent publishing with templated message data, support for multiple publish backends, explicit retry/backoff and publish outcome logging.

…d queries

Update Sentinel, status-guide, and sentinel-pulses docs to reflect
the new selective querying capability using condition subfield searches.
@openshift-ci openshift-ci bot requested review from Mischulee and xueli181114 March 6, 2026 16:16
@openshift-ci
Copy link

openshift-ci bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yasun1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR shifts Sentinel from phase-based full scans to selective API queries and condition-driven evaluation. Resource Watcher now issues targeted searches for not-ready, stale-ready (Ready.last_updated_time < cutoff), and a full-scan fallback when Ready is absent. Decision Engine adds Evaluate(resource, now) and compares resource.generation to status.conditions.Ready.observed_generation before applying max-age checks using status.conditions.Ready.last_updated_time. Message Publisher adds PublishEvent(ctx, resource, reason), emits CloudEvents 1.0 with Go-template message_data, and documents Pub/Sub and RabbitMQ backends with retry/backoff. Docs add supported condition subfields and adapter expectations.

Sequence Diagram(s)

sequenceDiagram
    participant Watcher as Resource Watcher
    participant API as HyperFleet API
    participant Engine as Decision Engine
    participant Publisher as Message Publisher
    participant Broker as Pub/Sub or RabbitMQ

    Watcher->>API: GET /api/.../{resourceType}?search=status.conditions.Ready='False'
    alt stale-ready
        Watcher->>API: GET /api/.../{resourceType}?search=status.conditions.Ready='True' AND status.conditions.Ready.last_updated_time < <cutoff>
    end
    alt missing-ready
        Watcher->>API: GET /api/.../{resourceType}?search=NOT EXISTS(status.conditions.Ready)
    end
    API-->>Watcher: list resources (conditions, timestamps, observed_generation)
    loop per resource
        Watcher->>Engine: Evaluate(resource, now)
        Engine-->>Watcher: decision (publish/skip) + reason
        opt publish
            Watcher->>Publisher: PublishEvent(ctx, resource, reason) (render CloudEvent via template)
            Publisher->>Broker: Publish (with retry/backoff)
            Broker-->>Publisher: ack/nack
            Publisher-->>Watcher: publish result logged
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • rh-amarin
  • xueli181114
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'HYPERFLEET-536 - docs: update architecture docs for condition subfield queries' clearly and specifically summarizes the main change: updating documentation to reflect a condition subfield query strategy for the Sentinel architecture component.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch HYPERFLEET-536-docs

Comment @coderabbitai help to get the list of available commands and usage tips.

@rafabene
Copy link
Contributor Author

rafabene commented Mar 6, 2026

Related API implementation PR: openshift-hyperfleet/hyperfleet-api#71

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/components/sentinel/sentinel.md`:
- Line 644: Update the Sentinel docs to spell out the required API invariant for
generation-first correctness: when using selective querying (Resource Watcher)
the API must atomically update the resource's readiness flag (e.g., flip
Ready=False or equivalent) before the next Sentinel poll so the Decision
Engine's per-resource evaluation observes the new generation; alternatively
document and describe adding a periodic full-scan fallback to catch
generation-mismatch cases. Reference the "Resource Watcher" and "Decision
Engine" behavior and explicitly state the invariant (atomic readiness flip) and
the fallback option so readers know how to avoid delayed generation-mismatch
events.

In `@hyperfleet/docs/sentinel-pulses.md`:
- Around line 41-43: The docs only query "Not-ready resources" and "Stale ready
resources" and therefore miss resources that have no Ready condition yet; add a
third case to detect "Missing Ready condition" and show the corresponding API
query so those objects are reconciled. Update sentinel-pulses.md to include a
new bullet (alongside the existing "Not-ready resources" and "Stale ready
resources" bullets) that describes "Missing Ready condition" and provides the
API query that selects resources where status.conditions.Ready is absent (e.g.,
GET
/api/hyperfleet/v1/{resourceType}?search=<exists(status.conditions.Ready)=false>
or the equivalent NOT/exists syntax your API supports) so bootstrap/migration
edge cases are covered.

In `@hyperfleet/docs/status-guide.md`:
- Around line 2450-2452: The document mixes two condition-query syntaxes
(status.conditions[type=Ready].status and status.conditions.Ready) which can
confuse users; pick one and normalize all occurrences (including the blocks
around status.conditions.Ready and lines 2462-2469) to the chosen form so
examples are consistent; update every instance of
status.conditions[type=Ready].status to the selected syntax (or vice versa) and
ensure the Ready-check example and any adapter/condition examples all use the
same condition-query pattern (e.g., consistently use
status.conditions.Ready='True' or consistently use
status.conditions[type=Ready].status==='True').
- Around line 2476-2478: The three field descriptions currently describe
adapter-level semantics but are shown under cluster status.conditions.<Type>;
update the phrasing for `last_updated_time`, `last_transition_time`, and
`observed_generation` to describe condition/resource-level semantics instead of
adapter/adapter-generation. Concretely, change `last_updated_time` to something
like “When this condition was last observed on the resource (TIMESTAMPTZ)”,
`last_transition_time` to “When this condition’s status (True/False/Unknown)
last changed on the resource (TIMESTAMPTZ)”, and `observed_generation` to “The
resource generation that this condition reflects (INTEGER)”; make these edits in
the same section that references cluster `status.conditions.<Type>`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 01ffb8a9-2b43-480d-a42c-07247044ba8d

📥 Commits

Reviewing files that changed from the base of the PR and between a046489 and 8c7eeae.

📒 Files selected for processing (3)
  • hyperfleet/components/sentinel/sentinel.md
  • hyperfleet/docs/sentinel-pulses.md
  • hyperfleet/docs/status-guide.md

…eady condition query, and normalize condition syntax

- sentinel.md: spell out API invariant (atomic Ready=False flip on spec update)
  and periodic full-scan fallback option for generation-mismatch correctness
- sentinel.md + sentinel-pulses.md: add third selective query for resources
  with missing Ready condition (bootstrap/migration edge case)
- status-guide.md: normalize condition syntax from status.conditions[type=Ready].status
  to status.conditions.Ready for consistency across all docs
- status-guide.md: fix condition subfield descriptions to use resource-level
  semantics instead of adapter-level
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 652-653: Update the Decision Engine note under "selective
querying" to mention the third bucket returned by Resource Watcher: include
resources with a missing Ready condition in addition to not-ready and
stale-ready resources; refer to the Decision Engine note text (the paragraph
starting "With selective querying...") and Resource Watcher/selective querying
terminology so the phrasing explicitly lists "not-ready, stale-ready, and
missing Ready condition" cases.
- Around line 648-649: The docs currently reference two different fields for
observed generation (resource.status.observed_generation vs
resource.status.conditions.Ready.observed_generation); pick a single canonical
source—prefer the condition-level field
resource.status.conditions.Ready.observed_generation for decision engine
logic—and update the earlier Decision Engine paragraph (the one that still
mentions resource.status.observed_generation) to use
resource.status.conditions.Ready.observed_generation with matching phrasing;
also scan adjacent text to ensure all generation-check examples and variable
names consistently reference resource.generation vs
resource.status.conditions.Ready.observed_generation throughout the Decision
Engine docs.

In `@hyperfleet/docs/sentinel-pulses.md`:
- Line 39: The intro sentence starting "To avoid fetching ALL resources every
poll cycle, the API supports condition-based search queries that allow Sentinel
to make two targeted API calls:" is out of sync with the section that defines
three calls—update that sentence to accurately reflect the number (e.g., change
"two" to "three" or use "multiple/targeted API calls") so the intro matches the
subsequent list of calls; edit the exact sentence text in the document (the line
beginning with "To avoid fetching ALL resources...") to remove the mismatch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 63fe9c5d-e829-4d61-9a2a-a2e3fd88a689

📥 Commits

Reviewing files that changed from the base of the PR and between 8c7eeae and a78b697.

📒 Files selected for processing (3)
  • hyperfleet/components/sentinel/sentinel.md
  • hyperfleet/docs/sentinel-pulses.md
  • hyperfleet/docs/status-guide.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/docs/status-guide.md

rafabene added 2 commits March 6, 2026 13:31
…uery count, and Decision Engine note

- sentinel.md: normalize Decision Logic to use
  resource.status.conditions.Ready.observed_generation consistently
- sentinel.md: update Decision Engine note to list all three selective
  query buckets (not-ready, stale-ready, missing Ready condition)
- sentinel-pulses.md: change "two targeted API calls" to "three" to
  match the actual list
The third selective query (NOT EXISTS for missing Ready condition) is not
implemented in the API and resources without a Ready condition are already
covered by the periodic full-scan fallback. Align docs with implementation.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
hyperfleet/components/sentinel/sentinel.md (3)

957-963: ⚠️ Potential issue | 🔴 Critical

Add required Trade-offs section to document design compromises.

Component documents MUST include a Trade-offs section documenting design compromises and their impact. This section should cover:

  • What we gain (with measurable impact)
  • What we lose or what gets harder
  • Technical debt incurred
  • Why the trade-offs are acceptable
📊 Suggested placement and content structure

Add before the "Post-MVP Enhancements" section:

## Trade-offs

### What We Gain
- **Reduced API load**: Selective querying reduces API calls by ~95% at scale when most resources are Ready (only fetching not-ready + stale subset vs. all resources)
- **Responsive reconciliation**: Generation-based triggering ensures spec changes reconcile within one poll interval (5s) instead of waiting for max age (up to 30m)
- **Horizontal scalability**: Label-based filtering allows multiple Sentinel instances to distribute load (e.g., 10 regional Sentinels vs. 1 global)

### What We Lose / What Gets Harder
- **API complexity**: Requires API support for condition subfield queries (status.conditions.Ready.last_updated_time)
- **Correctness dependency**: Relies on API atomically setting Ready=False when spec changes (documented as invariant)
- **Coverage gaps possible**: Label-based filtering (not true sharding) can leave resources unwatched if selectors misconfigured
- **Fallback complexity**: Periodic full-scan fallback needed to catch edge cases, adding implementation complexity

### Technical Debt
- Periodic full-scan fallback implementation details not yet finalized (frequency, configuration)
- No coordination between multiple Sentinel instances (could lead to duplicate events on overlapping selectors)

### Why Acceptable
- API load reduction is critical for scaling to 10,000+ clusters
- Generation-based triggering delivers better UX (responsive to user changes)
- Coverage gaps are operator responsibility and can be validated via metrics

As per coding guidelines: "Component documents MUST include a 'Trade-offs' section documenting: what we gain (with measurable impact), what we lose or what gets harder, technical debt incurred, and why the trade-offs are acceptable."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 957 - 963, Add a new
"Trade-offs" section immediately before the existing "Post-MVP Enhancements"
section in sentinel.md that documents: (1) What We Gain with measurable impacts
(e.g., API load reduction %, responsiveness improvements, horizontal scalability
examples), (2) What We Lose / What Gets Harder (e.g., API complexity,
correctness dependencies, coverage gaps, fallback complexity), (3) Technical
Debt (e.g., missing periodic full-scan/fallback details, no instance
coordination), and (4) Why Acceptable (short justification tying gains to
scale/UX and operator mitigations); ensure the heading is "## Trade-offs" and
include the four subheadings "### What We Gain", "### What We Lose / What Gets
Harder", "### Technical Debt", and "### Why Acceptable" with concise bullet
points as suggested.

957-963: ⚠️ Potential issue | 🔴 Critical

Add required Alternatives Considered section.

Component documents MUST include an Alternatives Considered section documenting alternative approaches with brief descriptions and specific reasons why each was rejected.

🔄 Suggested placement and content structure

Add before or after the Trade-offs section:

## Alternatives Considered

### 1. Watch-based API (Kubernetes-style watch streams)
**Description**: Use long-lived HTTP connections with watch streams that push resource changes to Sentinel.

**Why Rejected**: 
- Requires significant API infrastructure (watch cache, bookmark management, reconnection logic)
- Complexity outweighs benefits for MVP (polling sufficient at current scale)
- Can revisit post-MVP if polling becomes a bottleneck

### 2. Full-scan polling (no selective querying)
**Description**: Fetch all resources on every poll cycle, apply Decision Engine filters in-memory.

**Why Rejected**: 
- Does not scale beyond ~1,000 clusters (API load, network bandwidth, memory consumption)
- Selective querying reduces API load by ~95% at scale
- Full-scan retained only as periodic fallback for correctness

### 3. True sharding with coordination (e.g., consistent hashing)
**Description**: Use distributed coordination (etcd, Consul) to assign resources to specific Sentinel instances.

**Why Rejected**: 
- Adds significant operational complexity (coordination service, failure recovery)
- Label-based filtering provides sufficient horizontal scalability for current needs
- Operators can manually ensure coverage via configuration and metrics

As per coding guidelines: "Component documents MUST include an 'Alternatives Considered' section documenting alternative approaches with brief descriptions and specific reasons why each was rejected."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 957 - 963, Add a new
"## Alternatives Considered" section (place it before or after the existing
"Trade-offs" section) that lists the three rejected approaches: "Watch-based API
(Kubernetes-style watch streams)" with a brief description and reasons rejected
(needs watch cache, bookmark handling, reconnection logic; too complex for MVP),
"Full-scan polling (no selective querying)" with description and reasons
rejected (doesn't scale beyond ~1,000 clusters, high API/network/memory cost;
keep as periodic fallback), and "True sharding with coordination (e.g.,
consistent hashing)" with description and reasons rejected (requires
etcd/Consul, adds operational complexity; label-based filtering is sufficient);
ensure each entry is short, uses the exact alternative titles above, and
explains why it was rejected.

1-1: ⚠️ Potential issue | 🔴 Critical

Add required document header with metadata fields.

All Markdown documents must start with a header containing Title, Status (Draft/Active/Deprecated), Owner, and Last Updated (YYYY-MM-DD) date. This component document is missing the required header block entirely.

📋 Proposed header format
+---
+Title: HyperFleet Sentinel
+Status: Active
+Owner: [Team/Owner Name]
+Last Updated: 2026-03-06
+---
+
 ## What & Why

As per coding guidelines: "All Markdown documents must start with a header containing Title, Status (Draft/Active/Deprecated), Owner, and Last Updated (YYYY-MM-DD) date."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` at line 1, Add a required
document header block at the very top of the file (above the existing "## What &
Why" heading) that includes the metadata fields Title, Status (one of
Draft/Active/Deprecated), Owner, and Last Updated (YYYY-MM-DD); ensure the
Status value is valid, the Owner identifies the component owner, and Last
Updated uses the current date, then save the file so the header appears before
the "## What & Why" section.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/components/sentinel/sentinel.md`:
- Line 591: Update the "periodic full-scan fallback" sentence in the Decision
Engine documentation to include concrete configuration details: state the exact
config field and env var names (e.g., DecisionEngine.scanFrequency or ENV
FULL_SCAN_INTERVAL / FULL_SCAN_CYCLES), show an example value and units (e.g.,
"once every N poll cycles" -> "scanFrequency: 10 (poll cycles) or
FULL_SCAN_INTERVAL=10s"), indicate whether the fallback is optional or required
for spec compliance (e.g., "optional; recommended for newly created resources"),
and document the default behavior when not set (e.g., "defaults to disabled" or
"defaults to 100 poll cycles / 5m"). Also add a short example config snippet in
the Decision Engine section and cross-reference the "periodic full-scan
fallback" line so readers can find the parameter names and defaults.

---

Outside diff comments:
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 957-963: Add a new "Trade-offs" section immediately before the
existing "Post-MVP Enhancements" section in sentinel.md that documents: (1) What
We Gain with measurable impacts (e.g., API load reduction %, responsiveness
improvements, horizontal scalability examples), (2) What We Lose / What Gets
Harder (e.g., API complexity, correctness dependencies, coverage gaps, fallback
complexity), (3) Technical Debt (e.g., missing periodic full-scan/fallback
details, no instance coordination), and (4) Why Acceptable (short justification
tying gains to scale/UX and operator mitigations); ensure the heading is "##
Trade-offs" and include the four subheadings "### What We Gain", "### What We
Lose / What Gets Harder", "### Technical Debt", and "### Why Acceptable" with
concise bullet points as suggested.
- Around line 957-963: Add a new "## Alternatives Considered" section (place it
before or after the existing "Trade-offs" section) that lists the three rejected
approaches: "Watch-based API (Kubernetes-style watch streams)" with a brief
description and reasons rejected (needs watch cache, bookmark handling,
reconnection logic; too complex for MVP), "Full-scan polling (no selective
querying)" with description and reasons rejected (doesn't scale beyond ~1,000
clusters, high API/network/memory cost; keep as periodic fallback), and "True
sharding with coordination (e.g., consistent hashing)" with description and
reasons rejected (requires etcd/Consul, adds operational complexity; label-based
filtering is sufficient); ensure each entry is short, uses the exact alternative
titles above, and explains why it was rejected.
- Line 1: Add a required document header block at the very top of the file
(above the existing "## What & Why" heading) that includes the metadata fields
Title, Status (one of Draft/Active/Deprecated), Owner, and Last Updated
(YYYY-MM-DD); ensure the Status value is valid, the Owner identifies the
component owner, and Last Updated uses the current date, then save the file so
the header appears before the "## What & Why" section.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2928d983-1a8f-4cc4-9232-a0b8abac58af

📥 Commits

Reviewing files that changed from the base of the PR and between 76b49f3 and 0d57ce3.

📒 Files selected for processing (2)
  • hyperfleet/components/sentinel/sentinel.md
  • hyperfleet/docs/sentinel-pulses.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/docs/sentinel-pulses.md

Change wording from "are covered by the periodic full-scan fallback"
to "are not returned by either selective query" with a recommendation
to add a full-scan fallback. This aligns with the Decision Engine note
which correctly states implementations "may also add" a fallback.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
hyperfleet/components/sentinel/sentinel.md (2)

622-643: ⚠️ Potential issue | 🟠 Major

The file still mixes legacy status fields with the new condition-based contract.

This section makes resource.status.conditions.Ready.observed_generation and status.conditions.Ready.last_updated_time canonical, but the same document still describes resource.status.observed_generation, status.last_updated_time, and status.phase elsewhere. That leaves two incompatible implementations in one spec. Please align the remaining examples and test scenarios in this file to the condition-based model before merging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 622 - 643, The
document mixes legacy status fields with the new condition-based contract;
update all remaining examples, test scenarios and references to use the
condition-based fields: replace uses of resource.status.observed_generation,
status.last_updated_time, and status.phase with the condition model (use
resource.status.conditions.Ready.observed_generation and
status.conditions.Ready.last_updated_time) and ensure all logic, examples, and
tests follow the priority: check resource.generation vs
resource.status.conditions.Ready.observed_generation first, then compute max age
from status.conditions.Ready.last_updated_time (and the Ready condition values
max_age_ready/max_age_not_ready); make sure no example or test still asserts or
documents the old fields.

627-650: ⚠️ Potential issue | 🟠 Major

Define the missing-Ready path end-to-end.

Line 591 allows a periodic full-scan fallback for resources that do not have a Ready condition yet, but this section still only defines Evaluate() behavior for Ready=True and Ready=False, and the note only mentions not-ready and stale-ready resources. If that fallback is enabled, implementers still do not know whether a missing Ready should publish immediately, be treated as not-ready, or default observed_generation to 0. Please document that path explicitly here.

Suggested doc adjustment
 2. **Check resource status conditions** (fallback to max age intervals):
    - Select appropriate max age interval based on the `Ready` condition:
+     - If `status.conditions.Ready` is missing → treat as initial/unobserved state and publish immediately
      - If `status.conditions.Ready='True'` → use `max_age_ready` (30 minutes)
      - If `status.conditions.Ready='False'` → use `max_age_not_ready` (10 seconds)
@@
-> **Note**: With selective querying (see Resource Watcher above), the Decision Engine only receives resources that already need attention — not-ready resources and stale ready resources. This shifts the primary filtering responsibility to the API query layer, making the Decision Engine's per-resource evaluation more efficient.
+> **Note**: With selective querying (see Resource Watcher above), the Decision Engine only receives resources that already need attention — not-ready resources, stale ready resources, and resources missing the `Ready` condition when a periodic full-scan fallback is enabled. This shifts the primary filtering responsibility to the API query layer, making the Decision Engine's per-resource evaluation more efficient.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 627 - 650, Clarify
the missing-Ready path in Evaluate(): explicitly state that if
resource.status.conditions.Ready is absent the Decision Engine should treat the
resource as NotReady (i.e., behave as Ready='False'), set treated
observed_generation to 0 for generation-comparison using resource.generation vs
treated observed_generation, and publish a reconciliation event immediately (or
on the next poll) so the resource is handled by adapters; also require a clear
log message from Evaluate() that identifies "missing Ready → treated as
NotReady, observed_generation=0" and include guidance that the periodic
full-scan fallback must still include missing-Ready resources to ensure no
resources are skipped.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/components/sentinel/sentinel.md`:
- Line 593: The sentence "This approach reduces API load significantly at
scale..." is too vague; update the sentinel.md paragraph to include a concrete,
quantified example and a brief calculation comparing the two-query strategy to a
full-list strategy (e.g., show API calls per poll: one cheap List by label to
retrieve Ready=True for ~95% of N resources plus targeted Gets/List for the
remaining M stale resources, vs. a full reconciliation that queries N resources
every poll), or include measured numbers observed in tests (e.g., N=10,000,
M=500 → ~1 List + 500 targeted calls vs. 10,000 calls, a ~95% reduction).
Reference the existing sentence and adjacent text so reviewers can replace the
vague claim with the example + simple percentage reduction and, if available, a
short note on measurement method (poll interval and environment).

---

Duplicate comments:
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 622-643: The document mixes legacy status fields with the new
condition-based contract; update all remaining examples, test scenarios and
references to use the condition-based fields: replace uses of
resource.status.observed_generation, status.last_updated_time, and status.phase
with the condition model (use
resource.status.conditions.Ready.observed_generation and
status.conditions.Ready.last_updated_time) and ensure all logic, examples, and
tests follow the priority: check resource.generation vs
resource.status.conditions.Ready.observed_generation first, then compute max age
from status.conditions.Ready.last_updated_time (and the Ready condition values
max_age_ready/max_age_not_ready); make sure no example or test still asserts or
documents the old fields.
- Around line 627-650: Clarify the missing-Ready path in Evaluate(): explicitly
state that if resource.status.conditions.Ready is absent the Decision Engine
should treat the resource as NotReady (i.e., behave as Ready='False'), set
treated observed_generation to 0 for generation-comparison using
resource.generation vs treated observed_generation, and publish a reconciliation
event immediately (or on the next poll) so the resource is handled by adapters;
also require a clear log message from Evaluate() that identifies "missing Ready
→ treated as NotReady, observed_generation=0" and include guidance that the
periodic full-scan fallback must still include missing-Ready resources to ensure
no resources are skipped.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d95d0daf-5a7d-4898-9b42-3b18db9a4a62

📥 Commits

Reviewing files that changed from the base of the PR and between 0d57ce3 and 11f72fd.

📒 Files selected for processing (2)
  • hyperfleet/components/sentinel/sentinel.md
  • hyperfleet/docs/sentinel-pulses.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/docs/sentinel-pulses.md

Replace vague "reduces API load significantly" with a quantified
example: N=10,000 resources, 50 not-ready + 200 stale = ~250 rows
returned vs 10,000 (~97% reduction). Clarifies that the number of
API calls per resource type is constant at 2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant