Skip to content

HYPERFLEET-634 - doc: sentinel operator guide#63

Open
86254860 wants to merge 2 commits intoopenshift-hyperfleet:mainfrom
86254860:HYPERFLEET-634
Open

HYPERFLEET-634 - doc: sentinel operator guide#63
86254860 wants to merge 2 commits intoopenshift-hyperfleet:mainfrom
86254860:HYPERFLEET-634

Conversation

@86254860
Copy link

@86254860 86254860 commented Mar 6, 2026

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive Sentinel Operator Guide detailing core concepts (generation- and time-based reconciliation, filtering/sharding), configuration reference (selectors, CEL-based message construction, broker settings), CloudEvents payload structure, broker integrations (RabbitMQ, Google Pub/Sub), deployment and operational workflows (Helm, multi-instance), adapter contract considerations, topic naming strategies, architecture diagrams, example snippets, checklists, and troubleshooting.

@openshift-ci openshift-ci bot requested review from Mischulee and xueli181114 March 6, 2026 08:59
@openshift-ci
Copy link

openshift-ci bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rh-amarin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link

coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

A new comprehensive Sentinel Operator guide was added at docs/sentinel-operator-guide.md. The document defines Sentinel as a polling-based event publisher for cluster lifecycle orchestration, describes the Decision Engine (generation- and time-based reconciliation), resource filtering and sharding, configuration schema (including CEL-based message_data and adapter contracts), broker integrations (RabbitMQ, Google Pub/Sub), CloudEvents payload structure, Helm deployment workflows for single- and multi-instance setups, and operational troubleshooting and checklists.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly references the main change: adding a comprehensive sentinel operator guide documentation file, which matches the +789 lines added in docs/sentinel-operator-guide.md.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/sentinel-operator-guide.md`:
- Around line 150-152: Several fenced code blocks are missing language
identifiers (causing lint/render issues); update the three blocks containing the
lines "next_event = reference_time + max_age_interval", "Error: invalid config:
message_data.id: empty CEL expression is not allowed", and
"{namespace}-{resourceType}" to include a language tag (e.g., ```text) on the
opening fence so each block becomes ```text ... ``` to satisfy linting and
improve rendering.
- Around line 618-620: Update the readiness documentation to explicitly state
that the /readyz endpoint remains false until the first successful poll
completes (i.e., s.LastSuccessfulPoll() is non-zero) and broker health checks
pass, so pods intentionally stay unready during initial startup; reference
HYPERFLEET-552 as the change that introduced this behavior and add guidance to
tune probe timing if startup latency causes false-positive failures, and mirror
this note where readiness is mentioned alongside the health check lines.
- Around line 7-22: The TOC and in-document links are pointing to invalid
fragment IDs (e.g., "#broker-setup")—update either the heading IDs or the links
so they match the actual generated anchor names: use the exact slug form the
renderer creates (lowercase, spaces → hyphens, remove/normalize punctuation like
parentheses). Specifically, fix TOC entries for "Decision Engine", "Time-Based
Reconciliation (Max Age Intervals)", "Resource Filtering and Sharding", "Broker
Configuration"/"Broker Setup" and all refs to the broker section so the hrefs
match the headings' slugs (or add explicit anchor tags next to headings such as
<!-- anchor: broker-setup --> if your renderer supports it); verify and update
the broken references mentioned (the ones pointing to broker-setup and the other
listed anchors) so clicking the TOC navigates to the correct sections.
- Line 535: The heading "Phase 1: Configuration Planning" is using #### which
jumps the hierarchy from ## to ####; change that heading marker to ### so it
follows the preceding `##` level and keeps the document structure valid—locate
the "Phase 1: Configuration Planning" heading and replace the four-sharp prefix
with three-sharps.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b37f46e-bab2-4792-af3e-01da0c9a288d

📥 Commits

Reviewing files that changed from the base of the PR and between b2e5b6f and 3e3f253.

📒 Files selected for processing (1)
  • docs/sentinel-operator-guide.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
docs/sentinel-operator-guide.md (4)

536-536: ⚠️ Potential issue | 🟡 Minor

Fix heading level increment at Line 536.

This jumps from ## to ####; use ### to keep hierarchy valid.

🛠️ Proposed heading fix
-#### Phase 1: Configuration Planning
+### Phase 1: Configuration Planning
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 536, The heading "#### Phase 1:
Configuration Planning" uses too many hashes and breaks the document hierarchy;
change that heading to "### Phase 1: Configuration Planning" so it correctly
follows the previous "##" level—update the line containing the exact heading
text to use three hashes instead of four.

620-620: ⚠️ Potential issue | 🟡 Minor

Document current /readyz startup behavior to avoid false alarms.

Line 620 and Line 677 still imply generic readiness failures, but /readyz is expected to stay unready until the first successful poll completes and broker checks pass. Add this note in both checklist and troubleshooting text.

🛠️ Proposed doc update
 - [ ] Check readiness endpoint: `curl http://<sentinel-service>:8080/readyz`
+  - Note: `/readyz` remains unready until Sentinel completes at least one successful poll cycle and broker health checks pass.
-| **Health/readiness endpoints return errors**                                                                                 | Configuration validation failed | Check pod logs for startup errors: `kubectl logs -n hyperfleet-system -l app.kubernetes.io/name=sentinel`. Verify all required config fields. |
+| **Health/readiness endpoints return errors**                                                                                 | Startup not complete or configuration issue | `/readyz` is expected to be unready until the first successful poll and broker health checks pass. If it persists, check pod logs: `kubectl logs -n hyperfleet-system -l app.kubernetes.io/name=sentinel` and verify required config fields. |
Based on learnings: In HYPERFLEET-552, the sentinel service readiness semantics were changed: /readyz now requires one successful poll completion (s.LastSuccessfulPoll() non-zero) plus broker health checks.

Also applies to: 677-677

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 620, Update the checklist entry and
the troubleshooting section to document that the /readyz endpoint intentionally
remains unready until the service has completed at least one successful poll and
broker health checks pass; explicitly state the readiness criteria as
"s.LastSuccessfulPoll() must be non-zero AND broker checks healthy" and mention
that this change came from HYPERFLEET-552 so readers don't treat an unready
/readyz as a generic failure until those conditions are met.

151-153: ⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks.

The fences at Line 151, Line 409, and Line 474 are still missing language tags.

🛠️ Proposed fence-language fix
-   ```
+   ```text
    next_event = reference_time + max_age_interval
    ```

-  ```
+  ```text
   Error: invalid config: message_data.id: empty CEL expression is not allowed

- +text
{namespace}-{resourceType}

Also applies to: 409-411, 474-476

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 151 - 153, Several fenced code
blocks in the document are missing language identifiers; update the three blocks
containing the snippets "next_event = reference_time + max_age_interval",
"Error: invalid config: message_data.id: empty CEL expression is not allowed",
and "{namespace}-{resourceType}" to include a language tag (e.g., ```text)
immediately after the opening backticks so the fences read ```text ... ``` for
each occurrence.

7-22: ⚠️ Potential issue | 🟡 Minor

Fix invalid in-document anchor fragments.

Several links still point to non-existent fragments (for example Line 7–Line 22 TOC entries and Line 524 #broker-setup). Update links to the renderer-generated slugs so navigation works.

🛠️ Proposed anchor fix
-1. [Introduction](`#introduction`)
+1. [Introduction](`#1-introduction`)
-   - [What is Sentinel?](`#what-is-sentinel`)
+   - [What is Sentinel?](`#11-what-is-sentinel`)
-   - [When to Use Sentinel](`#when-to-use-sentinel`)
+   - [When to Use Sentinel](`#12-when-to-use-sentinel`)
...
-2. [Core Concepts](`#core-concepts`)
+2. [Core Concepts](`#2-core-concepts`)
...
-3. [Configuration Reference](`#configuration-reference`)
+3. [Configuration Reference](`#3-configuration-reference`)
...
-4. [Deployment Checklist](`#deployment-checklist`)
+4. [Deployment Checklist](`#4-deployment-checklist`)
-When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Deployment: Broker Setup](`#broker-setup`) for details.
+When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Configure Authentication and Permissions](`#configure-authentication-and-permissions`) for details.

Also applies to: 524-524, 544-544, 552-552, 557-557, 563-563

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 7 - 22, The TOC contains
invalid in-document anchor fragments (e.g., "Decision Engine", "Message Data
(CEL Expressions)", "Broker Configuration", "Broker Setup") that don't match the
renderer-generated slugs; update each TOC link to the exact slugified heading
used by the renderer (match casing, replace spaces/special chars with hyphens,
remove parentheses, etc.) so links like "Decision Engine", "Generation-Based
Reconciliation", "Time-Based Reconciliation (Max Age Intervals)", "Resource
Selector", "Message Data (CEL Expressions)" and "Broker Configuration"/"Broker
Setup" point to the correct generated anchors; verify and correct the other
reported anchors (those around Broker setup and nearby headings) to ensure
navigation works.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/sentinel-operator-guide.md`:
- Around line 630-633: The fenced code block containing the expected log output
(the triple-backtick block that starts before "Fetched resources count=15...")
is missing a language tag; update that block to include a language identifier
(e.g., add "text" after the opening ```) so the fence reads ```text to ensure
proper rendering of the log output.

---

Duplicate comments:
In `@docs/sentinel-operator-guide.md`:
- Line 536: The heading "#### Phase 1: Configuration Planning" uses too many
hashes and breaks the document hierarchy; change that heading to "### Phase 1:
Configuration Planning" so it correctly follows the previous "##" level—update
the line containing the exact heading text to use three hashes instead of four.
- Line 620: Update the checklist entry and the troubleshooting section to
document that the /readyz endpoint intentionally remains unready until the
service has completed at least one successful poll and broker health checks
pass; explicitly state the readiness criteria as "s.LastSuccessfulPoll() must be
non-zero AND broker checks healthy" and mention that this change came from
HYPERFLEET-552 so readers don't treat an unready /readyz as a generic failure
until those conditions are met.
- Around line 151-153: Several fenced code blocks in the document are missing
language identifiers; update the three blocks containing the snippets
"next_event = reference_time + max_age_interval", "Error: invalid config:
message_data.id: empty CEL expression is not allowed", and
"{namespace}-{resourceType}" to include a language tag (e.g., ```text)
immediately after the opening backticks so the fences read ```text ... ``` for
each occurrence.
- Around line 7-22: The TOC contains invalid in-document anchor fragments (e.g.,
"Decision Engine", "Message Data (CEL Expressions)", "Broker Configuration",
"Broker Setup") that don't match the renderer-generated slugs; update each TOC
link to the exact slugified heading used by the renderer (match casing, replace
spaces/special chars with hyphens, remove parentheses, etc.) so links like
"Decision Engine", "Generation-Based Reconciliation", "Time-Based Reconciliation
(Max Age Intervals)", "Resource Selector", "Message Data (CEL Expressions)" and
"Broker Configuration"/"Broker Setup" point to the correct generated anchors;
verify and correct the other reported anchors (those around Broker setup and
nearby headings) to ensure navigation works.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9fd2d867-4cdf-427a-8b37-514713b3c5b7

📥 Commits

Reviewing files that changed from the base of the PR and between 3e3f253 and 928afe7.

📒 Files selected for processing (1)
  • docs/sentinel-operator-guide.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
docs/sentinel-operator-guide.md (1)

7-22: ⚠️ Potential issue | 🟡 Minor

Fix invalid internal anchors in TOC and section cross-references.

Several links still point to fragments that don’t exist in the rendered document (for example, Line 524: #broker-setup, plus multiple TOC/Reference links). This breaks in-doc navigation and keeps MD051 warnings open.

Suggested anchor/link fix
-1. [Introduction](`#introduction`)
-   - [What is Sentinel?](`#what-is-sentinel`)
-   - [When to Use Sentinel](`#when-to-use-sentinel`)
-2. [Core Concepts](`#core-concepts`)
-   - [Decision Engine](`#decision-engine`)
-      - [Generation-Based Reconciliation](`#generation-based-reconciliation`)
-      - [Time-Based Reconciliation (Max Age Intervals)](`#time-based-reconciliation-max-age-intervals`)
-   - [Resource Filtering and Sharding](`#resource-filtering-and-sharding`)
-3. [Configuration Reference](`#configuration-reference`)
-   - [Configuration File Structure](`#configuration-file-structure`)
-   - [Required Fields](`#required-fields`)
-   - [Optional Fields](`#optional-fields`)
-   - [Resource Selector](`#resource-selector`)
-   - [Message Data (CEL Expressions)](`#message-data-cel-expressions`)
-   - [Broker Configuration](`#broker-configuration`)
-4. [Deployment Checklist](`#deployment-checklist`)
+1. [Introduction](`#1-introduction`)
+   - [What is Sentinel?](`#11-what-is-sentinel`)
+   - [When to Use Sentinel](`#12-when-to-use-sentinel`)
+2. [Core Concepts](`#2-core-concepts`)
+   - [Decision Engine](`#21-decision-engine`)
+      - [Generation-Based Reconciliation](`#211-generation-based-reconciliation`)
+      - [Time-Based Reconciliation (Max Age Intervals)](`#212-time-based-reconciliation-max-age-intervals`)
+   - [Resource Filtering and Sharding](`#22-resource-filtering-and-sharding`)
+3. [Configuration Reference](`#3-configuration-reference`)
+   - [Configuration File Structure](`#31-configuration-file-structure`)
+   - [Required Fields](`#32-required-fields`)
+   - [Optional Fields](`#33-optional-fields`)
+   - [Resource Selector](`#34-resource-selector`)
+   - [Message Data (CEL Expressions)](`#35-message-data-cel-expressions`)
+   - [Broker Configuration](`#36-broker-configuration`)
+4. [Deployment Checklist](`#4-deployment-checklist`)
-When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Deployment: Broker Setup](`#broker-setup`) for details.
+When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Configure Authentication and Permissions](`#configure-authentication-and-permissions`) for details.

Also applies to: 524-524, 544-544, 552-552, 557-557, 563-563

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 7 - 22, The TOC and
cross-reference anchors are invalid (e.g., references to `#broker-setup` and other
non-existent fragments) which breaks navigation and triggers MD051; locate the
section headings and either update the links in the TOC (entries like "Broker
Configuration", "Message Data (CEL Expressions)", "Resource Selector",
"Deployment Checklist", and "Configuration File Structure") to match the exact
rendered anchors (lowercase, hyphenated) or rename/add the corresponding
headings to match the anchors (for example create or rename a heading to "Broker
Setup" if links use `#broker-setup`), then run the renderer/markdown linter to
confirm anchors resolve and remove duplicate or stale fragment links.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/sentinel-operator-guide.md`:
- Around line 7-22: The TOC and cross-reference anchors are invalid (e.g.,
references to `#broker-setup` and other non-existent fragments) which breaks
navigation and triggers MD051; locate the section headings and either update the
links in the TOC (entries like "Broker Configuration", "Message Data (CEL
Expressions)", "Resource Selector", "Deployment Checklist", and "Configuration
File Structure") to match the exact rendered anchors (lowercase, hyphenated) or
rename/add the corresponding headings to match the anchors (for example create
or rename a heading to "Broker Setup" if links use `#broker-setup`), then run the
renderer/markdown linter to confirm anchors resolve and remove duplicate or
stale fragment links.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: df4d77ff-40d3-4718-abb2-b373a27ae635

📥 Commits

Reviewing files that changed from the base of the PR and between 928afe7 and 8aae11f.

📒 Files selected for processing (1)
  • docs/sentinel-operator-guide.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
docs/sentinel-operator-guide.md (4)

7-22: ⚠️ Potential issue | 🟡 Minor

TOC anchor links remain broken despite previous fix attempt.

The TOC links still don't match the auto-generated heading anchors. For example, #introduction should be #1-introduction, #what-is-sentinel should be #11-what-is-sentinel, etc.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 7 - 22, The TOC links use
unnumbered anchors but your renderer generates numbered heading IDs (e.g.,
headings like "Introduction" and "What is Sentinel" produce anchors
"1-introduction" and "11-what-is-sentinel"); update each TOC entry to use the
numbered anchors that match the generated IDs (e.g., change "#introduction" →
"#1-introduction", "#what-is-sentinel" → "#11-what-is-sentinel", etc.), or
regenerate the TOC using the same tool/command that produces the document
headings to ensure the TOC and header IDs remain in sync (verify for entries
such as "Core Concepts", "Decision Engine", "Generation-Based Reconciliation"
and "Time-Based Reconciliation (Max Age Intervals)").

524-524: ⚠️ Potential issue | 🟡 Minor

Fix broken anchor link.

The link #broker-setup doesn't match any heading in the document. Based on context about Workload Identity Federation authentication, this should likely point to the "Configure Authentication and Permissions" section.

🔗 Proposed fix
-When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Deployment: Broker Setup](`#broker-setup`) for details.
+When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Configure Authentication and Permissions](`#configure-authentication-and-permissions`) for details.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 524, The anchor link "#broker-setup"
is broken; update the link in the sentence referencing Workload Identity
Federation to point to the correct heading id for the "Configure Authentication
and Permissions" section (e.g., replace "#broker-setup" with
"#configure-authentication-and-permissions" or the actual slug used by the
document generator) so the link navigates to the Configure Authentication and
Permissions section.

634-634: ⚠️ Potential issue | 🟡 Minor

Update log duration format to match actual code output.

The example shows duration=125ms but the code formats it as duration=0.125s using %.3fs.

📝 Proposed fix
-  Trigger cycle completed total=15 published=3 skipped=12 duration=125ms topic=hyperfleet-dev-clusters subset=clusters
+  Trigger cycle completed total=15 published=3 skipped=12 duration=0.125s topic=hyperfleet-dev-clusters subset=clusters
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 634, The example log line in the
docs shows duration=125ms but the runtime uses printf formatting with '%.3fs'
(seconds to 3 decimals); update the example log entry (the line beginning
"Trigger cycle completed total=15 published=3 skipped=12 duration=125ms
topic=hyperfleet-dev-clusters subset=clusters") to use the seconds format
produced by '%.3fs' (e.g., duration=0.125s) so the docs match the actual output.

420-420: ⚠️ Potential issue | 🟡 Minor

Fix CloudEvents type field to match actual implementation.

The example shows "type": "hyperfleet.clusters.reconcile" but the code generates com.redhat.hyperfleet.Cluster.reconcile (with com.redhat. prefix and singular capitalized Cluster from resource.Kind).

📝 Proposed fix
-  "type": "hyperfleet.clusters.reconcile",
+  "type": "com.redhat.hyperfleet.Cluster.reconcile",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 420, The CloudEvents "type" example
is incorrect; update the example value to match the implementation which
constructs the type as "com.redhat.hyperfleet.<ResourceKind>.reconcile" (note
singular, capitalized ResourceKind from resource.Kind and the com.redhat.
prefix). Locate the sample line with `"type": "hyperfleet.clusters.reconcile"`
and replace it with the form produced by the code (e.g., `"type":
"com.redhat.hyperfleet.Cluster.reconcile"`) so it matches the actual behavior.
🧹 Nitpick comments (9)
docs/sentinel-operator-guide.md (9)

472-476: Clarify that topic naming convention is a default, not a requirement.

The documentation presents {namespace}-{resourceType} as the topic naming pattern, but this is the Helm chart default convention. Different deployments (GCP, ROSA, etc.) may use their own naming strategies. Consider clarifying this is a recommended default.

📝 Suggested clarification
-The `BROKER_TOPIC` environment variable sets the full topic name where events are published. When using Helm, the default topic follows the pattern:
+The `BROKER_TOPIC` environment variable sets the full topic name where events are published. When using the provided Helm chart, the default topic follows this pattern:
 
 ```text
 {namespace}-{resourceType}
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/sentinel-operator-guide.md around lines 472 - 476, The docs currently
present the BROKER_TOPIC pattern "{namespace}-{resourceType}" as the topic name;
update the text to clarify this is the Helm chart default (a recommended
convention) and not a hard requirement so other deployments (e.g., GCP, ROSA)
can use different naming schemes—refer to the BROKER_TOPIC env var and
explicitly state "Helm default: {namespace}-{resourceType}" and mention that
custom topic naming is supported and may vary by deployment.


</details>

---

`41-41`: **Remove specific broker implementations from core responsibilities list to avoid staleness.**

Hardcoding "RabbitMQ or Google Pub/Sub" in the introduction will require updates whenever broker support changes. Consider using generic "message broker" here and let the configuration section enumerate current implementations.



<details>
<summary>📝 Proposed change</summary>

```diff
-3. **Publish CloudEvents** to a message broker (RabbitMQ or Google Pub/Sub)
+3. **Publish CloudEvents** to a message broker
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 41, Replace the hardcoded broker
examples in the intro line "Publish CloudEvents to a message broker (RabbitMQ or
Google Pub/Sub)" with a generic phrase such as "Publish CloudEvents to a message
broker" and move specific implementations into the configuration/implementation
section; update the sentence containing "Publish CloudEvents" so it no longer
names RabbitMQ or Google Pub/Sub and ensure the docs elsewhere (configuration
section) list the currently supported brokers.

214-229: Clarify that multiple adapters can consume from the same topic.

The architecture diagram shows a 1:1 relationship between topics and adapters, which might imply that only one adapter can consume from each topic. In reality, multiple adapters can subscribe to the same topic in a fan-out pattern. Consider adding a note to clarify this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 214 - 229, The diagram
currently implies a 1:1 mapping between Broker Topics (B1, B2, B3) and Adapters
(A1, A2, A3); add a short clarifying note right below the Mermaid block stating
that multiple adapters can subscribe to the same Broker Topic (fan-out) so
A1/A2/A3 are examples and not exclusive consumers, and optionally show one
example edge or parenthetical (e.g., "multiple adapters may consume from B1") to
make the fan-out behavior explicit.

661-753: Consider linking to detailed docs instead of duplicating reference material in appendices.

The appendices duplicate content that exists in dedicated documentation (metrics.md, configuration.md). While having quick-reference tables is convenient for operators, this creates maintenance burden and risks drift between the operator guide and detailed docs. Consider replacing appendices with links to the authoritative sources.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 661 - 753, Remove the
duplicated detailed reference tables in Appendix A and B and replace them with
concise “Quick links” and a short one-line summary pointing to the authoritative
docs (e.g., configuration.md and metrics.md); specifically, delete the large
Configuration Fields Reference and Metrics Catalog Reference tables and instead
add links labeled (Configuration reference → configuration.md) and (Metrics
catalog → metrics.md) plus a brief note that quick examples remain here for
convenience; locate these changes around the headings "Appendix A:
Troubleshooting" / "Appendix B: Quick Reference" and update any in-text
references (like PromQL examples) to point readers to the external docs for full
details.

515-519: Clarify development vs production authentication methods.

The section shows two authentication options but doesn't clearly indicate that "Option 1: Use Application Default Credentials" with gcloud auth application-default login is for local development only. In production, ADC should use Workload Identity (already mentioned at line 522-524) or service account keys.

📝 Suggested clarification
 # Environment variables
 export BROKER_TOPIC="hyperfleet-prod-clusters"
 
-# Option 1: Use Application Default Credentials (recommended for GKE)
+# Option 1: Use Application Default Credentials (local development)
 gcloud auth application-default login
 
-# Option 2: Use service account key file
+# Option 2: Use service account key file (not recommended for production)
 export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 515 - 519, Update the "Option
1: Use Application Default Credentials (recommended for GKE)" section to clarify
that the gcloud auth application-default login command is intended for local
development only (not for production), and explicitly state that in production
ADC should rely on Workload Identity or a service account key as already
mentioned; modify the heading/text for "Option 1" to mark it as "local
development" and add one sentence pointing readers to the existing Workload
Identity guidance or the "Option 2" service account key option for production
usage.

180-180: Consider more precise terminology than "sharding".

The term "sharding" typically implies automatic distribution with guaranteed coverage, but this system requires manual label-based filtering where operators must ensure all resources are covered (as noted in lines 233-236). Consider using "Resource Filtering and Partitioning" or "Label-Based Resource Distribution" for more accurate terminology.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 180, Update the heading "### 2.2
Resource Filtering and Sharding" and all occurrences of "sharding" in this
section to a more accurate term like "Resource Filtering and Partitioning" or
"Label-Based Resource Distribution"; also update the explanatory text that
references manual label-based filtering (around the noted lines referring to
operator responsibility) to reflect that distribution is manual/label-based
rather than automatic sharding, ensuring wording is consistent across the
section and any examples or labels shown.

56-56: Clarify what "independently" means for operator understanding.

Consider expanding to explain that Sentinel is unaware of downstream adapters and publishes events in a fan-out pattern where multiple adapters can consume from the broker topic.

📝 Suggested expansion
-Sentinel operates independently of adapters and uses a **dual-trigger reconciliation strategy**:
+Sentinel operates independently of adapters (it is unaware of which adapters consume its events) and publishes to message broker topics in a fan-out pattern. It uses a **dual-trigger reconciliation strategy**:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 56, Update the sentence "Sentinel
operates independently of adapters and uses a **dual-trigger reconciliation
strategy**:" to clarify that "independently" means Sentinel is adapter-agnostic
(it does not track or call specific downstream adapters), and explicitly
describe that it publishes events to a broker topic in a fan-out pattern so
multiple adapters can independently consume the same events; mention
implications such as no built-in adapter-specific retry/ack semantics and that
adapters must handle their own delivery guarantees. Reference the existing
phrase "Sentinel operates independently of adapters", "dual-trigger
reconciliation strategy", and "broker topic" when adding this explanatory
expansion.

33-35: Add HyperFleet system context for better operator understanding.

The introduction jumps directly to technical details without explaining Sentinel's role in the broader HyperFleet system. Consider adding context about why Sentinel exists and its purpose in cluster lifecycle orchestration.

📝 Suggested addition
 ### 1.1 What is Sentinel?
 
+Sentinel is a component within the HyperFleet cluster lifecycle management system that triggers reconciliation events when managed resources change or require periodic validation.
+
 HyperFleet Sentinel is a **polling-based event publisher** that monitors HyperFleet API resources and publishes CloudEvents to a message broker when reconciliation is needed. It acts as the trigger mechanism for the HyperFleet orchestration platform.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 33 - 35, The intro section
titled "1.1 What is Sentinel?" is too technical and lacks system-level context;
update that paragraph to briefly state Sentinel's role within the broader
HyperFleet system (e.g., it acts as the event trigger for cluster lifecycle
orchestration), why it exists (to detect resource drift and initiate
reconciliations), and how it interacts with other components (HyperFleet API,
orchestrator, message broker) so operators understand its purpose and place in
workflows; modify the existing text under the "What is Sentinel?" heading to
include these sentences while keeping the existing line about polling-based
event publishing and CloudEvents.

134-136: Clarify status.ready field vs conditions.

The documentation uses status.ready throughout (e.g., in the table and CEL expressions), but reviewer feedback indicates the API only returns conditions arrays, not a direct status.ready field.

Consider either:

  1. Clarifying that status.ready is a computed/convenience field derived from conditions[type=Ready].value == "True"
  2. Using the actual API structure: resource.status.conditions.filter(c, c.type=="Ready")[0].value == "True"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 134 - 136, The docs reference a
non-existent direct field status.ready; update the text and CEL examples to
either (a) explain that status.ready is a computed/convenience property derived
from the Ready condition (i.e., the document should state that status.ready ==
true when conditions[type=="Ready"].value == "True"), or (b) replace all
occurrences of status.ready in prose and CEL examples with the actual API
expression using resource.status.conditions to find the Ready condition (e.g.,
filter for c.type=="Ready" and check its value). Ensure references to CEL
expressions and the table rows (the "**Ready** (`status.ready: true`)" and
"**Not Ready** (`status.ready: false`)") are updated accordingly and include a
brief note describing that this is derived from the Ready condition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/sentinel-operator-guide.md`:
- Around line 7-22: The TOC links use unnumbered anchors but your renderer
generates numbered heading IDs (e.g., headings like "Introduction" and "What is
Sentinel" produce anchors "1-introduction" and "11-what-is-sentinel"); update
each TOC entry to use the numbered anchors that match the generated IDs (e.g.,
change "#introduction" → "#1-introduction", "#what-is-sentinel" →
"#11-what-is-sentinel", etc.), or regenerate the TOC using the same tool/command
that produces the document headings to ensure the TOC and header IDs remain in
sync (verify for entries such as "Core Concepts", "Decision Engine",
"Generation-Based Reconciliation" and "Time-Based Reconciliation (Max Age
Intervals)").
- Line 524: The anchor link "#broker-setup" is broken; update the link in the
sentence referencing Workload Identity Federation to point to the correct
heading id for the "Configure Authentication and Permissions" section (e.g.,
replace "#broker-setup" with "#configure-authentication-and-permissions" or the
actual slug used by the document generator) so the link navigates to the
Configure Authentication and Permissions section.
- Line 634: The example log line in the docs shows duration=125ms but the
runtime uses printf formatting with '%.3fs' (seconds to 3 decimals); update the
example log entry (the line beginning "Trigger cycle completed total=15
published=3 skipped=12 duration=125ms topic=hyperfleet-dev-clusters
subset=clusters") to use the seconds format produced by '%.3fs' (e.g.,
duration=0.125s) so the docs match the actual output.
- Line 420: The CloudEvents "type" example is incorrect; update the example
value to match the implementation which constructs the type as
"com.redhat.hyperfleet.<ResourceKind>.reconcile" (note singular, capitalized
ResourceKind from resource.Kind and the com.redhat. prefix). Locate the sample
line with `"type": "hyperfleet.clusters.reconcile"` and replace it with the form
produced by the code (e.g., `"type": "com.redhat.hyperfleet.Cluster.reconcile"`)
so it matches the actual behavior.

---

Nitpick comments:
In `@docs/sentinel-operator-guide.md`:
- Around line 472-476: The docs currently present the BROKER_TOPIC pattern
"{namespace}-{resourceType}" as the topic name; update the text to clarify this
is the Helm chart default (a recommended convention) and not a hard requirement
so other deployments (e.g., GCP, ROSA) can use different naming schemes—refer to
the BROKER_TOPIC env var and explicitly state "Helm default:
{namespace}-{resourceType}" and mention that custom topic naming is supported
and may vary by deployment.
- Line 41: Replace the hardcoded broker examples in the intro line "Publish
CloudEvents to a message broker (RabbitMQ or Google Pub/Sub)" with a generic
phrase such as "Publish CloudEvents to a message broker" and move specific
implementations into the configuration/implementation section; update the
sentence containing "Publish CloudEvents" so it no longer names RabbitMQ or
Google Pub/Sub and ensure the docs elsewhere (configuration section) list the
currently supported brokers.
- Around line 214-229: The diagram currently implies a 1:1 mapping between
Broker Topics (B1, B2, B3) and Adapters (A1, A2, A3); add a short clarifying
note right below the Mermaid block stating that multiple adapters can subscribe
to the same Broker Topic (fan-out) so A1/A2/A3 are examples and not exclusive
consumers, and optionally show one example edge or parenthetical (e.g.,
"multiple adapters may consume from B1") to make the fan-out behavior explicit.
- Around line 661-753: Remove the duplicated detailed reference tables in
Appendix A and B and replace them with concise “Quick links” and a short
one-line summary pointing to the authoritative docs (e.g., configuration.md and
metrics.md); specifically, delete the large Configuration Fields Reference and
Metrics Catalog Reference tables and instead add links labeled (Configuration
reference → configuration.md) and (Metrics catalog → metrics.md) plus a brief
note that quick examples remain here for convenience; locate these changes
around the headings "Appendix A: Troubleshooting" / "Appendix B: Quick
Reference" and update any in-text references (like PromQL examples) to point
readers to the external docs for full details.
- Around line 515-519: Update the "Option 1: Use Application Default Credentials
(recommended for GKE)" section to clarify that the gcloud auth
application-default login command is intended for local development only (not
for production), and explicitly state that in production ADC should rely on
Workload Identity or a service account key as already mentioned; modify the
heading/text for "Option 1" to mark it as "local development" and add one
sentence pointing readers to the existing Workload Identity guidance or the
"Option 2" service account key option for production usage.
- Line 180: Update the heading "### 2.2 Resource Filtering and Sharding" and all
occurrences of "sharding" in this section to a more accurate term like "Resource
Filtering and Partitioning" or "Label-Based Resource Distribution"; also update
the explanatory text that references manual label-based filtering (around the
noted lines referring to operator responsibility) to reflect that distribution
is manual/label-based rather than automatic sharding, ensuring wording is
consistent across the section and any examples or labels shown.
- Line 56: Update the sentence "Sentinel operates independently of adapters and
uses a **dual-trigger reconciliation strategy**:" to clarify that
"independently" means Sentinel is adapter-agnostic (it does not track or call
specific downstream adapters), and explicitly describe that it publishes events
to a broker topic in a fan-out pattern so multiple adapters can independently
consume the same events; mention implications such as no built-in
adapter-specific retry/ack semantics and that adapters must handle their own
delivery guarantees. Reference the existing phrase "Sentinel operates
independently of adapters", "dual-trigger reconciliation strategy", and "broker
topic" when adding this explanatory expansion.
- Around line 33-35: The intro section titled "1.1 What is Sentinel?" is too
technical and lacks system-level context; update that paragraph to briefly state
Sentinel's role within the broader HyperFleet system (e.g., it acts as the event
trigger for cluster lifecycle orchestration), why it exists (to detect resource
drift and initiate reconciliations), and how it interacts with other components
(HyperFleet API, orchestrator, message broker) so operators understand its
purpose and place in workflows; modify the existing text under the "What is
Sentinel?" heading to include these sentences while keeping the existing line
about polling-based event publishing and CloudEvents.
- Around line 134-136: The docs reference a non-existent direct field
status.ready; update the text and CEL examples to either (a) explain that
status.ready is a computed/convenience property derived from the Ready condition
(i.e., the document should state that status.ready == true when
conditions[type=="Ready"].value == "True"), or (b) replace all occurrences of
status.ready in prose and CEL examples with the actual API expression using
resource.status.conditions to find the Ready condition (e.g., filter for
c.type=="Ready" and check its value). Ensure references to CEL expressions and
the table rows (the "**Ready** (`status.ready: true`)" and "**Not Ready**
(`status.ready: false`)") are updated accordingly and include a brief note
describing that this is derived from the Ready condition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 652ff7c7-945e-4077-a440-de7f9c6901cb

📥 Commits

Reviewing files that changed from the base of the PR and between 8aae11f and d9d6a1a.

📒 Files selected for processing (1)
  • docs/sentinel-operator-guide.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
docs/sentinel-operator-guide.md (3)

632-635: ⚠️ Potential issue | 🟡 Minor

Make the sample log duration match the real log format.

The example shows 125ms, but Sentinel logs this duration in seconds with fractional precision. Keeping the example as-is makes copy/paste verification misleading.

🛠️ Proposed fix
   Fetched resources count=15 label_selectors=1 topic=hyperfleet-dev-clusters subset=clusters
-  Trigger cycle completed total=15 published=3 skipped=12 duration=125ms topic=hyperfleet-dev-clusters subset=clusters
+  Trigger cycle completed total=15 published=3 skipped=12 duration=0.125s topic=hyperfleet-dev-clusters subset=clusters
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 632 - 635, The sample log in
the snippet uses a millisecond duration ("duration=125ms") but real Sentinel
logs emit durations in seconds with fractional precision; update the example
line (the "Trigger cycle completed total=15 published=3 skipped=12
duration=125ms topic=hyperfleet-dev-clusters subset=clusters" entry) to use
seconds with fractional precision (e.g., "duration=0.125s") so the sample
matches the actual log format.

524-524: ⚠️ Potential issue | 🟡 Minor

Fix the remaining broken section links.

These references still point to anchors that do not exist in the rendered document. The headings in this guide use numbered slugs, so these links should target those numbered fragments instead.

🛠️ Proposed fix
-When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Deployment: Broker Setup](`#broker-setup`) for details.
+When deploying to GKE, use **Workload Identity Federation** for authentication instead of service account keys. See [Configure Authentication and Permissions](`#configure-authentication-and-permissions`) for details.

-  - Reference: [Resource Filtering and Sharding](`#resource-filtering-and-sharding`)
+  - Reference: [Resource Filtering and Sharding](`#22-resource-filtering-and-sharding`)

-  - Reference: [Optional Fields](`#optional-fields`)
+  - Reference: [Optional Fields](`#33-optional-fields`)

-- [ ] Reference: [Message Data (CEL Expressions)](`#message-data-cel-expressions`)
+- [ ] Reference: [Message Data (CEL Expressions)](`#35-message-data-cel-expressions`)

-  - Reference: [Required Fields](`#required-fields`)
+  - Reference: [Required Fields](`#32-required-fields`)

Also applies to: 544-544, 552-552, 557-557, 563-563

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` at line 524, Links like "Deployment: Broker
Setup" target non-existent anchors; update those Markdown link targets to the
rendered numbered heading slugs used in this guide (replace the plain anchor
fragments with the corresponding numbered-fragment slug for the "Deployment:
Broker Setup" heading and the other similar links referenced around the same
sections). Find the occurrences of the link text "Deployment: Broker Setup" (and
the other broken anchors called out) and change their href targets to the
guide's actual numbered slugs so they resolve to the rendered headings.

417-421: ⚠️ Potential issue | 🟠 Major

Update the CloudEvent type example to match what Sentinel emits.

The example still documents hyperfleet.clusters.reconcile, which does not match the actual event type format used by Sentinel. That makes the payload example misleading for downstream consumers and test fixtures.

🛠️ Proposed fix
-  "type": "hyperfleet.clusters.reconcile",
+  "type": "com.redhat.hyperfleet.Cluster.reconcile",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/sentinel-operator-guide.md` around lines 417 - 421, The CloudEvent
`type` example currently shows "hyperfleet.clusters.reconcile" which is
incorrect; locate the JSON example that sets the "type" field (the fragment
containing "type": "hyperfleet.clusters.reconcile") and replace that string with
the actual event type Sentinel emits (retrieve the exact value from the Sentinel
emitter or event docs), then update any related examples/fixtures/tests to use
the same authoritative event type so documentation and downstream consumers
remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/sentinel-operator-guide.md`:
- Around line 632-635: The sample log in the snippet uses a millisecond duration
("duration=125ms") but real Sentinel logs emit durations in seconds with
fractional precision; update the example line (the "Trigger cycle completed
total=15 published=3 skipped=12 duration=125ms topic=hyperfleet-dev-clusters
subset=clusters" entry) to use seconds with fractional precision (e.g.,
"duration=0.125s") so the sample matches the actual log format.
- Line 524: Links like "Deployment: Broker Setup" target non-existent anchors;
update those Markdown link targets to the rendered numbered heading slugs used
in this guide (replace the plain anchor fragments with the corresponding
numbered-fragment slug for the "Deployment: Broker Setup" heading and the other
similar links referenced around the same sections). Find the occurrences of the
link text "Deployment: Broker Setup" (and the other broken anchors called out)
and change their href targets to the guide's actual numbered slugs so they
resolve to the rendered headings.
- Around line 417-421: The CloudEvent `type` example currently shows
"hyperfleet.clusters.reconcile" which is incorrect; locate the JSON example that
sets the "type" field (the fragment containing "type":
"hyperfleet.clusters.reconcile") and replace that string with the actual event
type Sentinel emits (retrieve the exact value from the Sentinel emitter or event
docs), then update any related examples/fixtures/tests to use the same
authoritative event type so documentation and downstream consumers remain
consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ad176138-e4b2-4478-802b-3e0fb3a3ea08

📥 Commits

Reviewing files that changed from the base of the PR and between d9d6a1a and 2f86a05.

📒 Files selected for processing (1)
  • docs/sentinel-operator-guide.md

@86254860 86254860 force-pushed the HYPERFLEET-634 branch 4 times, most recently from 642fd69 to 9375468 Compare March 9, 2026 07:49
@86254860 86254860 force-pushed the HYPERFLEET-634 branch 3 times, most recently from 1f2f9c3 to de08269 Compare March 9, 2026 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants