Skip to content

[SREP-3811] Add change-volume-type command for EBS volume migration#857

Open
charlesgong wants to merge 1 commit intoopenshift:masterfrom
charlesgong:SREP-3811
Open

[SREP-3811] Add change-volume-type command for EBS volume migration#857
charlesgong wants to merge 1 commit intoopenshift:masterfrom
charlesgong:SREP-3811

Conversation

@charlesgong
Copy link

@charlesgong charlesgong commented Mar 5, 2026

This commit adds a new 'cluster change-volume-type' command to osdctl to support migrating EBS volumes from one type to another (e.g., io1 to gp3) for cost optimization and performance tuning.

Features:

  • Online volume modification with no downtime
  • Support for all AWS volume types (gp2, gp3, io1, io2, st1, sc1)
  • Dry-run mode to preview changes
  • Real-time progress monitoring
  • Elevation reason tracking for auditability
  • Configurable IOPS and throughput for applicable volume types

This command is particularly useful for SREP-3811 - migrating master node volumes from io1 to gp3 for cost optimization while maintaining performance.

Examples:
osdctl cluster change-volume-type
--cluster-id 2abc123def456
--volume-id vol-1234567890abcdef0
--target-type gp3
--iops 3000
--throughput 125
--reason "SREP-3811 - Master volume cost optimization"

JIRA: SREP-3811

Summary by CodeRabbit

  • New Features
    • Added a CLI command to change EBS volume types for AWS-backed clusters with optional IOPS/throughput, dry-run preview, ownership and compatibility validations, progress monitoring, timeout handling, and final status reporting.
  • Documentation
    • New user guide and CLI reference pages for the change-volume-type command; links added to cluster docs and README.
  • Tests
    • Added unit tests covering validation, defaults, and command flag behavior.

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new CLI command change-volume-type to change AWS EBS volume types for cluster volumes with validation, OCM and AWS integration, dry-run support, progress polling, unit tests, and documentation; the command is registered under the cluster subcommands.

Changes

Cohort / File(s) Summary
Feature implementation & tests
cmd/cluster/changevolumetype.go, cmd/cluster/changevolumetype_test.go
New change-volume-type Cobra command and changeVolumeTypeOptions; flags --cluster-id, --volume-id, --target-type, --reason (required), optional --iops, --throughput, --dry-run; input validation, OCM client & AWS EC2 interactions (DescribeVolumes, ModifyVolume, DescribeVolumesModifications), ownership checks, dry-run preview, progress polling, and unit tests for validation, defaults, and flags.
Command registration
cmd/cluster/cmd.go
Registers newCmdChangeVolumeType() as a new cluster subcommand.
Documentation
docs/README.md, docs/osdctl_cluster.md, docs/osdctl_cluster_change-volume-type.md
Adds command reference, detailed usage, examples, notes about supported target types, IOPS/throughput behavior, and a dry-run example.

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Cmd as Command Handler
    participant OCM as OCM Service
    participant AWS as AWS EC2

    User->>Cmd: Invoke change-volume-type (flags)
    Cmd->>Cmd: Validate inputs (clusterID, targetType, IOPS/throughput)
    Cmd->>OCM: Init client & fetch cluster
    OCM-->>Cmd: Cluster details (provider, infraID, region)
    Cmd->>AWS: DescribeVolumes (volume info)
    AWS-->>Cmd: Volume details (type, size, tags, IOPS, throughput, state)
    Cmd->>Cmd: Verify ownership, current type, modification status
    alt Dry-run
        Cmd-->>User: Print prospective change summary
    else Execute change
        Cmd->>AWS: ModifyVolume (VolumeType [+ IOPS/Throughput])
        AWS-->>Cmd: Modification started ack
        loop Poll (periodic, up to ~60min)
            Cmd->>AWS: DescribeVolumesModifications
            AWS-->>Cmd: Modification status (modifying / completed / failed)
        end
        Cmd->>AWS: DescribeVolumes (final state)
        AWS-->>Cmd: Final volume info
        Cmd-->>User: Report result or error
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test file lacks testify/assert usage and comprehensive coverage of critical business logic in the run() function. Refactor tests to use testify/assert and add unit tests for validation logic, AWS/OCM operations, and timeout handling.
Stable And Deterministic Test Names ❓ Inconclusive Test file structure needs verification to determine if test names contain dynamic information that could change between runs. Unable to examine test file without shell execution capability; review cmd/cluster/changevolumetype_test.go for dynamic test naming patterns.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main change: adding a new CLI command for EBS volume migration with type changes.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot requested review from MateSaary and typeid March 5, 2026 01:12
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: charlesgong
Once this PR has been reviewed and has the lgtm label, please assign petrkotas for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (3)
cmd/cluster/changevolumetype_test.go (3)

46-61: Default-value test is disconnected from actual runtime defaults.

This test checks zero-values on a raw struct, but runtime defaults come from Cobra flag binding in newCmdChangeVolumeType(). It gives weak signal on real CLI behavior.

Please assert defaults from the constructed command flags instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster/changevolumetype_test.go` around lines 46 - 61, The test
TestChangeVolumeType_OptionsDefaults is asserting zero-values on a raw
changeVolumeTypeOptions struct instead of the actual CLI defaults; change it to
construct the command via newCmdChangeVolumeType(), then read the bound flag
defaults from that command (e.g., using cmd.Flags().GetInt/GetBool or
Flags().Lookup("iops"/"throughput"/"dry-run").Value) and assert those values
match expected runtime defaults; keep the test name and validate the iops,
throughput and dry-run flag defaults from the command rather than the raw struct
so it reflects real CLI behavior.

7-44: This test is tautological and doesn’t validate production behavior.

TestChangeVolumeType_ValidateTargetType duplicates a local list/loop instead of testing command/runtime validation. It can pass while real validation logic regresses.

Refactor to assert behavior through a shared validation helper used by production code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster/changevolumetype_test.go` around lines 7 - 44, The test
TestChangeVolumeType_ValidateTargetType is tautological because it reimplements
validation logic; update it to call the actual production validation helper
(e.g., ValidateVolumeType, validateTargetType, or the command's Validate method)
instead of using the local validTypes slice and loop. Replace the local loop
with calls to the production validator for each tt.targetType and assert that
the returned error presence matches tt.expectError (use t.Run as-is), ensuring
the test imports or references the real validation function used by the
change-volume-type command.

83-85: Required-flag behavior should be asserted, not left as a note.

Line 83-85 acknowledges missing verification for required flags. That leaves a regression gap for a safety-critical command.

Please add an assertion that required-flag enforcement metadata/behavior is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster/changevolumetype_test.go` around lines 83 - 85, Add an actual
assertion that the command enforces required flags by exercising the Cobra
command rather than relying on metadata notes: in changevolumetype_test.go
(e.g., inside TestChangeVolumeTypeCmd or the existing test block that builds the
command via the function that creates the change-volume-type command) execute
the command with required flags omitted and assert Execute() returns a non-nil
error and the error message indicates a missing/required flag (e.g., contains
"required" or "required flag"). This verifies runtime enforcement of
MarkFlagRequired behavior instead of leaving it as a comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/cluster/changevolumetype.go`:
- Around line 247-304: The monitoring loop that polls
awsClient.DescribeVolumesModifications (the for i := 0; i < 120; i++ loop using
ticker and o.volumeID) can exit due to timeout without ever reaching
VolumeModificationStateCompleted, yet the function always prints "Volume type
change completed successfully"; fix this by introducing an explicit completion
status flag (e.g., completed bool) set to true only when state ==
types.VolumeModificationStateCompleted, and set an error/return when the loop
finishes without completed (or return a “still in progress” error) instead of
falling through to the success message; update the final status block and the
final fmt.Printf + return to reflect failure/unknown when completed is false and
still return nil only when completed is true.
- Around line 147-156: After calling awsClient.DescribeVolumes for o.volumeID,
validate ownership by inspecting describeOutput.Volumes (iterate through
describeOutput.Volumes) and checking the volume Tags for the cluster identifier
(e.g., presence of a tag like "kubernetes.io/cluster/<clusterName>" or a
"Cluster"/"cluster-id" tag matching o.clusterName or o.clusterID); if no
matching tag is found return an error refusing to modify the volume. Perform
this check before calling ModifyVolume to ensure the volume belongs to the
target cluster; reference awsClient.DescribeVolumes, describeOutput.Volumes,
o.volumeID and the code path that calls ModifyVolume when implementing the
guard.
- Around line 78-79: The flags for iops and throughput are currently given
defaults and then unconditionally applied, causing silent overrides; update the
change-volume-type logic to treat --iops and --throughput as opt-in by checking
cmd.Flags().Changed("iops") and cmd.Flags().Changed("throughput") (where the
code currently reads/assigns ops.iops and ops.throughput in the migration/apply
section) and only set those fields on the target volume/request when the
corresponding flag was explicitly provided; leave existing values untouched when
the flag is not changed.
- Around line 64-68: The command currently swallows errors by using Run and
printing them; change it to return errors so Cobra exits non-zero on failure:
replace the Run func with a RunE func that returns ops.run() (e.g., RunE:
func(cmd *cobra.Command, args []string) error { return ops.run() }), ensuring
any existing fmt.Printf("Error: ...") is removed so the error is propagated to
the CLI caller.
- Around line 126-127: The cloud provider check in changevolumetype.go uses a
case-sensitive comparison against "aws" which will reject providers like "AWS";
update the check in the function that currently calls
cluster.CloudProvider().ID() to perform a case-insensitive comparison (e.g., use
strings.EqualFold or compare strings.ToLower(cluster.CloudProvider().ID()) ==
"aws"), adjust the error return to include the actual provider ID as before, and
ensure the strings package is imported if not already.

---

Nitpick comments:
In `@cmd/cluster/changevolumetype_test.go`:
- Around line 46-61: The test TestChangeVolumeType_OptionsDefaults is asserting
zero-values on a raw changeVolumeTypeOptions struct instead of the actual CLI
defaults; change it to construct the command via newCmdChangeVolumeType(), then
read the bound flag defaults from that command (e.g., using
cmd.Flags().GetInt/GetBool or
Flags().Lookup("iops"/"throughput"/"dry-run").Value) and assert those values
match expected runtime defaults; keep the test name and validate the iops,
throughput and dry-run flag defaults from the command rather than the raw struct
so it reflects real CLI behavior.
- Around line 7-44: The test TestChangeVolumeType_ValidateTargetType is
tautological because it reimplements validation logic; update it to call the
actual production validation helper (e.g., ValidateVolumeType,
validateTargetType, or the command's Validate method) instead of using the local
validTypes slice and loop. Replace the local loop with calls to the production
validator for each tt.targetType and assert that the returned error presence
matches tt.expectError (use t.Run as-is), ensuring the test imports or
references the real validation function used by the change-volume-type command.
- Around line 83-85: Add an actual assertion that the command enforces required
flags by exercising the Cobra command rather than relying on metadata notes: in
changevolumetype_test.go (e.g., inside TestChangeVolumeTypeCmd or the existing
test block that builds the command via the function that creates the
change-volume-type command) execute the command with required flags omitted and
assert Execute() returns a non-nil error and the error message indicates a
missing/required flag (e.g., contains "required" or "required flag"). This
verifies runtime enforcement of MarkFlagRequired behavior instead of leaving it
as a comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4690c43f-bc48-426c-9e6f-66d48a34e24d

📥 Commits

Reviewing files that changed from the base of the PR and between 76b8070 and f643ccc.

📒 Files selected for processing (3)
  • cmd/cluster/changevolumetype.go
  • cmd/cluster/changevolumetype_test.go
  • cmd/cluster/cmd.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/osdctl_cluster_change-volume-type.md`:
- Line 17: Update the user-facing sentence to improve grammar by replacing
"Requires elevated backplane access (--reason flag)" with "Requires backplane
elevation (--reason flag)"; locate the line in
docs/osdctl_cluster_change-volume-type.md that mentions the --reason flag and
perform this exact text swap so the description reads "Requires backplane
elevation (--reason flag)".
- Line 49: The flag description for -t, --target-type currently lists only (gp3,
gp2, io1, io2) but the implementation and validation allow st1 and sc1 as well;
update the documentation text for the -t/--target-type flag to include st1 and
sc1 and ensure it matches the actual validation set used by the changeVolumeType
validation logic (the --target-type flag and its validation list).
- Around line 19-43: The examples and command/help snippets for the osdctl
cluster change-volume-type documentation lack explicit fence languages; update
the example blocks that begin with the sample osdctl cluster change-volume-type
invocations to use ```bash fences and wrap the plain command/flags blocks (the
triple-backtick blocks showing "osdctl cluster change-volume-type --cluster-id
..." and the "### Options" help block) with ```text fences so markdownlint
passes and rendering is clear; locate the blocks around the example invocations
and the command/help snippets in docs/osdctl_cluster_change-volume-type.md and
replace their opening/closing triple backticks with the appropriate
language-tagged fences.

In `@docs/README.md`:
- Line 1411: The README flag help for "-t, --target-type string" is missing st1
and sc1; update the documented target volume types to include "st1" and "sc1" so
it reads something like "(gp3, gp2, io1, io2, st1, sc1)"; also scan for the same
flag/usage string in the CLI implementation (the code that defines the
-t/--target-type flag) and ensure the help text and any validation logic in that
implementation accept st1 and sc1 to keep docs and behavior consistent.
- Around line 1371-1395: The Examples/Flags markdown is breaking lint because
the example commands are not inside a fenced code block and markdown headings
are being parsed; wrap the example lines (the "Examples:" label and both osdctl
command examples) in a fenced block annotated with bash (use ```bash ... ```),
change the single-line usage snippet to a fenced block annotated with text
(```text ... ```), and ensure the Flags section's code fence is properly
opened/closed so the Flags list remains inside its own fenced block; locate the
Examples/Flags content around the existing osdctl cluster change-volume-type
examples and update those three fences accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fbfc4a59-e7c5-4e83-96f0-5a8c81f798de

📥 Commits

Reviewing files that changed from the base of the PR and between f643ccc and 3fada8b.

📒 Files selected for processing (3)
  • docs/README.md
  • docs/osdctl_cluster.md
  • docs/osdctl_cluster_change-volume-type.md

Comment on lines +19 to +43
Examples:
# Migrate a master volume from io1 to gp3
osdctl cluster change-volume-type \
--cluster-id 2abc123def456 \
--volume-id vol-1234567890abcdef0 \
--target-type gp3 \
--iops 3000 \
--throughput 125 \
--reason "SREP-3811 - Master volume cost optimization"

# Dry run to preview changes
osdctl cluster change-volume-type \
--cluster-id 2abc123def456 \
--volume-id vol-1234567890abcdef0 \
--target-type gp3 \
--reason "SREP-3811" \
--dry-run

```
osdctl cluster change-volume-type --cluster-id <cluster-id> --volume-id <volume-id> --target-type <type> [flags]
```

### Options

```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Wrap examples and command blocks with explicit markdown fence languages.

Current formatting causes markdownlint failures and reduces render clarity. Use bash for examples and text for flag/help blocks.

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 20-20: Headings must start at the beginning of the line

(MD023, heading-start-left)


[warning] 29-29: Headings must start at the beginning of the line

(MD023, heading-start-left)


[warning] 37-37: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 41-41: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)


[warning] 43-43: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/osdctl_cluster_change-volume-type.md` around lines 19 - 43, The examples
and command/help snippets for the osdctl cluster change-volume-type
documentation lack explicit fence languages; update the example blocks that
begin with the sample osdctl cluster change-volume-type invocations to use
```bash fences and wrap the plain command/flags blocks (the triple-backtick
blocks showing "osdctl cluster change-volume-type --cluster-id ..." and the "###
Options" help block) with ```text fences so markdownlint passes and rendering is
clear; locate the blocks around the example invocations and the command/help
snippets in docs/osdctl_cluster_change-volume-type.md and replace their
opening/closing triple backticks with the appropriate language-tagged fences.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
docs/README.md (1)

1372-1374: ⚠️ Potential issue | 🟡 Minor

Add explicit fence languages for the new usage/options blocks.

The new fenced blocks are missing language identifiers, which triggers markdownlint (MD040).

🛠️ Proposed doc fix
-```
+```text
 osdctl cluster change-volume-type --cluster-id <cluster-id> --volume-id <volume-id> --target-type <type> [flags]

@@
- +text
--as string Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--cluster string The name of the kubeconfig cluster to use
-C, --cluster-id string Cluster ID (internal ID or external ID)
@@
-t, --target-type string Target volume type (gp2, gp3, io1, io2, st1, sc1)
--throughput int32 Throughput in MB/s (for gp3 only). If not specified, uses AWS defaults.
-v, --volume-id string EBS volume ID (e.g., vol-1234567890abcdef0)

Also applies to: 1378-1397

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/README.md` around lines 1372 - 1374, The fenced code blocks showing the
osdctl cluster change-volume-type usage and options are missing language
identifiers causing markdownlint MD040; update the opening fences for both the
usage block (the block containing "osdctl cluster change-volume-type
--cluster-id <cluster-id> --volume-id <volume-id> --target-type <type> [flags]")
and the options block (the block listing flags like "--as string", "--cluster
string", "-C, --cluster-id string", "-t, --target-type string", etc.) to include
an explicit language (e.g., use ```text) for each fenced block (also apply the
same change to the other occurrence mentioned around the second block).
docs/osdctl_cluster_change-volume-type.md (1)

20-22: ⚠️ Potential issue | 🟡 Minor

Use language-tagged fences for usage/examples/options blocks.

These blocks currently fail markdownlint MD040 and render less clearly without explicit fence languages.

🛠️ Proposed doc fix
-```
+```text
 osdctl cluster change-volume-type --cluster-id <cluster-id> --volume-id <volume-id> --target-type <type> [flags]

@@
- +bash

Migrate a master volume from io1 to gp3 with custom IOPS

osdctl cluster change-volume-type
@@
osdctl cluster change-volume-type
--cluster-id 2abc123def456
--volume-id vol-1234567890abcdef0
--target-type gp3
--reason "SREP-3811"
--dry-run

@@
-```
+```text
  -C, --cluster-id string    Cluster ID (internal ID or external ID)
      --dry-run              Dry run - show what would be changed without executing
@@
  -t, --target-type string   Target volume type (gp2, gp3, io1, io2, st1, sc1)
      --throughput int32     Throughput in MB/s (for gp3 only). If not specified, uses AWS defaults.
  -v, --volume-id string     EBS volume ID (e.g., vol-1234567890abcdef0)

@@
- +text
--as string Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--cluster string The name of the kubeconfig cluster to use
@@
--skip-aws-proxy-check aws_proxy Don't use the configured aws_proxy value
-S, --skip-version-check skip checking to see if this is the most recent release

Also applies to: 26-50, 54-63, 67-78

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/osdctl_cluster_change-volume-type.md` around lines 20 - 22, Update the
code fences for the usage, examples and options blocks of the osdctl cluster
change-volume-type documentation so they include appropriate language tags
(e.g., use ```bash for command examples and ```text for plain output/option
lists) — locate the blocks containing the usage line "osdctl cluster
change-volume-type --cluster-id <cluster-id> --volume-id <volume-id>
--target-type <type> [flags]" and the example commands shown with backslashes
and the options list (flags like --dry-run, --target-type, --throughput,
--volume-id, --as, --cluster, --skip-aws-proxy-check, --skip-version-check) and
replace the plain triple-backtick fences with the correct fenced language
markers (bash or text) for those blocks across the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/cluster/changevolumetype.go`:
- Around line 112-123: Add explicit validation that rejects unsupported
IOPS/throughput flag combinations instead of silently omitting them: in the
change volume type command where targetType is validated (references:
o.targetType, validTypes slice) and in the later block handling performance
params (references: o.Iops, o.Throughput), check the chosen targetType’s allowed
parameters (e.g., gp3 supports both iops and throughput, gp2/io1/io2 have
specific rules) and return an error when a user supplies --iops or --throughput
that are not valid for the selected o.targetType; ensure the error is returned
before building the API request so dry-run and real runs both fail loudly with a
descriptive message.
- Around line 183-201: The current ownership check (loop over volume.Tags using
strings.Contains on *tag.Key and *tag.Value) is too permissive; update the logic
in changevolumetype.go (where infraID := cluster.InfraID(), clusterOwned
variable and the loop over volume.Tags) to perform exact matching: require
*tag.Key == "kubernetes.io/cluster/"+infraID for the cluster tag and require
exact equality on the Name tag value (*tag.Key == "Name" && *tag.Value ==
infraID) (or another strict equality rule your project prefers) instead of
strings.Contains, so only resources with precise matching tags are considered
cluster-owned.

---

Duplicate comments:
In `@docs/osdctl_cluster_change-volume-type.md`:
- Around line 20-22: Update the code fences for the usage, examples and options
blocks of the osdctl cluster change-volume-type documentation so they include
appropriate language tags (e.g., use ```bash for command examples and ```text
for plain output/option lists) — locate the blocks containing the usage line
"osdctl cluster change-volume-type --cluster-id <cluster-id> --volume-id
<volume-id> --target-type <type> [flags]" and the example commands shown with
backslashes and the options list (flags like --dry-run, --target-type,
--throughput, --volume-id, --as, --cluster, --skip-aws-proxy-check,
--skip-version-check) and replace the plain triple-backtick fences with the
correct fenced language markers (bash or text) for those blocks across the file.

In `@docs/README.md`:
- Around line 1372-1374: The fenced code blocks showing the osdctl cluster
change-volume-type usage and options are missing language identifiers causing
markdownlint MD040; update the opening fences for both the usage block (the
block containing "osdctl cluster change-volume-type --cluster-id <cluster-id>
--volume-id <volume-id> --target-type <type> [flags]") and the options block
(the block listing flags like "--as string", "--cluster string", "-C,
--cluster-id string", "-t, --target-type string", etc.) to include an explicit
language (e.g., use ```text) for each fenced block (also apply the same change
to the other occurrence mentioned around the second block).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 80cb2fe3-19f2-4dba-a57b-0e4e541f66d0

📥 Commits

Reviewing files that changed from the base of the PR and between 3fada8b and 04935a6.

📒 Files selected for processing (4)
  • cmd/cluster/changevolumetype.go
  • cmd/cluster/changevolumetype_test.go
  • docs/README.md
  • docs/osdctl_cluster_change-volume-type.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/cluster/changevolumetype_test.go

Comment on lines +183 to +201
// Validate volume belongs to this cluster
clusterOwned := false
infraID := cluster.InfraID()
for _, tag := range volume.Tags {
if tag.Key != nil && tag.Value != nil {
// Check for kubernetes.io/cluster/<infra-id> tag
if strings.HasPrefix(*tag.Key, "kubernetes.io/cluster/") && strings.Contains(*tag.Key, infraID) {
clusterOwned = true
break
}
// Also check for Name tag containing infra ID
if *tag.Key == "Name" && strings.Contains(*tag.Value, infraID) {
clusterOwned = true
break
}
}
}
if !clusterOwned {
return fmt.Errorf("volume %s does not belong to cluster %s (infra ID: %s). Refusing to modify volume that may belong to another cluster", o.volumeID, cluster.ID(), infraID)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Harden volume ownership validation with exact tag matching.

Using strings.Contains on tag key/value can match unrelated resources that happen to include the infra ID substring. For a mutating operation, this check should be strict to avoid cross-cluster volume modification risk.

🛠️ Proposed ownership-check hardening
 	// Validate volume belongs to this cluster
 	clusterOwned := false
 	infraID := cluster.InfraID()
+	expectedClusterTagKey := "kubernetes.io/cluster/" + infraID
 	for _, tag := range volume.Tags {
 		if tag.Key != nil && tag.Value != nil {
-			// Check for kubernetes.io/cluster/<infra-id> tag
-			if strings.HasPrefix(*tag.Key, "kubernetes.io/cluster/") && strings.Contains(*tag.Key, infraID) {
+			// Prefer exact ownership tag match
+			if *tag.Key == expectedClusterTagKey {
 				clusterOwned = true
 				break
 			}
-			// Also check for Name tag containing infra ID
-			if *tag.Key == "Name" && strings.Contains(*tag.Value, infraID) {
+			// Optional fallback: tighten Name tag match pattern
+			if *tag.Key == "Name" && strings.HasPrefix(*tag.Value, infraID+"-") {
 				clusterOwned = true
 				break
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster/changevolumetype.go` around lines 183 - 201, The current
ownership check (loop over volume.Tags using strings.Contains on *tag.Key and
*tag.Value) is too permissive; update the logic in changevolumetype.go (where
infraID := cluster.InfraID(), clusterOwned variable and the loop over
volume.Tags) to perform exact matching: require *tag.Key ==
"kubernetes.io/cluster/"+infraID for the cluster tag and require exact equality
on the Name tag value (*tag.Key == "Name" && *tag.Value == infraID) (or another
strict equality rule your project prefers) instead of strings.Contains, so only
resources with precise matching tags are considered cluster-owned.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/cluster/changevolumetype.go`:
- Line 90: The CLI flag ops.reason is required but never passed into the
elevation/auth flow; update the run() path to forward the reason into the cloud
config/elevation call: when invoking osdCloud.CreateAWSV2Config(connection,
cluster) (and any intermediate helper that prepares elevation), add a reason
parameter (e.g., pass ops.reason) and update the CreateAWSV2Config signature and
callers to accept and propagate that reason so the AWS elevation call receives
the audit reason; alternatively, if you prefer not to enforce it yet, remove the
required flag declaration for --reason (cmd.Flags().StringVar(&ops.reason,
"reason", ...)) so it is optional until wired.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a275a1db-9123-45de-8fd1-9abe963bc753

📥 Commits

Reviewing files that changed from the base of the PR and between 04935a6 and f507962.

📒 Files selected for processing (1)
  • cmd/cluster/changevolumetype.go

cmd.Flags().StringVarP(&ops.clusterID, "cluster-id", "C", "", "Cluster ID (internal ID or external ID)")
cmd.Flags().StringVarP(&ops.volumeID, "volume-id", "v", "", "EBS volume ID (e.g., vol-1234567890abcdef0)")
cmd.Flags().StringVarP(&ops.targetType, "target-type", "t", "", "Target volume type (gp2, gp3, io1, io2, st1, sc1)")
cmd.Flags().StringVar(&ops.reason, "reason", "", "Reason for elevation (OHSS/PD/JIRA ticket)")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

--reason is required but not used in the execution path.

Line 90/Line 101 require an elevation reason, but o.reason is never consumed in run(). The current AWS setup call at Line 172 (osdCloud.CreateAWSV2Config(connection, cluster)) also has no reason input (see pkg/osdCloud/aws.go:183-197), so audit/elevation attribution is not actually enforced by this command path. Please wire --reason into the elevation/auth flow (or stop requiring it until it is wired).

Also applies to: 101-101, 106-176

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster/changevolumetype.go` at line 90, The CLI flag ops.reason is
required but never passed into the elevation/auth flow; update the run() path to
forward the reason into the cloud config/elevation call: when invoking
osdCloud.CreateAWSV2Config(connection, cluster) (and any intermediate helper
that prepares elevation), add a reason parameter (e.g., pass ops.reason) and
update the CreateAWSV2Config signature and callers to accept and propagate that
reason so the AWS elevation call receives the audit reason; alternatively, if
you prefer not to enforce it yet, remove the required flag declaration for
--reason (cmd.Flags().StringVar(&ops.reason, "reason", ...)) so it is optional
until wired.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
cmd/cluster/changevolumetype.go (2)

156-176: ⚠️ Potential issue | 🟠 Major

--reason remains display-only, not enforced in elevation/auth flow.

o.reason is printed, but the execution path still calls osdCloud.CreateAWSV2Config(connection, cluster) without carrying reason metadata, so backend audit attribution is not guaranteed.

Suggested direction
- cfg, err := osdCloud.CreateAWSV2Config(connection, cluster)
+ cfg, err := osdCloud.CreateAWSV2Config(connection, cluster, o.reason)

And update pkg/osdCloud/aws.go + underlying backplane query config plumbing to propagate reason through elevation/auth calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster/changevolumetype.go` around lines 156 - 176, The CLI only prints
o.reason but does not propagate it into the elevation/auth flow; modify the call
site and plumbing so the reason is carried into backend auth: change the
CreateAWSV2Config call in changevolumetype (where o.reason is printed) to pass
the reason (e.g., CreateAWSV2Config(connection, cluster, o.reason)), update the
signature of osdCloud.CreateAWSV2Config and its implementations in
pkg/osdCloud/aws.go to accept and forward the reason string through any
elevation/auth request builders, and ensure the backplane/elevation request
metadata includes that reason so audit attribution is preserved.

193-212: ⚠️ Potential issue | 🔴 Critical

Ownership fallback is still too permissive for a mutating operation.

Allowing Name tag prefix (infraID + "-") can still match unrelated resources. For cross-cluster safety, rely on strict ownership tag matching only.

Safer ownership check
 	clusterOwned := false
 	infraID := cluster.InfraID()
 	expectedClusterTagKey := "kubernetes.io/cluster/" + infraID
 	for _, tag := range volume.Tags {
 		if tag.Key != nil && tag.Value != nil {
-			// Exact match on kubernetes.io/cluster/<infra-id> ownership tag
-			if *tag.Key == expectedClusterTagKey {
+			// Exact ownership tag key + expected ownership values
+			if *tag.Key == expectedClusterTagKey && (*tag.Value == "owned" || *tag.Value == "shared") {
 				clusterOwned = true
 				break
 			}
-			// Fallback: Name tag with infra ID prefix
-			if *tag.Key == "Name" && strings.HasPrefix(*tag.Value, infraID+"-") {
-				clusterOwned = true
-				break
-			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster/changevolumetype.go` around lines 193 - 212, The ownership check
is too permissive because it accepts a Name tag prefix match; update the logic
in the block that computes clusterOwned (which uses cluster.InfraID(),
expectedClusterTagKey, and iterates volume.Tags) to only set clusterOwned when
an exact match on the kubernetes.io/cluster/<infra-id> tag key is found and
remove the fallback that checks for *tag.Key == "Name" and
strings.HasPrefix(*tag.Value, infraID+"-"); keep the existing error return using
o.volumeID, cluster.ID(), and infraID when clusterOwned is false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/cluster/changevolumetype.go`:
- Around line 321-355: The loop currently treats an empty
modOutput.VolumesModifications by breaking out and ultimately triggering a
timeout; instead handle transient empty lists by continuing to poll: in the loop
that reads modOutput.VolumesModifications, replace the break when
len(modOutput.VolumesModifications) == 0 with a continue (optionally logging
that the list was empty and sleeping/polling again), so you don't prematurely
exit before latestMod/ModificationState checks (references:
modOutput.VolumesModifications, latestMod, ModificationState, completed,
VolumeModificationStateCompleted/Failed).

---

Duplicate comments:
In `@cmd/cluster/changevolumetype.go`:
- Around line 156-176: The CLI only prints o.reason but does not propagate it
into the elevation/auth flow; modify the call site and plumbing so the reason is
carried into backend auth: change the CreateAWSV2Config call in changevolumetype
(where o.reason is printed) to pass the reason (e.g.,
CreateAWSV2Config(connection, cluster, o.reason)), update the signature of
osdCloud.CreateAWSV2Config and its implementations in pkg/osdCloud/aws.go to
accept and forward the reason string through any elevation/auth request
builders, and ensure the backplane/elevation request metadata includes that
reason so audit attribution is preserved.
- Around line 193-212: The ownership check is too permissive because it accepts
a Name tag prefix match; update the logic in the block that computes
clusterOwned (which uses cluster.InfraID(), expectedClusterTagKey, and iterates
volume.Tags) to only set clusterOwned when an exact match on the
kubernetes.io/cluster/<infra-id> tag key is found and remove the fallback that
checks for *tag.Key == "Name" and strings.HasPrefix(*tag.Value, infraID+"-");
keep the existing error return using o.volumeID, cluster.ID(), and infraID when
clusterOwned is false.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 998dc627-0235-49e7-ba3b-fbef2232e71e

📥 Commits

Reviewing files that changed from the base of the PR and between f507962 and 2f5100c.

📒 Files selected for processing (1)
  • cmd/cluster/changevolumetype.go

Add a new osdctl cluster subcommand to change EBS volume types (e.g.,
io1 to gp3) for AWS cluster volumes without downtime.

Features:
- Online volume modification using AWS Elastic Volumes API
- Supports all EBS types: gp2, gp3, io1, io2, st1, sc1
- Dry-run mode for previewing changes
- Real-time progress monitoring with 60-minute timeout
- Strict volume ownership validation via kubernetes.io/cluster tags
- Early rejection of invalid --iops/--throughput combinations
- Elevation reason tracking for SRE audit trail

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 5, 2026

@charlesgong: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// newCmdChangeVolumeType creates the cobra command for changing an EBS volume type
// on an AWS cluster. It registers required flags (cluster-id, volume-id, target-type, reason)
// and optional performance flags (iops, throughput, dry-run).
func newCmdChangeVolumeType() *cobra.Command {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not the intent of the work prescribed in SREP-3811. Since we use CPMS, we can just change the root volume type and let CPMS rollout new nodes with that type, vs doing an in-place migration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants