
[ECE] Update steps of entire installation maintenance#5340

Open
kunisen wants to merge 3 commits into main from kunisen-docpr-stl-1751

Contributor

@kunisen commented Mar 3, 2026


Summary

Background

Our current doc about "shutting down all ECE hosts" is confusing and misleading:
https://www.elastic.co/docs/deploy-manage/maintenance/ece/perform-ece-hosts-maintenance#ece-perform-host-maintenance-delete-runner

  • It says "By shutting down the host (less destructive)", but the procedure actually shuts down ALL ECE hosts (as steps like "Shut down all allocators:" show).

  • It says "Enable maintenance mode on the allocator.", but since the procedure shuts down the entire ECE installation and all of its hosts, this step makes no sense: all services will be taken down anyway.

  • It also omits steps this scenario requires: we should stop routing requests and take snapshots first, and we shouldn't recommend terminating deployments.

Details are described in https://github.com/elastic/support-tech-lead/issues/1751.

Doc update detail


I had a sync with @R0ky, and here are the notes:

[1] Overview

Shutting down all the runners (hosts) in an ECE deployment
This method lets you temporarily shut down all runners in an ECE deployment, for example for data center moves or planned power outages. It is offered as a non-guaranteed, less destructive alternative to fully rebuilding your ECE infrastructure.

To shut down all the runners:

  • Disable traffic from load balancers.
  • Shut down all allocators.
  • Shut down all non-director hosts.
  • Shut down directors.

[2] Steps

To stop (shut down all the runners):

  • Disable routing on all non-system deployments.
  • Make sure all non-system deployments are green.
  • Take a snapshot of all deployments.
  • Disable traffic from load balancers.
  • Shut down all allocators.
  • Shut down all non-director hosts.
  • Shut down directors.
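The "green" and snapshot checks in the stop sequence can be scripted against each deployment's Elasticsearch APIs before anything is shut down. The following is a minimal Python sketch, not ECE tooling: it assumes you have already fetched, for each non-system deployment, the raw JSON bodies of `GET _cluster/health` and `GET _snapshot/<repo>/_all`. The deployment names, the `<repo>` placeholder, and the helper function names are hypothetical. It flags any deployment that is not green or whose most recent snapshot did not finish with state `SUCCESS`.

```python
import json
from typing import Dict, List


def is_green(cluster_health_body: str) -> bool:
    """True if a `GET _cluster/health` response body reports status green."""
    return json.loads(cluster_health_body).get("status") == "green"


def latest_snapshot_succeeded(snapshots_body: str) -> bool:
    """True if the most recent snapshot (by start_time_in_millis) in a
    `GET _snapshot/<repo>/_all` response body finished with state SUCCESS."""
    snapshots = json.loads(snapshots_body).get("snapshots", [])
    if not snapshots:
        # No snapshot at all: not safe to shut down.
        return False
    latest = max(snapshots, key=lambda s: s.get("start_time_in_millis", 0))
    return latest.get("state") == "SUCCESS"


def blocked_deployments(health: Dict[str, str], snapshots: Dict[str, str]) -> List[str]:
    """Return, sorted, the (hypothetical) deployment names that are not yet
    safe to shut down: not green, or lacking a successful latest snapshot."""
    blocked = []
    for name, health_body in health.items():
        snapshot_body = snapshots.get(name, "{}")
        if not is_green(health_body) or not latest_snapshot_succeeded(snapshot_body):
            blocked.append(name)
    return sorted(blocked)
```

Only proceed to "Disable traffic from load balancers" once `blocked_deployments(...)` returns an empty list. System deployments are left out of the input on purpose, in line with the note to avoid shutting them down.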

To start:

  • Start all directors.
  • Verify that there is a healthy ZooKeeper quorum: at least one node reports zk_server_state as leader, and on the leader both zk_followers and zk_synced_followers match the number of ZooKeeper followers.
  • Start all remaining hosts.
  • Re-enable traffic from load balancers.
  • Re-enable routing on deployments based on deployment priority.
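The ZooKeeper quorum check in the start sequence can be sketched in a few lines. This is a minimal Python example, assuming you have collected the output of ZooKeeper's `mntr` four-letter-word command from each director; how you collect it, and the helper names below, are assumptions rather than ECE tooling. It applies the rule above: some node must report `zk_server_state` as `leader`, and that leader's `zk_followers` and `zk_synced_followers` must both equal the expected follower count.

```python
from typing import Dict, List


def parse_mntr(output: str) -> Dict[str, str]:
    """Parse ZooKeeper `mntr` output: one key/value pair per line,
    separated by whitespace (a tab in practice)."""
    metrics: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            metrics[parts[0]] = parts[1].strip()
    return metrics


def quorum_is_healthy(mntr_outputs: List[str], expected_followers: int) -> bool:
    """True if some node reports itself as leader and that leader sees
    `expected_followers` followers, all of them synced."""
    for output in mntr_outputs:
        m = parse_mntr(output)
        if m.get("zk_server_state") != "leader":
            continue
        # Follower counts are only reported by the leader.
        followers = int(m.get("zk_followers", "-1"))
        synced = int(m.get("zk_synced_followers", "-1"))
        return followers == expected_followers and synced == expected_followers
    # No node reported the leader state, so there is no quorum.
    return False
```

For example, with three directors and no observers you would expect one leader and two followers, so `expected_followers=2`. Start the remaining hosts only after this returns `True`.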

Side notes

Snapshot

Chat with @AlexP-Elastic in this link

Shutting down deployments: Do you agree that we should add "make sure you have a good snapshot taken first"?

++ (though let’s make sure we recommend not shutting them down at all - just mention if the customer has a good reason they need to, then they should)

System deployments

Chat with @AlexP-Elastic in this link

System deployments: Do you agree that we should add a note saying to avoid shutting down system clusters?

++


Generative AI disclosure

  1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
  • Yes
  • No

@kunisen requested a review from a team as a code owner March 3, 2026 08:13
@kunisen requested review from AlexP-Elastic and R0ky March 3, 2026 08:13
@kunisen self-assigned this Mar 3, 2026
@kunisen added labels Mar 3, 2026: documentation (Improvements or additions to documentation), supportability (ability enable self-service or support of product), ece (Elastic Cloud Enterprise), Team:Admin (Issues owned by the Admin Docs Team)
Contributor

github-actions bot commented Mar 3, 2026

Vale Linting Results

Summary: 7 suggestions found

💡 Suggestions (7)
| File | Line | Rule | Message |
|---|---|---|---|
| deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md | 43 | Elastic.HeadingColons | Capitalize ': d'. |
| deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md | 43 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
| deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md | 89 | Elastic.HeadingColons | Capitalize ': d'. |
| deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md | 89 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'disable', unless the term is in the UI. |
| deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md | 188 | Elastic.WordChoice | Consider using 'deactivate, deselect, hide, turn off' instead of 'Disable', unless the term is in the UI. |
| deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md | 194 | Elastic.WordChoice | Consider using 'stop, exit' instead of 'terminate', unless the term is in the UI. |
| deploy-manage/maintenance/ece/perform-ece-hosts-maintenance.md | 194 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

Contributor

github-actions bot commented Mar 3, 2026

🔍 Preview links for changed docs

@eedugon self-requested a review March 3, 2026 09:23
Contributor Author

kunisen commented Mar 4, 2026

FWIW, got @R0ky's LGTM here
