| 1 | +# KEP-5502: EmptyDir Volume Sticky Bit Support |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | +- [Release Signoff Checklist](#release-signoff-checklist) |
| 5 | +- [Summary](#summary) |
| 6 | +- [Motivation](#motivation) |
| 7 | + - [Goals](#goals) |
| 8 | + - [Non-Goals](#non-goals) |
| 9 | +- [Proposal](#proposal) |
| 10 | + - [User Stories](#user-stories) |
| 11 | + - [Story 1: Shared Temporary Storage for Multi-User Workloads](#story-1-shared-temporary-storage-for-multi-user-workloads) |
| 12 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 13 | +- [Design Details](#design-details) |
| 14 | + - [API Changes](#api-changes) |
| 15 | + - [Implementation](#implementation) |
| 16 | + - [Test Plan](#test-plan) |
| 17 | + - [Prerequisite testing updates](#prerequisite-testing-updates) |
| 18 | + - [Unit tests](#unit-tests) |
| 19 | + - [Integration tests](#integration-tests) |
| 20 | + - [e2e tests](#e2e-tests) |
| 21 | + - [Graduation Criteria](#graduation-criteria) |
| 22 | + - [Alpha](#alpha) |
| 23 | + - [Beta](#beta) |
| 24 | + - [GA](#ga) |
| 25 | + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) |
| 26 | + - [Version Skew Strategy](#version-skew-strategy) |
| 27 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 28 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 29 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 30 | + - [Monitoring Requirements](#monitoring-requirements) |
| 31 | + - [Dependencies](#dependencies) |
| 32 | + - [Scalability](#scalability) |
| 33 | + - [Troubleshooting](#troubleshooting) |
| 34 | +- [Implementation History](#implementation-history) |
| 35 | +- [Drawbacks](#drawbacks) |
| 36 | +- [Alternatives](#alternatives) |
| 37 | +<!-- /toc --> |
| 38 | + |
| 39 | +## Release Signoff Checklist |
| 40 | + |
| 41 | +Items marked with (R) are required *prior to targeting to a milestone / release*. |
| 42 | + |
| 43 | +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 44 | +- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
| 45 | +- [ ] (R) Design details are appropriately documented |
| 46 | +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) |
| 47 | + - [ ] e2e Tests for all Beta API Operations (endpoints) |
| 48 | + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) |
| 49 | + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free |
| 50 | +- [ ] (R) Graduation criteria is in place |
| 51 | + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA |
| 52 | +- [ ] (R) Production readiness review completed |
| 53 | +- [ ] (R) Production readiness review approved |
| 54 | +- [ ] "Implementation History" section is up-to-date for milestone |
| 55 | +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 56 | +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
| 57 | + |
| 58 | +[kubernetes.io]: https://kubernetes.io/ |
| 59 | +[kubernetes/enhancements]: https://git.k8s.io/enhancements |
| 60 | +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes |
| 61 | +[kubernetes/website]: https://git.k8s.io/website |
| 62 | + |
| 63 | +## Summary |
| 64 | + |
| 65 | +This KEP proposes adding support for the sticky bit permission (mode 01777) to emptyDir volumes in Kubernetes. The sticky bit is a Unix file permission that restricts file deletion within a directory: only the file owner, the directory owner, or root can delete files, even if all users have write permission. Some applications refuse, for security reasons, to use a world-writable directory as a temporary directory unless the sticky bit is set. Such applications cannot use emptyDir for this purpose today and have to resort to ephemeral volumes. |
| 66 | + |
| 67 | +## Motivation |
| 68 | + |
| 69 | +The emptyDir volume currently creates directories with mode 0777, allowing any process with write access to delete or rename any file in the volume, regardless of who created it. This behavior can cause problems in multi-user or multi-process workloads where: |
| 70 | + |
| 71 | +1. Multiple containers or processes running as different users share the same emptyDir volume |
| 72 | +2. One process accidentally or maliciously deletes files created by another process |
| 73 | +3. Init containers and main containers need to share files, but the main container should not be able to delete the init container's files |
| 74 | + |
| 75 | +The sticky bit (mode 01777) is a standard Unix permission that solves this problem by ensuring that only the owner of a file (or the directory owner, or root) can delete or rename it, even when the directory is world-writable. |
| 76 | + |
| 77 | +### Goals |
| 78 | + |
| 79 | +- Add an optional `stickyBit` field to the emptyDir volume specification |
| 80 | +- When enabled, create emptyDir volumes with mode 01777 instead of 0777 |
| 81 | +- Maintain backward compatibility by keeping the default behavior (mode 0777) unchanged |
| 82 | +- Support the feature on all platforms that support Unix file permissions |
| 83 | + |
| 84 | +### Non-Goals |
| 85 | + |
| 86 | +- Changing the default behavior of existing emptyDir volumes (mode 0777 remains the default) |
| 87 | +- Adding support for other advanced file permission features |
| 88 | +- Implementing this feature for volume types other than emptyDir |
| 89 | +- Supporting this feature on platforms that don't support Unix-style file permissions (e.g., Windows) |
| 90 | + |
| 91 | +## Proposal |
| 92 | + |
| 93 | +Add a new optional boolean field `stickyBit` to the `EmptyDirVolumeSource` API type. When set to `true`, the kubelet will create the emptyDir volume with mode 01777 (0777 | sticky bit) instead of the default 0777. |
| 94 | + |
| 95 | +### User Stories |
| 96 | + |
| 97 | +#### Story 1: Shared Temporary Storage for Multi-User Workloads |
| 98 | + |
| 99 | +Containerized Ruby applications reject a `/tmp` directory that does not have the sticky bit set. This means `emptyDir` cannot be reliably used for temporary directories, and users have to fall back to ephemeral volumes (which are more complex to manage) or RWX volumes (which are not well supported by many providers). |
| 100 | + |
| 101 | +Allowing emptyDir volumes to be created with the sticky bit set would considerably reduce complexity for these applications. |
| 102 | + |
| 103 | +### Risks and Mitigations |
| 104 | + |
| 105 | +**Risk**: Users might not understand the sticky bit behavior and be confused when they cannot delete files created by other users. |
| 106 | + |
| 107 | +**Mitigation**: Document the feature clearly with examples. The feature is opt-in, so users must explicitly enable it. |
| 108 | + |
| 109 | +**Risk**: The feature might not work correctly on all container runtimes or storage backends. |
| 110 | + |
| 111 | +**Mitigation**: The sticky bit is a standard Unix permission supported by all major filesystems. The feature is opt-in (users must explicitly set `stickyBit: true`), allowing for gradual adoption and testing. |
| 112 | + |
| 113 | +**Risk**: Existing workloads might be affected if the default changes. |
| 114 | + |
| 115 | +**Mitigation**: The feature is opt-in via a new API field. Existing workloads will continue to use mode 0777 unless explicitly configured otherwise. |
| 116 | + |
| 117 | +## Design Details |
| 118 | + |
| 119 | +### API Changes |
| 120 | + |
| 121 | +Add a new optional field to the `EmptyDirVolumeSource` struct: |
| 122 | + |
| 123 | +```go |
| 124 | +type EmptyDirVolumeSource struct { |
| 125 | + // ... existing fields ... |
| 126 | + |
| 127 | + // StickyBit sets the emptyDir permission to 01777 instead of 0777. |
| 128 | + // When enabled, only the owner of a file (or the directory owner, or root) |
| 129 | + // can delete or rename it, even if the directory is world-writable. |
| 130 | + // This is similar to the /tmp directory behavior on Unix systems. |
| 131 | + // +optional |
| 132 | + StickyBit *bool `json:"stickyBit,omitempty" protobuf:"varint,3,opt,name=stickyBit"` |
| 133 | +} |
| 134 | +``` |
| 135 | + |
| 136 | +### Implementation |
| 137 | + |
| 138 | +The implementation is in the emptyDir volume plugin in `pkg/volume/emptydir/empty_dir.go`: |
| 139 | + |
| 140 | +1. Define constants for the sticky bit mode: |
| 141 | + ```go |
| 142 | + const ( |
| 143 | + stickyBitMode os.FileMode = os.ModeSticky // os.FileMode does not encode the sticky bit as octal 01000 |
| 144 | + defaultPerm os.FileMode = 0777 |
| 145 | + ) |
| 146 | + ``` |
| 147 | + |
| 148 | +2. When creating the emptyDir directory, check if the `StickyBit` field is set: |
| 149 | + ```go |
| 150 | + perm := defaultPerm |
| 151 | + if ed.stickyBit != nil && *ed.stickyBit { |
| 152 | + perm = defaultPerm | stickyBitMode |
| 153 | + } |
| 154 | + ``` |
| 155 | + |
| 156 | +3. Apply the appropriate permissions when creating the directory |
| 157 | + |
| 158 | +### Test Plan |
| 159 | + |
| 160 | +[x] I/we understand the owners of the involved components may require updates to |
| 161 | +existing tests to make this code solid enough prior to committing the changes necessary |
| 162 | +to implement this enhancement. |
| 163 | + |
| 164 | +#### Prerequisite testing updates |
| 165 | + |
| 166 | +No prerequisite testing updates are required. The emptyDir volume plugin already has good test coverage. |
| 167 | + |
| 168 | +#### Unit tests |
| 169 | + |
| 170 | +Unit tests have been added to verify: |
| 171 | +- Directory creation with sticky bit enabled results in mode 01777 |
| 172 | +- Directory creation with sticky bit disabled or unset results in mode 0777 |
| 173 | + |
| 174 | +Coverage: |
| 175 | +- `pkg/volume/emptydir`: Unit tests cover the sticky bit implementation and default behavior |
| 176 | + |
| 177 | +#### Integration tests |
| 178 | + |
| 179 | +If needed, integration tests could additionally verify: |
| 180 | +- A pod with emptyDir volume and stickyBit enabled mounts correctly |
| 181 | +- Older kubelets ignore the field gracefully |
| 182 | + |
| 183 | +#### e2e tests |
| 184 | + |
| 185 | +TBD - e2e tests will be added as part of the implementation. |
| 186 | + |
| 187 | +### Graduation Criteria |
| 188 | + |
| 189 | +#### Alpha |
| 190 | + |
| 191 | +- API field implemented and functional |
| 192 | +- Unit tests passing |
| 193 | +- Documentation available |
| 194 | + |
| 195 | +#### Beta |
| 196 | + |
| 197 | +- No major bugs reported during alpha |
| 198 | +- Gather feedback from users |
| 199 | + |
| 200 | +#### GA |
| 201 | + |
| 202 | +- Stable for at least two releases |
| 203 | +- No major issues reported |
| 204 | + |
| 205 | +### Upgrade / Downgrade Strategy |
| 206 | + |
| 207 | +No special upgrade/downgrade handling is needed. The `stickyBit` field is optional and ignored by older kubelets that don't recognize it. |
| 208 | + |
| 209 | +### Version Skew Strategy |
| 210 | + |
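| 211 | +The feature is implemented in the kubelet. Older kubelets will ignore the `stickyBit` field and create emptyDir volumes with the default mode 0777, which is safe because it matches the previous behavior. The API server must also recognize the field: an API server that predates it will not persist `stickyBit`, so the kubelet will never see it. |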
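As a rough analogy for why a component that does not know the field simply ignores it, the sketch below decodes a payload containing `stickyBit` into a type that lacks the field. It relies on Go's standard `encoding/json` decoder, which skips unknown fields by default. The real kubelet decoding path differs, and `oldEmptyDir` and `decodeOld` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldEmptyDir mimics a type definition that predates the stickyBit field
// (an assumption for illustration).
type oldEmptyDir struct {
	Medium string `json:"medium,omitempty"`
}

// decodeOld decodes data with the older type; unknown fields are skipped.
func decodeOld(data []byte) (oldEmptyDir, error) {
	var v oldEmptyDir
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	v, err := decodeOld([]byte(`{"medium":"Memory","stickyBit":true}`))
	fmt.Println(v.Medium, err) // the known field is kept, no error is reported
}
```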
| 212 | + |
| 213 | +## Production Readiness Review Questionnaire |
| 214 | + |
| 215 | +### Feature Enablement and Rollback |
| 216 | + |
| 217 | +###### How can this feature be enabled / disabled in a live cluster? |
| 218 | + |
| 219 | +- [ ] Feature gate (also fill in values in `kep.yaml`) |
| 220 | + - Feature gate name: |
| 221 | + - Components depending on the feature gate: |
| 222 | +- [x] Other |
| 223 | + - Describe the mechanism: The feature is enabled per-volume by setting `stickyBit: true` on an emptyDir volume in the pod spec. No feature gate is required as this is a simple opt-in API field. |
| 224 | + - Will enabling / disabling the feature require downtime of the control plane? No |
| 225 | + - Will enabling / disabling the feature require downtime or reprovisioning of a node? No |
| 226 | + |
| 227 | +###### Does enabling the feature change any default behavior? |
| 228 | + |
| 229 | +No. The feature only takes effect when users explicitly set `stickyBit: true` on an emptyDir volume. Existing emptyDir volumes and new emptyDir volumes without the field continue to use mode 0777. |
| 230 | + |
| 231 | +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? |
| 232 | + |
| 233 | +Yes. Since there is no feature gate, the feature is controlled per-pod by setting or omitting the `stickyBit` field. To "disable" the feature, simply remove `stickyBit: true` from pod specs. |
| 234 | + |
| 235 | +If rolling back to an older kubelet version that doesn't support the field, the field will be ignored and emptyDir volumes will be created with mode 0777. |
| 236 | + |
| 237 | +**Impact on existing workloads**: Pods that were running with sticky bit enabled will continue to run unchanged (the directory permissions don't change after creation). However, new pods or pods that are rescheduled will have emptyDir volumes created with mode 0777 instead of 01777, which could affect application behavior if the application relies on the sticky bit behavior. |
| 238 | + |
| 239 | +###### What happens if we reenable the feature if it was previously rolled back? |
| 240 | + |
| 241 | +The feature will work as expected for new pods. Existing pods that were created while the feature was disabled will continue to use mode 0777 until they are deleted and recreated. |
| 242 | + |
| 243 | +###### Are there any tests for feature enablement/disablement? |
| 244 | + |
| 245 | +Yes, unit tests verify that: |
| 246 | +- When `stickyBit: true`, the directory is created with mode 01777 |
| 247 | +- When `stickyBit` is false or unset, the directory is created with mode 0777 |
| 248 | +- The default behavior (mode 0777) is preserved when the field is not specified |
| 249 | + |
| 250 | +### Rollout, Upgrade and Rollback Planning |
| 251 | + |
| 252 | +###### How can a rollout or rollback fail? Can it impact already running workloads? |
| 253 | + |
| 254 | +**Rollout failure scenarios**: |
| 255 | +- If the feature has bugs that cause emptyDir volume creation to fail, pods using `stickyBit: true` will fail to start |
| 256 | +- If the host OS or filesystem doesn't support the sticky bit (unlikely on standard Linux), volume creation could fail |
| 257 | + |
| 258 | +**Impact on running workloads**: |
| 259 | +- Already running workloads are not affected by enabling or disabling the feature |
| 260 | +- Only new pods or rescheduled pods are affected |
| 261 | +- The feature is opt-in, so workloads that don't use it are unaffected |
| 262 | + |
| 263 | +**Rollback scenarios**: |
| 264 | +- Rolling back to an older kubelet is safe and will not affect running pods |
| 265 | +- On older kubelets, new pods with `stickyBit: true` will get mode 0777 instead of 01777 (the field is ignored), which is a functional change but not a failure |
| 266 | + |
| 267 | +###### What specific metrics should inform a rollback? |
| 268 | + |
| 269 | +Increased pod startup failures or volume mount errors correlated with pods using `stickyBit: true`. |
| 270 | + |
| 271 | +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? |
| 272 | + |
| 273 | +Not yet. Will be tested manually before release. |
| 274 | + |
| 275 | +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? |
| 276 | + |
| 277 | +No. |
| 278 | + |
| 279 | +### Monitoring Requirements |
| 280 | + |
| 281 | +###### How can an operator determine if the feature is in use by workloads? |
| 282 | + |
| 283 | +Operators can: |
| 284 | +1. Query the API server for pods with emptyDir volumes that have `stickyBit: true`: |
| 285 | + ```bash |
| 286 | + kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.volumes[]?; .emptyDir.stickyBit == true)) | "\(.metadata.namespace)/\(.metadata.name)"' |
| 287 | + ``` |
| 288 | +2. Check kubelet logs for messages related to sticky bit creation |
| 289 | +3. Inspect pod specifications directly |
| 290 | + |
| 291 | +###### How can someone using this feature know that it is working for their instance? |
| 292 | + |
| 293 | +- [x] Other (treat as last resort) |
| 294 | + - Details: Users can verify the feature is working by: |
| 295 | + 1. Creating a pod with an emptyDir volume with `stickyBit: true` |
| 296 | + 2. Exec into the pod and check the directory permissions: `ls -ld /path/to/emptydir` |
| 297 | + 3. Verify the permissions show `drwxrwxrwt` (mode 01777, the 't' at the end indicates sticky bit) |
| 298 | + 4. Test the behavior by creating a file as one user and attempting to delete it as another user |
| 299 | + |
| 300 | +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? |
| 301 | + |
| 302 | +This feature should not affect existing SLOs. The performance impact should be negligible: |
| 303 | + |
| 304 | +- emptyDir volume creation time should not be measurably affected |
| 305 | +- Pod startup time should not be measurably affected |
| 306 | + |
| 307 | +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? |
| 308 | + |
| 309 | +- [ ] Metrics |
| 310 | + - Metric name: storage_operation_duration_seconds (existing metric) |
| 311 | + - Components exposing the metric: kubelet |
| 312 | + - This metric can be filtered by operation_name="setup" to track emptyDir volume creation time |
| 313 | + |
| 314 | +Operators should monitor: |
| 315 | +- Pod startup failures |
| 316 | +- Volume mount failures |
| 317 | +- kubelet errors |
| 318 | + |
| 319 | +###### Are there any missing metrics that would be useful to have to improve observability of this feature? |
| 320 | + |
| 321 | +No additional metrics are needed. The feature is a simple file permission change and can be observed using existing pod and volume metrics. |
| 322 | + |
| 323 | +### Dependencies |
| 324 | + |
| 325 | +###### Does this feature depend on any specific services running in the cluster? |
| 326 | + |
| 327 | +No. The feature only depends on: |
| 328 | +- The host OS supporting the sticky bit permission (standard on all Linux systems) |
| 329 | +- The filesystem supporting the sticky bit (standard on all major filesystems) |
| 330 | + |
| 331 | +### Scalability |
| 332 | + |
| 333 | +###### Will enabling / using this feature result in any new API calls? |
| 334 | + |
| 335 | +No. |
| 336 | + |
| 337 | +###### Will enabling / using this feature result in introducing new API types? |
| 338 | + |
| 339 | +No. It adds a new field to an existing API type (EmptyDirVolumeSource). |
| 340 | + |
| 341 | +###### Will enabling / using this feature result in any new calls to the cloud provider? |
| 342 | + |
| 343 | +No. |
| 344 | + |
| 345 | +###### Will enabling / using this feature result in increasing size or count of the existing API objects? |
| 346 | + |
| 347 | +- API type(s): Pod (EmptyDirVolumeSource) |
| 348 | +- Estimated increase in size: One additional boolean field per emptyDir volume that uses the feature, when set |
| 349 | + |
| 350 | +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? |
| 351 | + |
| 352 | +No. The performance impact should be negligible. |
| 353 | + |
| 354 | +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? |
| 355 | + |
| 356 | +No. The feature only changes one argument to a mkdir system call. |
| 357 | + |
| 358 | +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? |
| 359 | + |
| 360 | +No. |
| 361 | + |
| 362 | +### Troubleshooting |
| 363 | + |
| 364 | +###### How does this feature react if the API server and/or etcd is unavailable? |
| 365 | + |
| 366 | +The feature is implemented in the kubelet and does not depend on the API server or etcd after the pod spec has been retrieved. |
| 367 | +###### What are other known failure modes? |
| 368 | + |
| 369 | +None beyond the standard emptyDir failure modes. |
| 370 | + |
| 371 | +###### What steps should be taken if SLOs are not being met to determine the problem? |
| 372 | + |
| 373 | +This feature should not affect SLOs. If pod startup or volume mounting SLOs are not being met, check whether the affected pods use `stickyBit: true` and inspect the kubelet logs for errors. |
| 374 | + |
| 375 | +## Implementation History |
| 376 | + |
| 377 | +- 2025-02-19: Initial implementation started (kubernetes/kubernetes#130277) |
| 378 | +- 2025-08-25: KEP issue created (kubernetes/enhancements#5502) |
| 379 | +- 2026-01-30: KEP created for alpha in v1.36 |
| 380 | + |
| 381 | +## Drawbacks |
| 382 | + |
| 383 | +- Adds a new API field, slightly increasing API surface |
| 384 | +- Users unfamiliar with Unix permissions may be confused by sticky bit behavior |
| 385 | +- Not supported on Windows (but emptyDir permissions work differently there anyway) |
| 386 | + |
| 387 | +## Alternatives |
| 388 | + |
| 389 | +### Alternative 1: Provide more flexible mount options on emptyDir |
| 390 | + |
| 391 | +There appears to be interest in providing more configuration options for mounting, which could include setting permissions. |
| 392 | + |
| 393 | +References: https://github.com/kubernetes/enhancements/pull/5856 |
| 394 | + |
| 395 | +## Infrastructure Needed (Optional) |