Make PRESENCE_CHECK_TIMEOUT configurable via env var#1470
Make PRESENCE_CHECK_TIMEOUT configurable via env var#1470DudeRandom21 wants to merge 2 commits intomainfrom
Conversation
Deploy tasks can be reaped as dead during long bundle install steps (e.g. native extension compilation) because the hardcoded 30-second presence check timeout is too short. Add SHIPIT_PRESENCE_CHECK_TIMEOUT env var to allow configuring this per-environment while preserving the 30-second default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
edilsonacjr
left a comment
There was a problem hiding this comment.
Only one concern, not a huge problem but something that we can easily safeguard against.
Review assisted by pair-review
lib/shipit.rb
Outdated
| end | ||
|
|
||
| def presence_check_timeout | ||
| ENV.fetch('SHIPIT_PRESENCE_CHECK_TIMEOUT', 30).to_i |
There was a problem hiding this comment.
🐛 Bug: If SHIPIT_PRESENCE_CHECK_TIMEOUT is set to a non-numeric string (e.g. 'abc'), String#to_i returns 0. If set to '0' or a negative number, the parsed value is also non-positive. This value is passed as the ex: parameter to Shipit.redis.set(status_key, 'alive', ex: Shipit.presence_check_timeout) in Task#ping (task.rb:329). Redis requires the EX argument to be a strictly positive integer — a value of 0 or less raises Redis::CommandError: ERR invalid expire time in 'set' command. This would silently break liveness pings for all running tasks, causing ReapDeadTasksJob to reap them as dead — a worse outcome than the long-timeout problem this PR aims to solve.
Suggestion: Add a lower bound to ensure the timeout is always a positive integer:
| ENV.fetch('SHIPIT_PRESENCE_CHECK_TIMEOUT', 30).to_i | |
| [ENV.fetch('SHIPIT_PRESENCE_CHECK_TIMEOUT', 30).to_i, 1].max |
Alternatively, you could raise a clear error at boot time if the value is invalid, rather than silently clamping.
78171eb to
cc318c1
Compare
Summary
SHIPIT_PRESENCE_CHECK_TIMEOUTenvironment variable to configure the presence check timeout (default: 30s)bundle installsteps (native extension compilation) where the hardcoded 30s timeout was too shortPRESENCE_CHECK_TIMEOUTconstant fromTaskin favor ofShipit.presence_check_timeoutTest plan
SHIPIT_PRESENCE_CHECK_TIMEOUTis set🤖 Generated with Claude Code