Skip to content

Make PRESENCE_CHECK_TIMEOUT configurable via env var#1470

Open
DudeRandom21 wants to merge 2 commits intomainfrom
configurable-presence-check-timeout
Open

Make PRESENCE_CHECK_TIMEOUT configurable via env var#1470
DudeRandom21 wants to merge 2 commits intomainfrom
configurable-presence-check-timeout

Conversation

@DudeRandom21
Copy link
Contributor

Summary

  • Add SHIPIT_PRESENCE_CHECK_TIMEOUT environment variable to configure the presence check timeout (default: 30s)
  • Deploy tasks were being reaped as dead during long bundle install steps (native extension compilation) where the hardcoded 30s timeout was too short
  • Remove hardcoded PRESENCE_CHECK_TIMEOUT constant from Task in favor of Shipit.presence_check_timeout

Test plan

  • Verify existing Task and ReapDeadTasksJob tests pass
  • Confirm default (30) is preserved when env var is not set
  • Confirm configured value is used when SHIPIT_PRESENCE_CHECK_TIMEOUT is set

🤖 Generated with Claude Code

Deploy tasks can be reaped as dead during long bundle install steps
(e.g. native extension compilation) because the hardcoded 30-second
presence check timeout is too short. Add SHIPIT_PRESENCE_CHECK_TIMEOUT
env var to allow configuring this per-environment while preserving the
30-second default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@edilsonacjr edilsonacjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one concern, not a huge problem but something that we can easily safeguard against.


Review assisted by pair-review

lib/shipit.rb Outdated
end

def presence_check_timeout
ENV.fetch('SHIPIT_PRESENCE_CHECK_TIMEOUT', 30).to_i
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 Bug: If SHIPIT_PRESENCE_CHECK_TIMEOUT is set to a non-numeric string (e.g. 'abc'), String#to_i returns 0. If set to '0' or a negative number, the parsed value is also non-positive. This value is passed as the ex: parameter to Shipit.redis.set(status_key, 'alive', ex: Shipit.presence_check_timeout) in Task#ping (task.rb:329). Redis requires the EX argument to be a strictly positive integer — a value of 0 or less raises Redis::CommandError: ERR invalid expire time in 'set' command. This would silently break liveness pings for all running tasks, causing ReapDeadTasksJob to reap them as dead — a worse outcome than the long-timeout problem this PR aims to solve.

Suggestion: Add a lower bound to ensure the timeout is always a positive integer:

Suggested change
ENV.fetch('SHIPIT_PRESENCE_CHECK_TIMEOUT', 30).to_i
[ENV.fetch('SHIPIT_PRESENCE_CHECK_TIMEOUT', 30).to_i, 1].max

Alternatively, you could raise a clear error at boot time if the value is invalid, rather than silently clamping.

@DudeRandom21 DudeRandom21 force-pushed the configurable-presence-check-timeout branch from 78171eb to cc318c1 Compare March 13, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants