Fix: eliminate Windows PID race condition by replacing concorekill.bat overwrite mechanism with safe PID registry (#391)#465
Conversation
|
@GaneshPatil7517 there is a merge conflict. Pls check. |
…try (ControlCore-Project#391) Problem: On Windows, concore.py wrote a single PID to concorekill.bat at import time using 'w' mode. When multiple Python nodes started simultaneously (via makestudy), each overwrote the file. Only the last node's PID survived, making the kill script unable to terminate other nodes. Stale PIDs from crashed studies could also kill unrelated processes. Solution: - Replace single-PID overwrite with append-based PID registry (concorekill_pids.txt) storing one PID per line - Register atexit handler to remove current PID on graceful shutdown - Generate concorekill.bat dynamically with PID validation: each PID is checked via tasklist before taskkill executes - Clean up both files when last node exits - Backward compatible: users still run concorekill.bat Added 9 tests covering registration, cleanup, multi-node scenarios, missing registry handling, and kill script generation. Fixes ControlCore-Project#391
a3b873f to
d82fe38
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses Windows-specific reliability issues in concore.py by replacing the import-time “single PID overwrite” concorekill.bat mechanism with a multi-process PID registry and a regenerated kill script, aiming to eliminate PID overwrite races and reduce stale-PID risk.
Changes:
- Add a Windows PID registry (
concorekill_pids.txt) with per-process registration andatexitcleanup. - Generate a
concorekill.batscript that iterates the registry and validates PIDs viatasklistbeforetaskkill. - Add a new test suite (
TestPidRegistry) covering registration, cleanup behavior, and kill-script generation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
concore.py |
Implements PID registry, kill script generation, and exit-time cleanup on Windows. |
tests/test_concore.py |
Adds tests validating registry behavior and generated kill script contents. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… endings, format with ruff - Add msvcrt.locking in _cleanup_pid() to prevent race conditions when multiple nodes exit concurrently (Windows file lock) - Use 'usebackq' and quote %~dp0 path in for /f loop so kill script works in directories with spaces/parentheses - Open kill script with newline='' to prevent double-CR (\r\r\n) - Improve test_import_registers_pid_on_windows to actually test import-time side effect by forcing module reimport - Run ruff format on test_concore.py and test_read_status.py
…us.py (ruff F401, F841)
|
hello @pradeeban Sir resolved merge conflict and the pr is ready to review please review it when you get the time... |
|
@GaneshPatil7517 merge conflicts because I merged your other PR. Also, this PR is crowded with too many irrelevant changes, like linechanges, removal of empty lines, etc. Such changes should not be in a PR. These are the changes that are causing merge conflicts. Please keep your PRs minimal. Please create a new PR with just the minimal changes necessary. |
Problem
On Windows,
concore.pywrote a single PID toconcorekill.batat module import timeusing write mode (
"w"). When multiple Python nodes launched simultaneously viamakestudy, each node overwrote the file — only the last node's PID survived.This caused:
concorekill.batto an unrelated process, risking
taskkillterminating the wrong programSolution
Replaced the single-PID overwrite with an append-based PID registry system:
concorekill.batwith one hardcodedtaskkilllineconcorekill_pids.txtregistry (one PID per line, append mode)atexithandler removes current PID on shutdowntaskkillconcorekill.batvalidates each PID viatasklistbefore killingKey changes in
concore.py:_register_pid()— appends current PID toconcorekill_pids.txt_cleanup_pid()— removes current PID on exit; deletes both files when last node exits_write_kill_script()— generatesconcorekill.batthat checks each PID is a running Python process before issuingtaskkillBackward compatibility:
Users still run
concorekill.batexactly as before — no workflow change required.Only Python layer modified; C++, MATLAB, and Verilog implementations are untouched.
Tests
Added 9 tests in
TestPidRegistry:tasklist+taskkill)All 9 new tests pass. All pre-existing tests unaffected.
Fixes
Closes #391