mapcache_seed: Refactor seeder for robust multiprocessing and shutdown#359
mapcache_seed: Refactor seeder for robust multiprocessing and shutdown#359marisn wants to merge 2 commits intoMapServer:mainfrom
Conversation
This commit is a major overhaul of the mapcache_seed utility, addressing critical bugs in signal handling,
multiprocessing, and thread synchronization that could cause deadlocks or unresponsive behavior.
The key changes are:
1. Robust Signal Handling and Shutdown:
The Ctrl+C (SIGINT) handling has been completely rewritten to ensure a prompt and clean shutdown in all modes, fixing
numerous deadlocks and race conditions.
- For Multiprocessing:
- pop_queue() now checks a sig_int_received flag at the start to ensure child processes exit immediately after
finishing their current task.
- push_queue()'s EINTR retry loop now also checks the signal flag, preventing the parent process from deadlocking on
a full message queue after a signal has been received.
- For Multithreading:
- On SIGINT, the feed_worker now calls apr_queue_interrupt_all() to wake all blocked threads. pop_queue() correctly
handles the resulting APR_EOF to stop the worker.
- A race condition where a thread could start a new job just as a signal was received has been fixed by re-checking
the signal flag after a work item is successfully popped from the queue.
- Graceful vs. Urgent Shutdown:
- The final loop in feed_worker() that sends STOP commands is now wrapped in an if(!sig_int_received) block. This
ensures it only runs during a normal, graceful shutdown and is skipped on Ctrl+C, preventing a major deadlock.
2. Multiprocessing Enhancements:
The -p mode has been significantly improved for stability and correctness.
- IPC-based Logging: A new, dedicated System V message queue (log_msqid) has been implemented for logging in
multiprocessing mode. This decouples logging from the main work queue, improving performance and stability.
log_thread_fn and seed_worker were updated to use this new mechanism.
- `-p 1` Bugfix: A long-standing bug that caused the seeder to stall when using -p 1 has been fixed by changing all
relevant logic guards from nprocesses > 1 to nprocesses >= 1.
- IPC Creation: Message queues are now created using IPC_PRIVATE instead of ftok, which is more robust.
3. Code Quality and Minor Fixes:
- child_init is correctly called at start of each worker process after fork().
- sig_int_received is now correctly typed as volatile sig_atomic_t.
- memset() is now used to initialize structs before use.
- The help text for the -p option has been clarified.
Co-authored with gemini-2.5-pro
|
Sorry for large blob, but it was hard to understand causes and best fixes for all dead-locking or use of garbage values cases. |
|
@marisn - I'm not familiar enough with the MapCache code to know if these changes should be merged. To move it forward, it would be useful to add a basic test (just running the seeder with the
If using this switch currently crashes the seeder then adding these as a separate PR showing them failing, and then showing this PR fixes them would help get it merged. |
I overlooked the proposal to have a separate PR. Don't know how good idea it is – I run tests on my local machine – as there is a deadlock, it just hangs forever – would tie-up CI runners till timeout is reached. |
This commit is a major overhaul of the mapcache_seed utility, addressing critical bugs in signal handling, multiprocessing, and thread synchronization that could cause deadlocks or unresponsive behavior.
The key changes are:
The Ctrl+C (SIGINT) handling has been completely rewritten to ensure a prompt and clean shutdown in all modes, fixing numerous deadlocks and race conditions.
For Multiprocessing:
pop_queue()now checks asig_int_receivedflag at the start to ensure child processes exit immediately after finishing their current task.push_queue()'s EINTR retry loop now also checks the signal flag, preventing the parent process from deadlocking on a full message queue after a signal has been received.For Multithreading:
apr_queue_interrupt_all()to wake all blocked threads.pop_queue()correctly handles the resulting APR_EOF to stop the worker.Graceful vs. Urgent Shutdown:
feed_worker()that sends STOP commands is now wrapped in anif(!sig_int_received)block. This ensures it only runs during a normal, graceful shutdown and is skipped on Ctrl+C, preventing a major deadlock.The -p mode has been significantly improved for stability and correctness.
log_thread_fnandseed_workerwere updated to use this new mechanism.-p 1Bugfix: A long-standing bug that caused the seeder to stall when using -p 1 has been fixed by changing all relevant logic guards fromnprocesses > 1tonprocesses >= 1.sig_int_receivedis now correctly typed asvolatile sig_atomic_t.memset()is now used to initialize structs before use.-poption has been clarified.Co-authored with gemini-2.5-pro