Fix background thread initialization race#68
Conversation
|
Hi @puzpuzpuz! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
e52a78e to
05ebbdd
Compare
|
The build seems to be timing out on ARM. Not sure if that has anything to do with the patch. |
Thank you for the patch! These kinds of initialization bugs can be very tricky to root cause. I saw a similar one recently where the page size at compile time did not match the page size at runtime caused a multi-threaded Python process to deadlock when it called mallctl just to see if jemalloc was in use. I will take a closer look a the fix alongside the team next week. If possible, I would like to add a test for this. (Might require a follow-up commit.) As is, multi-threaded initialization does not have as much coverage as it deserves. |
e52a78e to
3c2d2da
Compare
I've added some multi-threaded tests in 3c2d2da. PTAL |
The race condition happens when per-cpu arenas and background threads features are enabled:
malloc()first -> startsmalloc_init_hard()malloc_init_hard(), Threads B, C, D also callmalloc()arena_init()->arena_new_create_background_thread()->background_thread_create(arena_ind=12, etc.)background_thread_create(tsd, 0)) yetThe initialization has locks, but arena creation for different arenas can proceed in parallel as soon as
malloc_init_state = initializedandinit_lockis released, and thebackground_thread_create(tsd, 0)call happens late inmalloc_init_hard()(after arenas are already being created by other threads).This bug won't be noticeable for applications that have single-threaded initialization path, but in case of JVM it's easily reproducible. When background thread initialization silently fails due to the race, RSS grows infinitely until the process gets OOM killed.