fix(pathfinder): use CTK canary fallback for header discovery #1731
rwgk merged 3 commits into NVIDIA:main
Reuse the CTK root canary probe for CTK header lookup when site-packages, conda, and CUDA_HOME/CUDA_PATH are unavailable, avoiding hardcoded default install paths. Add tests for fallback success, search-order precedence, and non-fatal canary miss behavior. Made-with: Cursor
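The fallback ordering described above can be sketched as a list of search steps tried in sequence, with the canary probe last. This is a minimal illustration, not pathfinder's code: `find_header_dir`, the `Finder` type, the stub finders, and every label except `"system-ctk-root"` are hypothetical names chosen for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical finder type: returns an include directory, or None on a miss.
Finder = Callable[[], Optional[str]]


def find_header_dir(steps: Sequence[Tuple[str, Finder]]) -> Optional[Tuple[str, str]]:
    """Return (include_dir, found_via) from the first step that succeeds."""
    for found_via, finder in steps:
        include_dir = finder()
        if include_dir is not None:
            return include_dir, found_via
    return None  # every step missed, including the canary fallback


# Stub finders standing in for the real lookups (assumed for illustration).
steps = [
    ("site-packages", lambda: None),
    ("conda", lambda: None),
    ("CUDA_HOME", lambda: None),
    ("system-ctk-root", lambda: "/opt/ctk/include"),  # canary fallback, tried last
]
result = find_header_dir(steps)
```

Because the canary step is appended after the `CUDA_HOME` step, an explicitly configured toolkit always wins over one discovered through the probe.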
/ok to test
    """
    try:
        canary_abs_path = _resolve_system_loaded_abs_path_in_subprocess("cudart")
    except (ChildProcessError, RuntimeError):
The try-except will mask bugs, we should remove it completely here.
The canary probe feature will become critical for users, therefore we should hold the implementation quality to the same standards as any other pathfinder code, since it's entirely owned by us. The only failure we're expecting is DynamicLibNotFoundError, which is already handled in canary_probe_subprocess.py. Everything else is a bug we want to know about and fix, so that users can have full confidence in the feature.
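The behavior requested in this review can be sketched as follows: a `None` result from the probe is the only expected miss and is treated as non-fatal, while any exception raised by the probe propagates to the caller. The function name `find_headers_via_canary` and its two callable parameters are hypothetical stand-ins, not pathfinder's actual signatures.

```python
from typing import Callable, Optional


def find_headers_via_canary(
    probe: Callable[[str], Optional[str]],
    search_from_lib_path: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return an include directory found via the canary library, or None."""
    # Deliberately no try/except here: an exception from the probe
    # indicates a bug and should surface instead of being masked.
    canary_abs_path = probe("cudart")
    if canary_abs_path is None:
        return None  # expected miss: the canary library was not found
    return search_from_lib_path(canary_abs_path)


def missing_probe(libname: str) -> Optional[str]:
    return None  # simulates "library not found"


def broken_probe(libname: str) -> Optional[str]:
    raise RuntimeError("probe subprocess failed")  # simulates a probe bug
```

With `missing_probe` the function returns `None` and the caller can move on; with `broken_probe` the `RuntimeError` reaches the caller unchanged.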
Avoid masking canary subprocess failures during CTK header discovery so probe bugs are visible. Update header-discovery tests so only a None canary result is non-fatal while runtime probe errors are asserted. Made-with: Cursor
rwgk left a comment
You're faster here than I am with my cupti PR. I'll set this to auto-merge, then I'll update my release notes PR.
/ok to test 78be34e
/ok to test
* Add Linux support for loading libcupti.so.12 and libcupti.so.13

  This commit adds support for finding and loading CUPTI libraries on Linux through cuda.pathfinder. It implements support for all enumerated installation methods:

  - Site-packages: nvidia/cuda_cupti/lib (CUDA 12) and nvidia/cu13/lib (CUDA 13)
  - Conda: $CONDA_PREFIX/lib (colocated with other CUDA libraries)
  - CTK via CUDA_HOME: $CUDA_HOME/extras/CUPTI/lib64
  - CTK via canary probe: system CTK root discovery (similar to nvvm)

  Changes:

  - Add 'cupti' to supported library names and SONAMEs
  - Add site-packages paths for CUDA 12 and 13
  - Add cupti to CTK root canary discoverable libraries
  - Update find_nvidia_dynamic_lib to handle extras/CUPTI/lib64 path
  - Add logic to distinguish CTK (extras/CUPTI/lib64) vs conda (lib) paths
  - Update _find_so_using_lib_dir to support versioned libraries via glob
  - Add comprehensive mock tests covering all installation methods

  Fixes #1572 (Linux support)

  Made-with: Cursor

* Update cupti tests to use new SearchContext-based API

  Migrated test_load_nvidia_dynamic_lib_using_mocker.py from the old _FindNvidiaDynamicLib API to the new descriptor-based SearchContext API.

  Changes:

  - Replace _FindNvidiaDynamicLib imports with search_steps and load_nvidia_dynamic_lib modules
  - Update mocks to use run_find_steps, LOADER, and SearchContext
  - Use LIB_DESCRIPTORS to get cupti descriptor
  - Update all test functions to work with the new search step architecture

  Made-with: Cursor

* Remove unused CTK canary variables from supported_nvidia_libs.py

  These variables (_CTK_ROOT_CANARY_ANCHOR_LIBNAMES and _CTK_ROOT_CANARY_DISCOVERABLE_LIBNAMES) were added in the cupti PR but are not used in the new descriptor-based architecture. The new code uses desc.ctk_root_canary_anchor_libnames directly from descriptors.

  Made-with: Cursor

* Improve comment for change in LinuxSearchPlatform.find_in_lib_dir()

* Add cupti to cu12, cu13 groups in cuda_pathfinder/pyproject.toml

* Add cuda_cupti to cuda-components in .github/actions/fetch_ctk/action.yml

* Add windows_dlls, site_packages_windows, anchor_rel_dirs_windows for cupti in /descriptor_catalog.py

* test: Refactor cupti mock tests to focus on Conda and error paths

  Remove tests covered by real CI:

  - Site-packages tests (CUDA 12 and 13) - covered by real CI
  - CTK tests (CUDA_HOME and canary probe) - covered by real CI
  - Search order tests involving site-packages/CTK - covered by real CI

  Keep tests not covered by real CI:

  - Conda discovery test - Conda not covered by real CI
  - Error path test (not found) - error path not covered
  - Conda vs CTK search order test - Conda not covered by real CI

  Also remove unused imports and helper functions.

  Made-with: Cursor

* Add pathfinder release/1.4.1-notes.rst

* Add PR #1731 to release/1.4.1-notes.rst
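The installation methods listed in the first commit can be pictured as an ordered list of candidate directories, each globbed for a versioned `libcupti.so.*` file. The sketch below uses only the relative paths named in the commit message; the helper names `cupti_candidate_dirs` and `first_versioned_so` are hypothetical, and the canary-probe step is left out.

```python
import glob
import os
from typing import List, Mapping, Optional


def cupti_candidate_dirs(site_packages: str, env: Mapping[str, str]) -> List[str]:
    """List candidate directories in the order given in the commit message."""
    dirs = [
        os.path.join(site_packages, "nvidia", "cuda_cupti", "lib"),  # CUDA 12 wheels
        os.path.join(site_packages, "nvidia", "cu13", "lib"),  # CUDA 13 wheels
    ]
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:
        dirs.append(os.path.join(conda_prefix, "lib"))
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        dirs.append(os.path.join(cuda_home, "extras", "CUPTI", "lib64"))
    return dirs


def first_versioned_so(dirs: List[str], stem: str = "libcupti.so") -> Optional[str]:
    """Return the first file matching '<stem>.*' in the first directory that has one."""
    for lib_dir in dirs:
        matches = sorted(glob.glob(os.path.join(lib_dir, stem + ".*")))
        if matches:
            return matches[0]
    return None
```

Globbing on `libcupti.so.*` is what lets a single lookup serve both `libcupti.so.12` and `libcupti.so.13` without hardcoding the major version.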
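Summary
- Reuse the CTK root canary probe for CTK header lookup when site-packages, Conda, and CUDA_HOME/CUDA_PATH do not resolve headers.
- Resolve the CTK root from the system-loaded cudart path and search the CTK include layout from that root, returning found_via="system-ctk-root".
- Add tests for fallback success, search-order precedence (CUDA_HOME before canary), and non-fatal canary-miss behavior; update CTK header search-order docs.

Closes #1707.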
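One way to picture the step from a resolved `cudart` path to an include directory is to walk up the path's parent directories until one contains `include/<header>`. This is a simplified sketch under that assumption; the walk-up strategy and the helper name `include_dir_from_canary` are illustrative and not taken from pathfinder, whose actual layout handling may differ.

```python
from pathlib import Path
from typing import Optional


def include_dir_from_canary(canary_abs_path: str, header_name: str) -> Optional[Path]:
    """Search upward from a library path for <root>/include/<header_name>.

    Hypothetical helper: checking for a specific header avoids accepting
    an unrelated 'include' directory higher up the tree.
    """
    lib_path = Path(canary_abs_path)
    for candidate_root in lib_path.parents:
        include_dir = candidate_root / "include"
        if (include_dir / header_name).is_file():
            return include_dir
    return None  # non-fatal miss: no matching include layout above the library
```

For a library at `<root>/lib64/libcudart.so.13`, the first parent is `lib64` and the second is `<root>`, so `<root>/include` is returned when it holds the requested header.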
Test plan
- pixi run pytest tests/test_find_nvidia_headers.py tests/test_ctk_root_discovery.py (from cuda_pathfinder/)
- pixi run test (repo root)

Made with Cursor