NVIDIA · cpcloud · Mar 7, 2026 · Mar 6, 2026 · Mar 6, 2026
diff --git a/AGENTS.md b/AGENTS.md
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
diff --git a/cuda_bindings/AGENTS.md b/cuda_bindings/AGENTS.md
@@ -0,0 +1,67 @@
+This file describes `cuda_bindings`, the low-level CUDA host API bindings
+subpackage in the `cuda-python` monorepo.
+
+## Scope and principles
+
+- **Role**: provide low-level, close-to-CUDA interfaces under
+  `cuda.bindings.*` with broad API coverage.
+- **Style**: prioritize correctness and API compatibility over convenience
+  wrappers. High-level ergonomics belong in `cuda_core`, not here.
+- **Cross-platform**: preserve Linux and Windows behavior unless a change is
+  intentionally platform-specific.
+
+## Package architecture
+
+- **Public module layer**: Cython modules under `cuda/bindings/` expose user
+  APIs (`driver`, `runtime`, `nvrtc`, `nvjitlink`, `nvvm`, `cufile`, etc.).
+- **Internal binding layer**: `cuda/bindings/_bindings/` provides lower-level
+  glue and loader helpers used by public modules.
+- **Platform internals**: `cuda/bindings/_internal/` contains
+  platform-specific implementation files and support code.
+- **Build/codegen backend**: `build_hooks.py` drives header parsing, template
+  expansion, extension configuration, and Cythonization.
+
+## Generated-source workflow
+
+- **Do not hand-edit generated binding files**: many files under
+  `cuda/bindings/` (including `*.pyx`, `*.pxd`, `*.pyx.in`, and `*.pxd.in`)
+  are generated artifacts.
+- **Generated files are synchronized from another repository**: changes to these
+  files in this repo are expected to be overwritten by the next sync.
+- **If generated output must change**: make the change at the generation source
+  and sync the updated artifacts back here, rather than patching generated files
+  directly in this repo.
+- **Header-driven generation**: parser behavior and required CUDA headers are
+  defined in `build_hooks.py`; update those rules when introducing new symbols.
+- **Platform split files**: keep `_linux.pyx` and `_windows.pyx` variants
+  aligned when behavior should be equivalent.
+
+## Testing expectations
+
+- **Primary tests**: `pytest tests/`
+- **Cython tests**:
+  - build: `tests/cython/build_tests.sh` (or platform equivalent)
+  - run: `pytest tests/cython/`
+- **Examples**: example coverage is pytest-based under `examples/`.
+- **Benchmarks**: run with `pytest --benchmark-only benchmarks/` when needed.
+- **Orchestrated run**: from repo root, `scripts/run_tests.sh bindings`.
+
+## Build and environment notes
+
+- `CUDA_HOME` or `CUDA_PATH` must point to a valid CUDA Toolkit for source
+  builds that parse headers.
+- `CUDA_PYTHON_PARALLEL_LEVEL` controls build parallelism.
+- `CUDA_PYTHON_PARSER_CACHING` controls parser-cache behavior during generation.
+- Runtime behavior is affected by
+  `CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM` and
+  `CUDA_PYTHON_DISABLE_MAJOR_VERSION_WARNING`.
+
+## Editing guidance
+
+- Keep CUDA return/error semantics explicit and avoid broad fallback behavior.
+- Reuse existing helper layers (`_bindings`, `_internal`, `_lib`) before adding
+  new one-off utilities.
+- If you add or change exported APIs, update relevant docs under
+  `docs/source/module/` and tests in `tests/`.
+- Prefer changes that are easy to regenerate/rebuild rather than patching
+  generated output directly.
diff --git a/cuda_bindings/CLAUDE.md b/cuda_bindings/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
diff --git a/cuda_core/AGENTS.md b/cuda_core/AGENTS.md
@@ -0,0 +1,65 @@
+This file describes `cuda_core`, the high-level Pythonic CUDA subpackage in the
+`cuda-python` monorepo.
+
+## Scope and principles
+
+- **Role**: provide higher-level CUDA abstractions (`Device`, `Stream`,
+  `Program`, `Linker`, memory resources, graphs) on top of `cuda.bindings`.
+- **API intent**: keep interfaces Pythonic while preserving explicit CUDA
+  behavior and error visibility.
+- **Compatibility**: changes should remain compatible with supported
+  `cuda.bindings` major versions (12.x and 13.x).
+
+## Package architecture
+
+- **Main package**: `cuda/core/` contains most Cython modules (`*.pyx`, `*.pxd`)
+  implementing runtime behaviors and public objects.
+- **Subsystems**:
+  - memory/resource stack: `cuda/core/_memory/`
+  - system-level APIs: `cuda/core/system/`
+  - compile/link path: `_program.pyx`, `_linker.pyx`, `_module.pyx`
+  - execution path: `_launcher.pyx`, `_launch_config.pyx`, `_stream.pyx`
+- **C++ helpers**: module-specific C++ implementations live under
+  `cuda/core/_cpp/`.
+- **Build backend**: `build_hooks.py` handles Cython extension setup and build
+  dependency wiring.
+
+## Build and version coupling
+
+- `build_hooks.py` determines CUDA major version from `CUDA_CORE_BUILD_MAJOR`
+  or CUDA headers (`CUDA_HOME`/`CUDA_PATH`) and uses it for build decisions.
+- Source builds require CUDA headers available through `CUDA_HOME` or
+  `CUDA_PATH`.
+- `cuda_core` expects `cuda.bindings` to be present and version-compatible.
+
+## Testing expectations
+
+- **Primary tests**: `pytest tests/`
+- **Cython tests**:
+  - build: `tests/cython/build_tests.sh` (or platform equivalent)
+  - run: `pytest tests/cython/`
+- **Examples**: validate affected examples in `examples/` when changing user
+  workflows or public APIs.
+- **Orchestrated run**: from repo root, `scripts/run_tests.sh core`.
+
+## Runtime/build environment notes
+
+- Runtime env vars commonly relevant:
+  - `CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM`
+  - `CUDA_PYTHON_DISABLE_MAJOR_VERSION_WARNING`
+- Build env vars commonly relevant:
+  - `CUDA_HOME` / `CUDA_PATH`
+  - `CUDA_CORE_BUILD_MAJOR`
+  - `CUDA_PYTHON_PARALLEL_LEVEL`
+  - `CUDA_PYTHON_COVERAGE`
+
+## Editing guidance
+
+- Keep user-facing behaviors coherent with docs and examples, especially around
+  stream semantics, memory ownership, and compile/link flows.
+- Reuse existing shared utilities in `cuda/core/_utils/` before adding new
+  helpers.
+- When changing Cython signatures or cimports, verify related `.pxd` and
+  call-site consistency.
+- Prefer explicit error propagation over silent fallback paths.
+- If you change public behavior, update tests and docs under `docs/source/`.
diff --git a/cuda_core/CLAUDE.md b/cuda_core/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
diff --git a/cuda_pathfinder/AGENTS.md b/cuda_pathfinder/AGENTS.md
@@ -0,0 +1,72 @@
+This file describes `cuda_pathfinder`, a Python subpackage of
+[cuda-python](https://github.com/NVIDIA/cuda-python). It locates and loads
+NVIDIA dynamic libraries (CTK, third-party, and driver) across Linux and
+Windows.
+
+## Scope and principles
+
+- **Language**: all implementation code in this package is pure Python.
+- **Public API**: keep user-facing imports stable via `cuda.pathfinder`.
+  Internal modules should stay under `cuda.pathfinder._*`.
+- **Behavior**: loader behavior must remain deterministic and explicit. Avoid
+  "best effort" silent fallbacks that mask why discovery/loading failed.
+- **Cross-platform**: preserve Linux and Windows behavior parity unless a change
+  is explicitly platform-scoped.
+
+## Package architecture
+
+- **Descriptor source-of-truth**: `cuda/pathfinder/_dynamic_libs/descriptor_catalog.py`
+  defines canonical metadata for known libraries.
+- **Registry layers**:
+  - `lib_descriptor.py` builds the name-keyed runtime registry from the catalog.
+  - `supported_nvidia_libs.py` keeps legacy exported tables derived from the
+    catalog for compatibility.
+- **Search pipeline**:
+  - `search_steps.py` implements composable find steps (`site-packages`,
+    `CONDA_PREFIX`, `CUDA_HOME`/`CUDA_PATH`, canary-assisted CTK root flow).
+  - `search_platform.py` and `platform_loader.py` isolate OS-specific logic.
+- **Load orchestration**:
+  - `load_nvidia_dynamic_lib.py` coordinates find/load phases, dependency
+    loading, driver-lib fast path, and cache semantics.
+- **Process isolation helper**:
+  - `cuda/pathfinder/_utils/spawned_process_runner.py` is used where
+    process-global dynamic loader state would otherwise leak across tests.
+
+## Editing guidance
+
+- **Edit authored descriptors, not derived tables**: when adding/changing a
+  library, update `descriptor_catalog.py` first; keep derived exports in sync
+  through existing conversion logic and tests.
+- **Respect cache semantics**: `load_nvidia_dynamic_lib` is cached. Never add
+  behavior that closes returned handles or assumes repeated fresh loads.
+- **Keep error contracts intact**:
+  - unknown name -> `DynamicLibUnknownError`
+  - known but unsupported on this OS -> `DynamicLibNotAvailableError`
+  - known/supported but not found/loadable -> `DynamicLibNotFoundError`
+- **Do not hardcode host assumptions**: avoid baking in machine-local paths,
+  shell-specific quoting, or environment assumptions.
+- **Prefer focused abstractions**: if a change is platform-specific, route it
+  through existing platform abstraction points instead of branching in many call
+  sites.
+
+## Testing expectations
+
+- **Primary command**: run `pytest tests/` from `cuda_pathfinder/`.
+- **Real-loading tests**: prefer spawned child-process tests for actual dynamic
+  loading behavior; avoid in-process cross-test interference.
+- **Cache-aware tests**: if a test patches internals used by
+  `load_nvidia_dynamic_lib`, call `load_nvidia_dynamic_lib.cache_clear()`.
+- **Negative-case names**: use obviously fake names (for example
+  `"not_a_real_lib"`) in unknown/invalid-lib tests.
+- **INFO summary in CI logs**: use `info_summary_append` for useful
+  test-context lines visible in terminal summaries.
+- **Strictness toggle**:
+  `CUDA_PATHFINDER_TEST_LOAD_NVIDIA_DYNAMIC_LIB_STRICTNESS` controls whether
+  missing libraries are tolerated (`see_what_works`) or treated as failures
+  (`all_must_work`).
+
+## Useful commands
+
+- Run package tests: `pytest tests/`
+- Run package tests via orchestrator: `../scripts/run_tests.sh pathfinder`
+- Build package docs: `cd docs && ./build_docs.sh`
diff --git a/cuda_pathfinder/CLAUDE.md b/cuda_pathfinder/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
diff --git a/cuda_python/AGENTS.md b/cuda_python/AGENTS.md
@@ -0,0 +1,24 @@
+This file describes `cuda_python`, the metapackage layer in the `cuda-python`
+monorepo.
+
+## Scope
+
+- `cuda_python` is primarily packaging and documentation glue.
+- Unlike `cuda_core`, `cuda_bindings`, and `cuda_pathfinder`, it does not host
+  substantial runtime APIs.
+
+## Main files to edit
+
+- `pyproject.toml`: project metadata and dynamic dependency declaration.
+- `setup.py`: dynamic dependency pinning logic for matching `cuda-bindings`
+  versions (release vs pre-release behavior).
+- `docs/`: top-level docs build/aggregation scripts.
+
+## Editing guidance
+
+- Keep this package lightweight; prefer implementing runtime features in the
+  component packages rather than here.
+- Be careful when changing dependency/version logic in `setup.py`; preserve
+  compatibility between metapackage versioning and subpackage constraints.
+- If you update docs structure, ensure `docs/build_all_docs.sh` still collects
+  docs from `cuda_python`, `cuda_bindings`, `cuda_core`, and `cuda_pathfinder`.
diff --git a/cuda_python/CLAUDE.md b/cuda_python/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md