
[pathfinder] Use LOAD_WITH_ALTERED_SEARCH_PATH for system DLL search on Windows#1506

Draft
rwgk wants to merge 5 commits intoNVIDIA:mainfrom
rwgk:pathfinder_LOAD_WITH_ALTERED_SEARCH_PATH

@rwgk
Collaborator

@rwgk rwgk commented Jan 16, 2026

Fixes #1781

Problem

On Windows, Python 3.8+ calls SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) at process startup. This restricts LoadLibraryExW(name, NULL, 0) to searching only:

  1. The application directory (Python's install directory)
  2. The system directory (System32)
  3. Directories explicitly added via AddDllDirectory()

PATH is excluded entirely. This means the pathfinder's load_with_system_search() silently fails for all CUDA Toolkit DLLs when CUDA_HOME/CUDA_PATH are not set, even though the DLLs are on PATH (e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64\).
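To make the restriction concrete, here is a small pure-Python model of which locations a bare-name LoadLibraryExW(name, NULL, 0) may search once the process-wide default is LOAD_LIBRARY_SEARCH_DEFAULT_DIRS. The flag values are the documented Win32 constants. The function `searched_locations` and its string labels are hypothetical, and the model deliberately ignores search order, KnownDLLs, and modules that are already loaded. It is a sketch of the rule described above, not of the Windows loader itself.

```python
# Documented Win32 LoadLibraryEx flag values.
LOAD_LIBRARY_SEARCH_APPLICATION_DIR = 0x00000200
LOAD_LIBRARY_SEARCH_USER_DIRS = 0x00000400
LOAD_LIBRARY_SEARCH_SYSTEM32 = 0x00000800
# Documented as the combination APPLICATION_DIR | USER_DIRS | SYSTEM32.
LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000


def searched_locations(flags):
    """Hypothetical model: location labels a bare-name load may search under `flags`.

    Ignores search order, KnownDLLs, and modules that are already loaded.
    """
    if flags & LOAD_LIBRARY_SEARCH_DEFAULT_DIRS:
        flags |= (
            LOAD_LIBRARY_SEARCH_APPLICATION_DIR
            | LOAD_LIBRARY_SEARCH_USER_DIRS
            | LOAD_LIBRARY_SEARCH_SYSTEM32
        )
    locations = set()
    if flags & LOAD_LIBRARY_SEARCH_APPLICATION_DIR:
        locations.add("application dir")
    if flags & LOAD_LIBRARY_SEARCH_USER_DIRS:
        locations.add("AddDllDirectory dirs")
    if flags & LOAD_LIBRARY_SEARCH_SYSTEM32:
        locations.add("System32")
    return locations


# Python 3.8+ installs this process default; LoadLibraryExW(name, NULL, 0) then follows it.
process_default = LOAD_LIBRARY_SEARCH_DEFAULT_DIRS
locations = searched_locations(process_default)
print(sorted(locations))
print("PATH" in locations)
```

No combination of these flags ever yields "PATH". That is why a CUDA bin directory reachable only through PATH is invisible to the bare-name call.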

The bug went unnoticed because our Windows test environments always have CUDA_HOME or CUDA_PATH set, so the PATH-based fallback was never exercised.

Note: The initial diagnosis (Jan 2026) attributed the failure to co-located dependency resolution (error 126). Experiments in March 2026 showed the problem is more fundamental: LoadLibraryExW never finds the DLL in the first place. The fix turned out to be correct regardless.

Fix

Replace the bare LoadLibraryExW(dll_name, None, 0) call with a two-step approach:

  1. SearchPathW to locate the DLL's full path. SearchPathW is not affected by SetDefaultDllDirectories and always searches PATH.

  2. LoadLibraryExW(full_path, None, LOAD_WITH_ALTERED_SEARCH_PATH) to load the DLL by absolute path. LOAD_WITH_ALTERED_SEARCH_PATH additionally tells Windows to resolve the DLL's dependencies starting from its own directory, which is a useful secondary benefit for DLLs with co-located dependencies (e.g. nvrtc + nvrtc-builtins).
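The two steps above can be sketched with ctypes roughly as follows. This assumes a 64-bit Windows CPython. `load_dll_via_search_path` and `BUFFER_CHARS` are hypothetical names, and this is not the code in load_dl_windows.py. Declaring `argtypes`/`restype` matters because ctypes otherwise treats the returned HMODULE as a C int, which can truncate it on 64-bit Windows.

```python
import ctypes
import sys

# Documented Win32 flag: resolve the DLL's dependencies starting from its own directory.
LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008
# Generous buffer (in characters) so long install paths are not truncated.
BUFFER_CHARS = 32768


def load_dll_via_search_path(dll_name):
    """Hypothetical helper: SearchPathW, then LoadLibraryExW by absolute path.

    Returns (abs_path, handle), or None if SearchPathW finds nothing.
    """
    if sys.platform != "win32":
        raise OSError("this sketch only works on Windows")
    from ctypes import wintypes  # imported lazily; only needed on Windows

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.SearchPathW.argtypes = (
        wintypes.LPCWSTR,  # lpPath (None = system search path, which includes PATH)
        wintypes.LPCWSTR,  # lpFileName
        wintypes.LPCWSTR,  # lpExtension
        wintypes.DWORD,  # nBufferLength, in characters
        wintypes.LPWSTR,  # lpBuffer
        ctypes.POINTER(wintypes.LPWSTR),  # lpFilePart (may be NULL)
    )
    kernel32.SearchPathW.restype = wintypes.DWORD
    kernel32.LoadLibraryExW.argtypes = (wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD)
    # Without this, ctypes would truncate the 64-bit HMODULE to a C int.
    kernel32.LoadLibraryExW.restype = wintypes.HMODULE

    # Step 1: locate the DLL. A return of 0 means not found; a value >= the
    # buffer size means the buffer was too small.
    buf = ctypes.create_unicode_buffer(BUFFER_CHARS)
    n = kernel32.SearchPathW(None, dll_name, None, len(buf), buf, None)
    if n == 0 or n >= len(buf):
        return None
    abs_path = buf.value

    # Step 2: load by absolute path, so no DLL search is needed for the DLL itself.
    handle = kernel32.LoadLibraryExW(abs_path, None, LOAD_WITH_ALTERED_SEARCH_PATH)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    return abs_path, handle
```

On a machine set up like the one in the experiments below, `load_dll_via_search_path("nvrtc64_130_0.dll")` should return a path under the CUDA `bin\x64` directory, provided that directory is on PATH.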

Test

Adds test_load_nvrtc_without_cuda_home_or_cuda_path, which loads nvrtc in two fresh subprocesses:

  • Phase 1: Load with the normal environment.
  • Phase 2: Load with CUDA_HOME and CUDA_PATH stripped.
  • Phase 3: If the normal load found nvrtc in a directory on PATH but the stripped load failed, the test fails — directly catching the bug.
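The decision logic of the three phases can be summarized in a few lines of pure Python. `dir_is_on_path` mirrors the normcase/normpath comparison the test uses. `evaluate` and its return labels are hypothetical. The "not on PATH" branch returning "skip" is an assumption, because that branch of the test is not shown here.

```python
import os


def dir_is_on_path(abs_path, path_env):
    """True if the directory containing abs_path is one of the PATH entries."""
    target = os.path.normcase(os.path.normpath(os.path.dirname(abs_path)))
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in path_env.split(os.pathsep)
        if entry
    )


def evaluate(normal_abs_path, stripped_abs_path, path_env):
    """Hypothetical summary of the test outcome: 'skip', 'pass', or 'fail'."""
    if normal_abs_path is None:
        return "skip"  # Phase 1 found nothing: nvrtc is not installed.
    if stripped_abs_path is not None:
        return "pass"  # Phase 2 found nvrtc without CUDA_HOME/CUDA_PATH.
    if dir_is_on_path(normal_abs_path, path_env):
        return "fail"  # The directory is on PATH, yet system search missed it: the bug.
    return "skip"  # Assumed: not on PATH, so a stripped-env miss is expected.
```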

Verified on Windows, with and without the fix applied to load_dl_windows.py:

CUDA_PATH set?   Fix applied?    Result
No               N/A             SKIPPED (nvrtc not found normally either)
Yes              No (main)       FAILED (found via CUDA_PATH; dir is on PATH; stripped load fails)
Yes              Yes (this PR)   PASSED (SearchPathW finds it via PATH)

…n Windows

When loading CUDA DLLs via system search on Windows, the previous
approach using LoadLibraryExW with flags=0 would find the DLL on PATH
but fail to locate its co-located dependencies (error 126).

This fix uses SearchPathW to first find the DLL's full path, then loads
it with LOAD_WITH_ALTERED_SEARCH_PATH so Windows searches for
dependencies starting from the DLL's directory.
@copy-pr-bot
Contributor

copy-pr-bot bot commented Jan 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Collaborator Author

rwgk commented Jan 16, 2026

Archiving proof-of-concept code:

# LoadLibraryExW_nvrtc64_130_0_dll.py
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
# Declare the signature so the 64-bit HMODULE is not truncated to a C int.
kernel32.LoadLibraryExW.argtypes = (wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD)
kernel32.LoadLibraryExW.restype = wintypes.HMODULE

dll_path = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin\x64\nvrtc64_130_0.dll"
LOAD_WITH_ALTERED_SEARCH_PATH = 0x8

print("Trying with full path + LOAD_WITH_ALTERED_SEARCH_PATH...")
handle = kernel32.LoadLibraryExW(dll_path, None, LOAD_WITH_ALTERED_SEARCH_PATH)
if handle:
    print(f"SUCCESS: handle={handle}")
else:
    error = ctypes.get_last_error()
    print(f"Error code: {error}")
    print(f"Error message: {ctypes.FormatError(error)}")

@leofang leofang added this to the cuda.pathfinder 1.4.0 milestone Jan 16, 2026
@rwgk rwgk removed this from the cuda.pathfinder 1.4.0 milestone Mar 3, 2026
@rwgk rwgk added the cuda.pathfinder Everything related to the cuda.pathfinder module label Mar 3, 2026
rwgk added 2 commits March 18, 2026 16:54
Made-with: Cursor

# Conflicts:
#	cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_dl_windows.py
…_HOME/CUDA_PATH)

Loads nvrtc in a subprocess with CUDA_HOME and CUDA_PATH stripped from the
environment.  On Windows CI where nvrtc is only reachable via PATH, this
exercises the LOAD_WITH_ALTERED_SEARCH_PATH fix; on other platforms the
test passes harmlessly via whatever search path finds nvrtc first.

Made-with: Cursor
@rwgk
Collaborator Author

rwgk commented Mar 19, 2026

NOTE: In the following, CUDA_PATH and CUDA_HOME are not set.

Sanity check: reproducing the original problem with the latest cuda-python main (@ commit a7670b6 == PR #1783):

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python -c "from cuda import pathfinder; print(pathfinder.__version__); print(pathfinder.load_nvidia_dynamic_lib('nvrtc'))"
1.4.4.dev16+g2f024a57e
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from cuda import pathfinder; print(pathfinder.__version__); print(pathfinder.load_nvidia_dynamic_lib('nvrtc'))
                                                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder\cuda\pathfinder\_dynamic_libs\load_nvidia_dynamic_lib.py", line 289, in load_nvidia_dynamic_lib
    return _load_lib_no_cache(libname)
  File "C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder\cuda\pathfinder\_dynamic_libs\load_nvidia_dynamic_lib.py", line 181, in _load_lib_no_cache
    ctx.raise_not_found()
    ~~~~~~~~~~~~~~~~~~~^^
  File "C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder\cuda\pathfinder\_dynamic_libs\search_steps.py", line 66, in raise_not_found
    raise DynamicLibNotFoundError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}')
cuda.pathfinder._dynamic_libs.load_dl_common.DynamicLibNotFoundError: Failure finding "nvrtc*.dll": No such file: nvrtc*.dll, No such file: nvrtc*.dll

Then, after switching to this PR's branch (everything else identical; the PR is currently based on the same main):

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python -c "from cuda import pathfinder; print(pathfinder.__version__); print(pathfinder.load_nvidia_dynamic_lib('nvrtc'))"
1.4.4.dev16+g2f024a57e
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvrtc64_130_0.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140715681251328, found_via='system-search')

Next, renaming nvrtc-builtins64_132.dll out of the way (as admin):

PS C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64> mv nvrtc-builtins64_132.dll __OFF__nvrtc-builtins64_132.dll

This still works:

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python -c "from cuda import pathfinder; print(pathfinder.__version__); print(pathfinder.load_nvidia_dynamic_lib('nvrtc'))"
1.4.4.dev16+g2f024a57e
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvrtc64_130_0.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140715574099968, found_via='system-search')

This also works:

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python -c "from cuda import pathfinder; print(pathfinder.__version__); print(pathfinder.load_nvidia_dynamic_lib('cublasLt'))"
1.4.4.dev16+g2f024a57e
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cublasLt64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140715125964800, found_via='system-search')

But it fails after switching back to main:

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python -c "from cuda import pathfinder; print(pathfinder.__version__); print(pathfinder.load_nvidia_dynamic_lib('cublasLt'))"
1.4.4.dev13+ga7670b642
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from cuda import pathfinder; print(pathfinder.__version__); print(pathfinder.load_nvidia_dynamic_lib('cublasLt'))
                                                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder\cuda\pathfinder\_dynamic_libs\load_nvidia_dynamic_lib.py", line 289, in load_nvidia_dynamic_lib
    return _load_lib_no_cache(libname)
  File "C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder\cuda\pathfinder\_dynamic_libs\load_nvidia_dynamic_lib.py", line 181, in _load_lib_no_cache
    ctx.raise_not_found()
    ~~~~~~~~~~~~~~~~~~~^^
  File "C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder\cuda\pathfinder\_dynamic_libs\search_steps.py", line 66, in raise_not_found
    raise DynamicLibNotFoundError(f'Failure finding "{self.lib_searched_for}": {err}\n{att}')
cuda.pathfinder._dynamic_libs.load_dl_common.DynamicLibNotFoundError: Failure finding "cublasLt*.dll": No such file: cublasLt*.dll, No such file: cublasLt*.dll

Checking systematically with main, for all CTK libs except the special cases cupti and nvvm:

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> type C:\Users\rgrossekunst\wrk\try_loading_ctk_dynamic_libs.py
from cuda import pathfinder

print(pathfinder.__version__)
for (
    libname
) in pathfinder._dynamic_libs.supported_nvidia_libs.SUPPORTED_WINDOWS_DLLS_CTK:
    if libname in ("cupti", "nvvm"):
        continue
    try:
        print(pathfinder.load_nvidia_dynamic_lib(libname))
    except Exception as e:
        print(e)
(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python C:\Users\rgrossekunst\wrk\try_loading_ctk_dynamic_libs.py
1.4.4.dev13+ga7670b642
Failure finding "cudart*.dll": No such file: cudart*.dll, No such file: cudart*.dll

Failure finding "nvfatbin*.dll": No such file: nvfatbin*.dll, No such file: nvfatbin*.dll

Failure finding "nvJitLink*.dll": No such file: nvJitLink*.dll, No such file: nvJitLink*.dll

Failure finding "nvrtc*.dll": No such file: nvrtc*.dll, No such file: nvrtc*.dll

Failure finding "cublasLt*.dll": No such file: cublasLt*.dll, No such file: cublasLt*.dll

Failure finding "cublasLt*.dll": No such file: cublasLt*.dll, No such file: cublasLt*.dll

Failure finding "cufft*.dll": No such file: cufft*.dll, No such file: cufft*.dll

Failure finding "cufft*.dll": No such file: cufft*.dll, No such file: cufft*.dll

Failure finding "curand*.dll": No such file: curand*.dll, No such file: curand*.dll

Failure finding "nvJitLink*.dll": No such file: nvJitLink*.dll, No such file: nvJitLink*.dll

Failure finding "nvJitLink*.dll": No such file: nvJitLink*.dll, No such file: nvJitLink*.dll

Failure finding "nvJitLink*.dll": No such file: nvJitLink*.dll, No such file: nvJitLink*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "nppc*.dll": No such file: nppc*.dll, No such file: nppc*.dll

Failure finding "cublasLt*.dll": No such file: cublasLt*.dll, No such file: cublasLt*.dll

Failure finding "nvjpeg*.dll": No such file: nvjpeg*.dll, No such file: nvjpeg*.dll

After switching back to this PR's branch:

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python C:\Users\rgrossekunst\wrk\try_loading_ctk_dynamic_libs.py
1.4.4.dev16+g2f024a57e
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cudart64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140718484357120, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvfatbin_130_0.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140717343571968, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvJitLink_130_0.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140715583012864, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvrtc64_130_0.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140715484774400, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cublas64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714886627328, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cublasLt64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714938400768, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cufft64_12.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714593288192, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cufftw64_12.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140718485667840, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\curand64_10.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714527555584, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cusolver64_12.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714231922688, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cusolverMg64_12.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714132570112, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\cusparse64_12.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714364239872, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppc64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140717340491776, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppial64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140717142704128, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppicc64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140717266960384, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppidei64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140717127696384, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppif64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714073260032, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppig64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714048094208, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppim64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140716785991680, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppist64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140714021945344, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppisu64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140717312442368, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nppitc64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140716929056768, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\npps64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140716709511168, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvblas64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140718405320704, found_via='system-search')
LoadedDL(abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvjpeg64_13.dll', was_already_loaded_from_elsewhere=False, _handle_uint=140716902907904, found_via='system-search')

@rwgk
Collaborator Author

rwgk commented Mar 19, 2026

Cursor (Claude 4.6 Opus (Thinking)) Analysis

Summary

The experiments posted in the PR #1506 comment above reveal that the original PR description misdiagnoses the root cause. The real bug is that LoadLibraryExW(dll_name, None, 0) cannot find DLLs via PATH at all in Python 3.8+. It is not that it finds them but fails to resolve co-located dependencies.

Evidence

  1. All CTK libraries fail on main, not just nvrtc. Libraries like curand and cublasLt (which have no tricky dependency chains) fail identically.

  2. Renaming nvrtc-builtins64_132.dll has no effect on the PR branch — nvrtc still loads fine. This proves the dependency resolution theory is wrong.

  3. The error message (Failure finding "nvrtc*.dll": No such file: nvrtc*.dll) comes from the pathfinder's own find steps (site-packages, CUDA_HOME). This indicates that load_with_system_search silently returned None, meaning LoadLibraryExW returned a null handle: the DLL was never found, as opposed to being found but failing to load.
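LoadLibraryExW reports ERROR_MOD_NOT_FOUND (126) both when the DLL itself is missing and when one of its dependencies is missing. The error code alone therefore cannot decide between the two theories. The sketch below shows the extra check that separates them: look for the file in the directories the loader actually searched. `classify_mod_not_found` and its result strings are hypothetical.

```python
import os

ERROR_MOD_NOT_FOUND = 126  # Win32: "The specified module could not be found."


def classify_mod_not_found(dll_name, searched_dirs):
    """Hypothetical: explain an ERROR_MOD_NOT_FOUND from a bare-name load.

    searched_dirs should be the directories the loader actually consulted
    (under Python 3.8+: application dir, System32, AddDllDirectory dirs).
    """
    for directory in searched_dirs:
        if os.path.isfile(os.path.join(directory, dll_name)):
            return "dll found; a dependency is probably missing"
    return "dll itself not in any searched directory"
```

With the search set from the experiments above, nvrtc64_130_0.dll is in none of those directories, so the first result applies. That matches evidence points 1 and 2.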

Root cause

Since Python 3.8, CPython calls SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) at process startup. This changes the DLL search order process-wide so that LoadLibraryExW with flags=0 only searches:

  1. The application directory (Python's install directory)
  2. The system directory (System32)
  3. Directories explicitly added via AddDllDirectory()

PATH is excluded entirely.

The CUDA DLLs live in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64\, which is on PATH but is none of the above three locations. So LoadLibraryExW("nvrtc64_130_0.dll", None, 0) simply cannot find them.

Why the PR fix works

The fix has two parts, both necessary:

  1. SearchPathW is not affected by SetDefaultDllDirectories. It always searches the standard system search path including PATH. So SearchPathW(NULL, "nvrtc64_130_0.dll", ...) successfully resolves the full path (e.g., C:\...\bin\x64\nvrtc64_130_0.dll).

  2. LoadLibraryExW(full_path, None, LOAD_WITH_ALTERED_SEARCH_PATH) succeeds because:

    • It receives an absolute path, so no DLL search is needed to find the main DLL.
    • LOAD_WITH_ALTERED_SEARCH_PATH tells Windows to search for the DLL's dependencies starting from its own directory. This is a secondary benefit for DLLs with co-located dependencies (like nvrtc + nvrtc-builtins), but is not the primary fix.

What the PR description should say

Before (incorrect):

When loading CUDA DLLs via system search on Windows, the previous approach using LoadLibraryExW with flags=0 would find the DLL on PATH but fail to locate its co-located dependencies (error 126).

After (corrected):

On Windows, Python 3.8+ calls SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) at startup, which restricts LoadLibraryExW(name, NULL, 0) to searching only the application directory, system directories, and directories added via AddDllDirectory(); PATH is excluded. This means the pathfinder's system-search fallback silently fails for all CUDA Toolkit DLLs when CUDA_HOME/CUDA_PATH are not set.

The fix uses SearchPathW (which is unaffected by SetDefaultDllDirectories) to locate the DLL via PATH, then loads it by absolute path with LOAD_WITH_ALTERED_SEARCH_PATH so Windows also resolves dependencies from the DLL's directory.

Impact on testing

The regression test (test_load_nvrtc_without_cuda_home_or_cuda_path) is still valid and well-targeted: it strips CUDA_HOME/CUDA_PATH and loads nvrtc in a subprocess, exercising the system-search code path. The fact that the bug affects all CTK libraries equally means any single library is a good canary; nvrtc is a fine choice given its importance.

Also load nvrtc with the normal environment first. If the normal load
finds nvrtc in a directory on PATH but the CUDA_HOME/CUDA_PATH-stripped
load fails, the test now fails instead of skipping — directly catching
the Windows bug where SetDefaultDllDirectories excludes PATH from
LoadLibraryExW.

Made-with: Cursor
@rwgk
Collaborator Author

rwgk commented Mar 19, 2026

To confirm that the hardened test (commit b2aabfe) works as intended:

NOTE: Initially CUDA_PATH and CUDA_HOME are not set.

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python -m pytest -ra -s -v .\tests\test_load_nvidia_dynamic_lib.py::test_load_nvrtc_without_cuda_home_or_cuda_path
========================================================================================== test session starts ==========================================================================================
platform win32 -- Python 3.13.12, pytest-9.0.2, pluggy-1.6.0 -- C:\Users\rgrossekunst\wrk\forked\cuda-python\PathfinderVenv\Scripts\python.exe
cachedir: .pytest_cache
Using --randomly-seed=973333459
rootdir: C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder
configfile: pyproject.toml
plugins: mock-3.15.1, randomly-4.0.1, repeat-0.9.4
collected 1 item

tests/test_load_nvidia_dynamic_lib.py::test_load_nvrtc_without_cuda_home_or_cuda_path SKIPPED (nvrtc not available in this environment)

============================================================================================= INFO summary ==============================================================================================
INFO test_load_nvrtc_without_cuda_home_or_cuda_path: nvrtc not found (normal env)
======================================================================================== short test summary info ========================================================================================
SKIPPED [1] tests\test_load_nvidia_dynamic_lib.py:173: nvrtc not available in this environment
========================================================================================== 1 skipped in 0.21s ===========================================================================================

Now setting CUDA_PATH (this is the situation in the CI):

(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> $env:CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2"
(PathfinderVenv) PS C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder> python -m pytest -ra -s -v .\tests\test_load_nvidia_dynamic_lib.py::test_load_nvrtc_without_cuda_home_or_cuda_path
========================================================================================== test session starts ==========================================================================================
platform win32 -- Python 3.13.12, pytest-9.0.2, pluggy-1.6.0 -- C:\Users\rgrossekunst\wrk\forked\cuda-python\PathfinderVenv\Scripts\python.exe
cachedir: .pytest_cache
Using --randomly-seed=2577605809
rootdir: C:\Users\rgrossekunst\wrk\forked\cuda-python\cuda_pathfinder
configfile: pyproject.toml
plugins: mock-3.15.1, randomly-4.0.1, repeat-0.9.4
collected 1 item

tests/test_load_nvidia_dynamic_lib.py::test_load_nvrtc_without_cuda_home_or_cuda_path FAILED

=============================================================================================== FAILURES ================================================================================================
____________________________________________________________________________ test_load_nvrtc_without_cuda_home_or_cuda_path _____________________________________________________________________________

info_summary_append = <function info_summary_append.<locals>._append at 0x0000024626000C20>

    def test_load_nvrtc_without_cuda_home_or_cuda_path(info_summary_append):
        """Regression test for issue #1781: nvrtc must load without CUDA_HOME/CUDA_PATH.

        On Windows, Python 3.8+ calls SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS)
        at startup, which excludes PATH from the LoadLibraryExW search order. The fix uses
        SearchPathW (unaffected by SetDefaultDllDirectories) to locate the DLL via PATH, then
        loads it by absolute path with LOAD_WITH_ALTERED_SEARCH_PATH.

        This test loads nvrtc twice in fresh subprocesses: once with the normal environment,
        once with CUDA_HOME and CUDA_PATH stripped. If the normal load finds nvrtc in a
        directory on PATH, the stripped load must also succeed — otherwise the system search
        is broken.
        """
        timeout = 120 if IS_WINDOWS else 30

        # Phase 1: load nvrtc with normal environment.
        normal_result = run_load_nvidia_dynamic_lib_in_subprocess("nvrtc", timeout=timeout)
        if normal_result.returncode != 0:
            raise RuntimeError(build_child_process_failed_for_libname_message("nvrtc", normal_result))
        assert not normal_result.stderr
        normal_payload = parse_dynamic_lib_subprocess_payload(
            normal_result.stdout,
            libname="nvrtc",
            error_label="Load subprocess child process (normal env)",
        )
        if normal_payload.status == STATUS_NOT_FOUND:
            info_summary_append("nvrtc not found (normal env)")
            pytest.skip("nvrtc not available in this environment")
        normal_abs_path = normal_payload.abs_path
        assert normal_abs_path is not None
        assert os.path.isfile(normal_abs_path)
        info_summary_append(f"nvrtc (normal env): abs_path={quote_for_shell(normal_abs_path)}")

        # Phase 2: load nvrtc without CUDA_HOME/CUDA_PATH.
        env = os.environ.copy()
        env.pop("CUDA_HOME", None)
        env.pop("CUDA_PATH", None)
        command = build_dynamic_lib_subprocess_command(LOAD_NVIDIA_DYNAMIC_LIB_SUBPROCESS_MODE, "nvrtc")
        stripped_result = subprocess.run(  # noqa: S603
            command,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
            env=env,
            cwd=LOAD_NVIDIA_DYNAMIC_LIB_SUBPROCESS_CWD,
        )
        if stripped_result.returncode != 0:
            raise RuntimeError(build_child_process_failed_for_libname_message("nvrtc", stripped_result))
        assert not stripped_result.stderr
        stripped_payload = parse_dynamic_lib_subprocess_payload(
            stripped_result.stdout,
            libname="nvrtc",
            error_label="Load subprocess child process (no CUDA_HOME/CUDA_PATH)",
        )

        # Phase 3: evaluate.
        if stripped_payload.status != STATUS_NOT_FOUND:
            stripped_abs_path = stripped_payload.abs_path
            assert stripped_abs_path is not None
            assert os.path.isfile(stripped_abs_path)
            info_summary_append(f"nvrtc (no CUDA_HOME/CUDA_PATH): abs_path={quote_for_shell(stripped_abs_path)}")
            return

        # nvrtc was found normally but not without CUDA_HOME/CUDA_PATH.
        # If the DLL's directory is on PATH, the system search should have found it.
        dll_dir = os.path.normcase(os.path.normpath(os.path.dirname(normal_abs_path)))
        on_path = any(
            os.path.normcase(os.path.normpath(d)) == dll_dir for d in os.environ.get("PATH", "").split(os.pathsep) if d
        )
        if on_path:
>           pytest.fail(
                f"nvrtc was found at {normal_abs_path!r} (directory is on PATH) "
                f"but could not be loaded without CUDA_HOME/CUDA_PATH. "
                f"System search should find DLLs in PATH directories."
            )
E           Failed: nvrtc was found at 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvrtc64_130_0.dll' (directory is on PATH) but could not be loaded without CUDA_HOME/CUDA_PATH. System search should find DLLs in PATH directories.

command    = ['C:\\Users\\rgrossekunst\\wrk\\forked\\cuda-python\\PathfinderVenv\\Scripts\\python.exe', '-m', 'cuda.pathfinder._dynamic_libs.dynamic_lib_subprocess', 'load', 'nvrtc']
dll_dir    = 'c:\\program files\\nvidia gpu computing toolkit\\cuda\\v13.2\\bin\\x64'
env        = {'ALLUSERSPROFILE': 'C:\\ProgramData', 'APPDATA': 'C:\\Users\\rgrossekunst\\AppData\\Roaming', 'CLIENTNAME': 'KX5Y2JTJ6J', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files', ...}
info_summary_append = <function info_summary_append.<locals>._append at 0x0000024626000C20>
normal_abs_path = 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvrtc64_130_0.dll'
normal_payload = DynamicLibSubprocessPayload(status='ok', abs_path='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvrtc64_130_0.dll')
normal_result = CompletedProcess(args=['C:\\Users\\rgrossekunst\\wrk\\forked\\cuda-python\\PathfinderVenv\\Scripts\\python.exe', '-m',...:\\\\Program Files\\\\NVIDIA GPU Computing Toolkit\\\\CUDA\\\\v13.2\\\\bin\\\\x64\\\\nvrtc64_130_0.dll"}\n', stderr='')
on_path    = True
stripped_payload = DynamicLibSubprocessPayload(status='not-found', abs_path=None)
stripped_result = CompletedProcess(args=['C:\\Users\\rgrossekunst\\wrk\\forked\\cuda-python\\PathfinderVenv\\Scripts\\python.exe', '-m',..., "message": "Failure finding \\"nvrtc*.dll\\": No such file: nvrtc*.dll, No such file: nvrtc*.dll\\n"}}\n', stderr='')
timeout    = 120

tests\test_load_nvidia_dynamic_lib.py:217: Failed
============================================================================================= INFO summary ==============================================================================================
INFO test_load_nvrtc_without_cuda_home_or_cuda_path: nvrtc (normal env): abs_path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64\nvrtc64_130_0.dll"
======================================================================================== short test summary info ========================================================================================
FAILED tests/test_load_nvidia_dynamic_lib.py::test_load_nvrtc_without_cuda_home_or_cuda_path - Failed: nvrtc was found at 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.2\\bin\\x64\\nvrtc64_130_0.dll' (directory is on PATH) but could not be loaded without CUDA_HOME/CUDA_PATH. Sy...
=========================================================================================== 1 failed in 0.42s ===========================================================================================

…agnosis

SearchPathW is the primary fix (bypasses Python 3.8+'s
SetDefaultDllDirectories restriction); LOAD_WITH_ALTERED_SEARCH_PATH
is a secondary benefit for dependency resolution.

Made-with: Cursor
@rwgk
Collaborator Author

rwgk commented Mar 19, 2026

/ok to test

@github-actions


rwgk commented Mar 19, 2026

Independent Cursor (GPT-5.4 Extra High) Review

Findings

  • High: cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_dl_windows.py:140 and cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_dl_windows.py:166 replace the native LoadLibraryExW(name, ..., 0) lookup with SearchPathW(None, ...) + LoadLibraryExW(full_path, ..., LOAD_WITH_ALTERED_SEARCH_PATH). That is not a PATH-only fallback. Microsoft explicitly warns that SearchPath is the wrong primitive for locating a DLL to hand to LoadLibrary, because it uses a different search order and can consult the current working directory. So this fixes issue 1781 by broadening lookup to a less safe, different search path, which can load the wrong DLL. It also affects driver-only loads, because the same path is used for nvcuda.dll and nvml.dll in cuda_pathfinder/cuda/pathfinder/_dynamic_libs/descriptor_catalog.py:377 and cuda_pathfinder/cuda/pathfinder/_dynamic_libs/descriptor_catalog.py:383.
  • Medium: cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_dl_windows.py:140 also regresses AddDllDirectory / os.add_dll_directory() support. Before this PR, Python 3.8+'s SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) meant bare LoadLibraryExW(..., 0) still searched LOAD_LIBRARY_SEARCH_USER_DIRS; SearchPathW does not. So a process that intentionally exposes a CUDA bin dir via os.add_dll_directory() but does not mutate PATH now goes from "loadable" to "not found". The new regression coverage only checks the PATH case in cuda_pathfinder/tests/test_load_nvidia_dynamic_lib.py:146, so this behavior change would slip through.

Assumptions

  • I did not run Windows tests from this Linux/WSL checkout, so these findings are from code inspection plus Win32 API semantics.

Brief

  • The PR does address the specific PATH-discovery failure from issue 1781, and the new nvrtc subprocess test is a useful canary for that case.
  • The problem is that the implementation overcorrects: it restores PATH lookup by swapping in a broader and less compatible search mechanism globally, instead of adding a narrower PATH fallback while preserving the existing native loader behavior.
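The Brief suggests a narrower PATH fallback that preserves the native loader behavior. Below is one possible shape for it, sketched with ctypes under the assumption that PATH entries are absolute and free of quotes or %VAR% references. Phase 1 keeps `LoadLibraryExW(name, None, 0)` unchanged. Driver-only DLLs then stop there. Only after that does the sketch call SearchPathW, with PATH passed explicitly as `lpPath` rather than NULL. Passing an explicit `lpPath` is a technique swapped in here for illustration. It is not what this PR or the alternative branch does, and all helper names are hypothetical.

```python
import ctypes
import os
import sys
from functools import lru_cache

LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008
_PATH_BUFFER_CHARS = 32768
# Hypothetical set: driver DLLs that keep native loader semantics only.
DRIVER_ONLY_DLLS = frozenset({"nvcuda.dll", "nvml.dll"})


@lru_cache(maxsize=None)
def _kernel32():
    """Load kernel32 once and declare SearchPathW / LoadLibraryExW (Windows only)."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.SearchPathW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_wchar_p,
        ctypes.c_uint32, ctypes.c_wchar_p, ctypes.c_void_p,
    ]
    k32.SearchPathW.restype = ctypes.c_uint32
    k32.LoadLibraryExW.argtypes = [ctypes.c_wchar_p, ctypes.c_void_p, ctypes.c_uint32]
    k32.LoadLibraryExW.restype = ctypes.c_void_p
    return k32


def wants_path_fallback(dll_name):
    """Only non-driver libraries get the PATH fallback."""
    return dll_name.lower() not in DRIVER_ONLY_DLLS


def load_with_narrow_path_fallback(dll_name):
    """Return a module handle, or None if neither phase finds dll_name."""
    if sys.platform != "win32":
        return None
    k32 = _kernel32()
    # Phase 1: unchanged native lookup. Under Python 3.8+'s default DLL directories
    # this still honors os.add_dll_directory() / AddDllDirectory() entries.
    handle = k32.LoadLibraryExW(dll_name, None, 0)
    if handle:
        return handle
    path_value = os.environ.get("PATH", "")
    if not wants_path_fallback(dll_name) or not path_value:
        return None
    # Phase 2: pass PATH explicitly as lpPath so SearchPathW looks in the PATH entries
    # instead of its registry-dependent default order (which can include the CWD).
    # Assumption: entries are absolute; SearchPathW is not relied on to expand variables.
    buf = ctypes.create_unicode_buffer(_PATH_BUFFER_CHARS)
    n = k32.SearchPathW(path_value, dll_name, None, len(buf), buf, None)
    if n == 0 or n >= len(buf):
        return None
    handle = k32.LoadLibraryExW(buf.value, None, LOAD_WITH_ALTERED_SEARCH_PATH)
    return handle or None
```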


rwgk commented Mar 19, 2026

Cursor Claude 4.6 Opus (Thinking) analysis of the GPT-5.4 review and the alternative PR #1794:

The GPT-5.4 review raises two valid points about the SearchPathW approach (this PR):

  1. SearchPathW has different search semantics — Microsoft explicitly warns against using SearchPathW to locate DLLs for LoadLibrary. It can consult the current working directory (though mitigated by SafeSearchMode being on by default since XP SP2). It also applies to driver-only libs (nvcuda.dll, nvml.dll) which don't need this.

  2. Regresses AddDllDirectory support: before the PR, LoadLibraryExW(name, None, 0) still searched LOAD_LIBRARY_SEARCH_USER_DIRS (directories added via AddDllDirectory/os.add_dll_directory()). The SearchPathW path doesn't check those. This is niche but real.

The alternative (pathfinder_iter_env_path_directories)

It's more conservative: Phase 1 keeps the original LoadLibraryExW(name, None, 0) call unchanged (preserving AddDllDirectory, native search order, etc.), and Phase 2, for non-driver libs only, manually scans PATH with os.path.isfile() and then calls load_with_abs_path().

The emulation concern is spot-on: this is not a system search, it's an emulation. It replaces two Win32 API calls (SearchPathW + LoadLibraryExW) with ~60 lines of Python that manually iterate PATH, expand variables, strip quotes, deduplicate, filter relative entries, and check isfile. All of that reimplements what the OS already knows how to do.
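For comparison, the kind of PATH emulation described above (expand variables, strip quotes, deduplicate, drop relative entries, check isfile) can be sketched in plain Python. This is a hypothetical illustration of those steps. It is not the code from the pathfinder_iter_env_path_directories branch, and the function names are assumptions.

```python
import os


def iter_env_path_directories(path_value=None):
    """Yield absolute, de-duplicated PATH directories in order (hypothetical sketch)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    seen = set()
    for raw in path_value.split(os.pathsep):
        # Strip whitespace and surrounding quotes, then expand %VAR% / $VAR references.
        entry = os.path.expandvars(raw.strip().strip('"'))
        if not entry or not os.path.isabs(entry):
            continue  # empty or relative entries would depend on the current directory
        key = os.path.normcase(os.path.normpath(entry))
        if key in seen:
            continue  # same directory listed twice (case-insensitively on Windows)
        seen.add(key)
        yield entry


def find_dll_on_path(dll_name, path_value=None):
    """Return the path of the first PATH directory containing dll_name, or None."""
    for directory in iter_env_path_directories(path_value):
        candidate = os.path.join(directory, dll_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Even this reduced version has to make its own choices about quoting, variable expansion, and duplicate handling. Those choices are exactly what the OS loader would otherwise own.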

The case for doing nothing

This is actually the strongest option, and here's why:

  1. Python 3.8+ excluded PATH from DLL loading intentionally, as a security hardening measure. Both "fixes" work around that decision — one via SearchPathW, the other by manually scanning PATH.

  2. The NVIDIA installer automatically sets CUDA_PATH. That's the designed, supported configuration. The pathfinder's find_in_cuda_home step handles it perfectly.

  3. No user complaints. The scenario that triggers this (CUDA on PATH but no CUDA_PATH) is atypical. The primary reason to routinely remove the variable is testing with different CTK versions — a developer workflow, not an end-user one.

  4. The test is still valuable either way. Even without a code fix, test_load_nvrtc_without_cuda_home_or_cuda_path documents the known limitation. It passes on Linux (where dlopen isn't restricted), skips on Windows without CUDA, and would fail on Windows with CUDA on PATH — which is the correct description of the current behavior.

  5. Both alternatives add complexity and risk for a scenario with no real-world complaints. The SearchPathW approach has the security/semantics concerns GPT flagged. The PATH-iteration approach is more code to maintain and an emulation layer.
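Item 2 relies on the CUDA_HOME / CUDA_PATH step. A rough, hypothetical sketch of that kind of lookup shows why removing both variables leaves only the OS loader: without them there is no root directory to glob. The CUDA_HOME-before-CUDA_PATH precedence and the `bin\x64` then `bin` subdirectories are assumptions based on the paths in the test log above. This is not pathfinder's find_in_cuda_home implementation.

```python
import glob
import os


def candidate_dlls_in_cuda_home(pattern, env=None):
    """Return sorted DLL paths matching pattern under CUDA_HOME or CUDA_PATH (hypothetical)."""
    env = os.environ if env is None else env
    # Assumed precedence: CUDA_HOME first, then CUDA_PATH (set by the NVIDIA installer).
    cuda_root = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if not cuda_root:
        return []  # nothing to search: only the OS loader's own search remains
    for subdir in (os.path.join("bin", "x64"), "bin"):
        # glob.escape() keeps characters like "[" in the root from acting as wildcards.
        matches = sorted(glob.glob(os.path.join(glob.escape(cuda_root), subdir, pattern)))
        if matches:
            return matches
    return []
```

A pattern such as `nvrtc*.dll` can also match `nvrtc-builtins*.dll`, so the sketch returns every match and leaves the choice of DLL to the caller.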

If a fix IS wanted, the SearchPathW approach (current PR) is preferable to the PATH-iteration alternative: it's a clean OS-level API call rather than a bespoke emulation, and the GPT concerns are mostly theoretical (CWD lookup is already mitigated by SafeSearchMode, and driver libs could be excluded with a one-line guard). But the "do nothing + document" option seems like the right call given the practical situation.


rwgk commented Mar 19, 2026

Back to Cursor GPT-5.4 Extra High

User prompt

Could you please read the newest PR comment here: /home/rgrossekunst/wrk/logs/gh_pr_info_NVIDIA_cuda-python_1506_2026-03-19+164039.md

I'm thinking it'll be best to not change the code, only update the documentation to clearly state that on Windows PATH is not included in the system search.

For full context, this is the prompt I gave Claude (in Cursor), to which it responded with the currently newest PR comment:

  Could you please analyze the new PR comment and the alternative solution?

  A concern I have: the new solution does not look like an actual system search anymore, but an emulation.

  My starting point was: I was surprised to see that nvrtc wasn't discovered without CUDA_PATH/CUDA_HOME set. But maybe that's just it, and all we should do is document the behavior clearly?

  There are no complaints from users.

  The NVIDIA Windows installer automatically adds CUDA_PATH to the environment. (The primary reason I routinely remove the variable: I have to test with different CTK versions and want to be sure there is no mixup. That's probably atypical.)

What's your take on the situation?

Assistant response

My take: I would not merge either code change.

The newest comment lands in the right place. On Windows, this behavior is coming from CPython’s hardened DLL-loading setup, not from a random pathfinder mistake. Both fixes are really trying to re-introduce PATH as a discovery mechanism after Python intentionally excluded it. SearchPathW does that with the wrong loader semantics, and the iter_env_path_directories branch does it by explicitly emulating PATH lookup. Both are behavior changes, not simple bug fixes.

Given your actual support story, “document, don’t fix” seems strongest:

  • the NVIDIA installer sets CUDA_PATH
  • pathfinder already handles CUDA_HOME / CUDA_PATH
  • there are no user reports
  • the failing case is mainly your multi-CTK developer workflow, which is real but atypical

So I’d frame this as “the docs/expectations are wrong,” not “the loader is broken.” In particular, the current public search-order text in cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py:231 is misleading on Windows because it implies CTK installs on system PATH are found by the OS-default loader path. That is the part I’d fix.

If you go docs-only, I’d also change the new test in cuda_pathfinder/tests/test_load_nvidia_dynamic_lib.py:146. As written, it encodes a Windows PATH-only success requirement that you’re now saying should not be guaranteed. So that test should be removed, relaxed, or rewritten to document the limitation rather than fail on it.
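Below is one way to relax the test along those lines, sketched under several assumptions. The child script assumes that cuda.pathfinder exports `load_nvidia_dynamic_lib` and `DynamicLibNotFoundError`, and that the returned object has an `abs_path` attribute. The helper and test names are hypothetical. The sketch does not require a PATH-only success on Windows. It accepts either outcome and only checks that a "found" result carries an absolute path.

```python
import json
import os
import subprocess
import sys

# Child script run in a fresh interpreter. Assumes cuda.pathfinder exposes
# load_nvidia_dynamic_lib() and DynamicLibNotFoundError, and that the result
# has an abs_path attribute.
CHILD_CODE = """
import json
from cuda.pathfinder import DynamicLibNotFoundError, load_nvidia_dynamic_lib
try:
    loaded = load_nvidia_dynamic_lib("nvrtc")
except DynamicLibNotFoundError:
    print(json.dumps({"status": "not-found"}))
else:
    print(json.dumps({"status": "found", "abs_path": loaded.abs_path}))
"""

CUDA_ROOT_VARS = frozenset({"CUDA_HOME", "CUDA_PATH"})


def env_without_cuda_vars(env):
    """Copy env without CUDA_HOME/CUDA_PATH (compared case-insensitively, as on Windows)."""
    return {k: v for k, v in env.items() if k.upper() not in CUDA_ROOT_VARS}


def run_nvrtc_probe(env):
    """Run CHILD_CODE under env and return the JSON payload from its last stdout line."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, timeout=120, check=True,
    )
    return json.loads(proc.stdout.strip().splitlines()[-1])


def test_nvrtc_without_cuda_root_vars_is_not_required_to_load():
    payload = run_nvrtc_probe(env_without_cuda_vars(os.environ))
    # On Windows with Python 3.8+, "not-found" is the documented outcome when nvrtc is
    # reachable only via PATH; either status is accepted so the test no longer fails on it.
    assert payload["status"] in {"found", "not-found"}, payload
    if payload["status"] == "found":
        assert os.path.isabs(payload["abs_path"]), payload
```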

My recommendation:

  1. Don’t merge the current SearchPathW PR as code.
  2. Don’t merge the pathfinder_iter_env_path_directories alternative as code either.
  3. Update docs/comments/release notes to say that on Windows, load_with_system_search() does not include PATH under Python 3.8+, so CTK installs are expected to be found via CUDA_HOME / CUDA_PATH, wheels, Conda, or other explicit paths.
  4. Adjust the issue/PR framing accordingly.

If you want, I can make that documentation-only change set and replace the test with something that matches the documented Windows limitation.


Labels

cuda.pathfinder Everything related to the cuda.pathfinder module


Development

Successfully merging this pull request may close these issues.

[BUG][pathfinder] Windows system search fails to find any CUDA DLL when CUDA_HOME/CUDA_PATH are not set
