CUDA Flags
In order to simplify writing CUDA kernels, the PyTorch C++ library enables several compiler flags:
- Using CUDA architectures of detected GPUs
- Enabling __host__ __device__ lambda functions, e.g., with thrust/CUB algorithms
- Enabling relaxed constexpr rules to reuse, e.g., std::clamp in kernels and __device__ functions
- Suppressing some noisy warnings
However, the PyTorch C++ library provides these flags by modifying the (old-school) CUDA_NVCC_FLAGS variable. Although CMake will pick up the variable, the modifications are only visible in the directory scope where PyTorch has been found by find_package and in its subdirectory scopes. This may lead to compiler errors for dependent targets in parent or sibling directories when finding PyTorch with the GLOBAL option enabled, as this option promotes only the respective targets to all scopes but leaves the variable modifications in the calling scope.
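The scoping issue can be illustrated with a project layout like the following sketch. The directory third_party, the target my_kernels, and the source file kernels.cu are hypothetical names chosen for illustration. The imported target name torch is an assumption about what PyTorch's package configuration exports. The GLOBAL keyword of find_package requires CMake 3.24 or newer.

```cmake
# CMakeLists.txt (top level); hypothetical layout for illustration only
cmake_minimum_required(VERSION 3.24)
project(scope_demo LANGUAGES CXX CUDA)

# third_party/CMakeLists.txt is assumed to contain:
#   find_package(Torch REQUIRED GLOBAL)
add_subdirectory(third_party)

# The imported targets are usable here because GLOBAL promoted them ...
add_library(my_kernels STATIC kernels.cu)
target_link_libraries(my_kernels PRIVATE torch)  # target name is an assumption

# ... but CUDA_NVCC_FLAGS was only modified in the scope of third_party/,
# so the variable does not carry PyTorch's additions in this directory.
message(STATUS "CUDA_NVCC_FLAGS in parent scope: '${CUDA_NVCC_FLAGS}'")
```

In such a layout, a kernel in kernels.cu that relies on, e.g., a __host__ __device__ lambda would likely fail to compile, since the flag enabling it never reaches my_kernels.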
Charonload automatically detects the modified compile flags and attaches them as an INTERFACE property to the CUDA target of the PyTorch C++ library, such that they will be correctly propagated to any linking target.
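The general idea of attaching flags as an INTERFACE property can be sketched as follows. This is not Charonload's actual implementation: the target name torch_cuda is an assumption, and the two nvcc flags are hard-coded here as examples, whereas Charonload detects the modified flags automatically.

```cmake
# Sketch only, under the assumptions stated above.
# Must run after find_package(Torch) so that the imported target exists.
if(TARGET torch_cuda)  # target name is an assumption
    # The generator expressions restrict each flag to CUDA sources, so C++
    # files of linking targets are compiled without it.
    target_compile_options(torch_cuda INTERFACE
        "$<$<COMPILE_LANGUAGE:CUDA>:--expt-extended-lambda>"
        "$<$<COMPILE_LANGUAGE:CUDA>:--expt-relaxed-constexpr>")
endif()
```

Because the options are stored on the target instead of in a directory-scoped variable, any target that links against it inherits them regardless of the directory in which it is defined.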