This adds support for using CUDA managed memory with omp_alloc. AMD support
will be added in a future patch.
There is one new predefined allocator, "ompx_gnu_managed_mem_alloc", plus a
corresponding memory space, which can be used to allocate memory in the
"managed" space.
The nvptx plugin is modified to make the necessary CUDA calls via two new
(optional) plugin interfaces.
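For illustration, a minimal usage sketch (not taken from the new test cases;
the array size and loop body are arbitrary):

  #include <omp.h>
  #include <stdlib.h>

  int
  main (void)
  {
    enum { N = 1024 };
    /* Allocate CUDA managed memory via the new predefined allocator; the
       returned pointer is valid on both the host and the device.  */
    int *a = (int *) omp_alloc (N * sizeof (int), ompx_gnu_managed_mem_alloc);
    if (a == NULL)
      abort ();

    #pragma omp target teams distribute parallel for is_device_ptr (a)
    for (int i = 0; i < N; i++)
      a[i] = i;

    for (int i = 0; i < N; i++)
      if (a[i] != i)
        abort ();
    omp_free (a, ompx_gnu_managed_mem_alloc);

    /* Alternatively, build a custom allocator on the new memory space.  */
    omp_allocator_handle_t al
      = omp_init_allocator (ompx_gnu_managed_mem_space, 0, NULL);
    void *p = omp_alloc (64, al);
    omp_free (p, al);
    omp_destroy_allocator (al);
    return 0;
  }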
gcc/fortran/ChangeLog:
* openmp.cc (is_predefined_allocator): Use GOMP_OMP_PREDEF_ALLOC_MAX
and GOMP_OMPX_PREDEF_ALLOC_MIN/MAX instead of hardcoded values in the
comment.
include/ChangeLog:
* cuda/cuda.h (cuMemAllocManaged): Add declaration and related
CU_MEM_ATTACH_GLOBAL flag.
* gomp-constants.h (GOMP_OMPX_PREDEF_ALLOC_MAX): Update to 201.
(GOMP_OMP_PREDEF_MEMSPACE_MAX): New constant.
(GOMP_OMPX_PREDEF_MEMSPACE_MIN): New constant.
(GOMP_OMPX_PREDEF_MEMSPACE_MAX): New constant.
libgomp/ChangeLog:
* allocator.c (ompx_gnu_max_predefined_alloc): Update to
ompx_gnu_managed_mem_alloc.
(_Static_assert): Fix assertion messages for allocators and add
new assertions for memspace constants.
(omp_max_predefined_mem_space): New define.
(ompx_gnu_min_predefined_mem_space): New define.
(ompx_gnu_max_predefined_mem_space): New define.
(MEMSPACE_ALLOC): Add check for non-standard memspaces.
(MEMSPACE_CALLOC): Likewise.
(MEMSPACE_REALLOC): Likewise.
(MEMSPACE_VALIDATE): Likewise.
(predefined_ompx_gnu_alloc_mapping): Add ompx_gnu_managed_mem_space.
(omp_init_allocator): Add ompx_gnu_managed_mem_space validation.
* config/gcn/allocator.c (gcn_memspace_alloc): Add check for
non-standard memspaces.
(gcn_memspace_calloc): Likewise.
(gcn_memspace_realloc): Likewise.
(gcn_memspace_validate): Update to validate standard vs non-standard
memspaces.
* config/linux/allocator.c (linux_memspace_alloc): Add managed
memory space handling.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise (returns NULL for fallback).
* config/nvptx/allocator.c (nvptx_memspace_alloc): Add check for
non-standard memspaces.
(nvptx_memspace_calloc): Likewise.
(nvptx_memspace_realloc): Likewise.
(nvptx_memspace_validate): Update to validate standard vs non-standard
memspaces.
* env.c (parse_allocator): Add ompx_gnu_managed_mem_alloc and
ompx_gnu_managed_mem_space, plus static asserts to keep these lists
in sync.
* libgomp-plugin.h (GOMP_OFFLOAD_managed_alloc): New declaration.
(GOMP_OFFLOAD_managed_free): New declaration.
* libgomp.h (gomp_managed_alloc): New declaration.
(gomp_managed_free): New declaration.
(struct gomp_device_descr): Add managed_alloc_func and
managed_free_func fields.
* libgomp.texi: Document ompx_gnu_managed_mem_alloc and
ompx_gnu_managed_mem_space, add C++ template documentation, and
describe NVPTX and AMD support.
* omp.h.in: Add ompx_gnu_managed_mem_space and
ompx_gnu_managed_mem_alloc enumerators, and gnu_managed_mem C++
allocator template.
* omp_lib.f90.in: Add Fortran bindings for new allocator and
memory space.
* omp_lib.h.in: Likewise.
* plugin/cuda-lib.def: Add cuMemAllocManaged.
* plugin/plugin-nvptx.c (nvptx_alloc): Add managed parameter to
support cuMemAllocManaged.
(GOMP_OFFLOAD_alloc): Move contents to ...
(cleanup_and_alloc): ... this new function, and add managed support.
(GOMP_OFFLOAD_managed_alloc): New function.
(GOMP_OFFLOAD_managed_free): New function.
* target.c (gomp_managed_alloc): New function.
(gomp_managed_free): New function.
(gomp_load_plugin_for_device): Load optional managed_alloc
and managed_free plugin APIs.
* testsuite/lib/libgomp.exp: Add check_effective_target_omp_managedmem.
* testsuite/libgomp.c++/alloc-managed-1.C: New test.
* testsuite/libgomp.c/alloc-managed-1.c: New test.
* testsuite/libgomp.c/alloc-managed-2.c: New test.
* testsuite/libgomp.c/alloc-managed-3.c: New test.
* testsuite/libgomp.c/alloc-managed-4.c: New test.
* testsuite/libgomp.fortran/alloc-managed-1.f90: New test.
Co-authored-by: Kwok Cheung Yeung <kcyeung@baylibre.com>
Co-authored-by: Thomas Schwinge <tschwinge@baylibre.com>
These TR13/OpenMP 6.0 routines permit reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique;
the one for the host is not.
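For illustration, a minimal sketch of the new API (with libgomp's default
device numbering, where the initial device has number omp_get_num_devices ()):

  #include <omp.h>
  #include <stdio.h>

  int
  main (void)
  {
    /* Iterate over all non-host devices plus the initial (host) device.  */
    for (int dev = 0; dev <= omp_get_num_devices (); dev++)
      {
        const char *uid = omp_get_uid_from_device (dev);
        printf ("device %d: %s\n", dev, uid);
        /* Mapping the UID back must yield the same device number.  */
        if (omp_get_device_from_uid (uid) != dev)
          return 1;
      }
    return 0;
  }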
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Add get_device_from_uid
and get_uid_from_device.
include/ChangeLog:
* cuda/cuda.h (cuDeviceGetUuid): Declare.
(cuDeviceGetUuid_v2): Add prototype.
libgomp/ChangeLog:
* config/gcn/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Add stub implementation.
* config/nvptx/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Likewise.
* fortran.c (omp_get_uid_from_device_,
omp_get_uid_from_device_8_): New functions.
* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
* libgomp.map (GOMP_6.0): New, including the new UID routines.
* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
(Device Information Routines): Document new UID routines.
(Offload-Target Specifics): Document UID format.
* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
New prototypes.
* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
New interfaces.
* omp_lib.h.in: Likewise.
* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
* target.c (str_omp_initial_device): New static var.
(STR_OMP_DEV_PREFIX): Define.
(gomp_get_uid_for_device, omp_get_uid_from_device,
omp_get_device_from_uid): New.
(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
(gomp_target_init): Set the device's 'uid' field to NULL.
* testsuite/libgomp.c/device_uid.c: New test.
* testsuite/libgomp.fortran/device_uid.f90: New test.
A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those,
provided that all installed nvptx devices have this feature (as the
capabilities are per device type).
This exposes a bug in gomp_copy_back_icvs: it previously used
omp_get_mapped_ptr to find mapped variables, but that routine returns
the pointer unchanged in the case of shared memory. However, even then
a few pointers - like the ICV variables - are actually mapped.
Additionally, there was a mismatch with regard to '-1' as the device
number, as gomp_copy_back_icvs and omp_get_mapped_ptr count devices
differently. Hence, do the lookup manually.
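For illustration, the kind of code this enables on such devices (a minimal
sketch; with 'requires unified_shared_memory', host memory is directly
accessible inside target regions):

  #include <stdlib.h>

  #pragma omp requires unified_shared_memory

  int
  main (void)
  {
    enum { N = 1000 };
    int *a = (int *) malloc (N * sizeof (int));

    /* No explicit mapping of the array: with USM the host pointer is
       dereferenced directly on the device.  */
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < N; i++)
      a[i] = i;

    for (int i = 0; i < N; i++)
      if (a[i] != i)
        abort ();
    free (a);
    return 0;
  }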
include/ChangeLog:
* cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add.
libgomp/ChangeLog:
* libgomp.texi (nvptx): Update USM description.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
Claim support when requesting USM and all devices support
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM if the
device supports USM.
Currently, we silently disable libgomp GCN and nvptx plugins/devices in
presence of certain error conditions during device probing, thus typically
silently resorting to host-fallback execution. Make such errors fatal, as they
are for any other device access later on, so that we notice early and reliably
when things go wrong. (Keep just two cases non-fatal: (a) libgomp GCN or nvptx
plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not,
and (b) those are available, but the corresponding devices are not.)
This resolves the issue that we've got execution test cases unexpectedly
PASSing, despite:
libgomp: GCN fatal error: Run-time could not be initialized
Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.
..., and therefore they were not offloaded to the GCN device, but ran in
host-fallback execution mode. What happened in that scenario is that in
'init_hsa_context', during the initial 'GOMP_OFFLOAD_get_num_devices', we ran
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES'; that was not fatal, but just
silently disabled the libgomp plugin/device.
Especially "entertaining" were cases where such unintended host-fallback
execution happened during effective-target checks like
'offload_device_available' (host-fallback execution there meaning: no offload
device available), but the actual test cases were then running with an offload
device available, and therefore were mis-configured.
include/
* cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'.
libgomp/
* plugin/plugin-gcn.c (init_hsa_context): Add and handle
'bool probe' parameter. Adjust all users; errors during device
probing are fatal.
* plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from
'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal.
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894
"OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect",
namely:
In that commit, the code was changed to handle shared-memory devices;
however, as pointed out, omp_target_memcpy_check already set the pointer
to NULL in that case. Hence, this commit reverts to the prior version.
In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry with cuMemcpy3DPeer
(all currently unused) and, in three structs, fixes reserved-member names
and removes a bogus 'const'.
And it changes a DLSYM to DLSYM_OPT as not all plugins support the new
functions, yet.
include/ChangeLog:
* cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER):
Remove bogus 'const' from 'const void *dst' and fix reserved-member
names in those structs.
(cuMemcpyPeer, cuMemcpyPeerAsync): Add.
libgomp/ChangeLog:
* target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for
GOMP_OFFLOAD_CAP_SHARED_MEM.
(omp_target_memcpy_rect_copy): Likewise for lock condition.
(gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for
memcpy3d/memcpy2d.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and
unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
When copying a 2D or 3D rectangular memory block, performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data element by element. That's what this commit does.
Additionally, it permits device-to-device copies, if necessary using a
temporary buffer on the host.
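For illustration, a call that now goes through cuMemcpy2D on nvptx (a minimal
sketch; the sizes and offsets are arbitrary):

  #include <omp.h>
  #include <stdlib.h>

  int
  main (void)
  {
    enum { N = 8, M = 8 };
    int src[N][M];
    int dev = omp_get_default_device ();
    int host = omp_get_initial_device ();

    for (int i = 0; i < N; i++)
      for (int j = 0; j < M; j++)
        src[i][j] = i * M + j;

    void *d = omp_target_alloc (sizeof (src), dev);
    if (d == NULL)
      return 0;

    /* Copy a 4x4 sub-rectangle at offset (2,2) of the 8x8 arrays from the
       host to the device in one go.  */
    size_t volume[2] = { 4, 4 };
    size_t offsets[2] = { 2, 2 };
    size_t dims[2] = { N, M };
    if (omp_target_memcpy_rect (d, src, sizeof (int), 2, volume, offsets,
                                offsets, dims, dims, dev, host) != 0)
      abort ();

    omp_target_free (d, dev);
    return 0;
  }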
include/ChangeLog:
* cuda/cuda.h (CUresult): Add CUDA_ERROR_NOT_INITIALIZED,
CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
CUDA_MEMCPY3D_PEER): New typedefs.
(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
cuMemcpy3DPeerAsync): New prototypes.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New prototypes.
* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
and memcpy3d_func.
* libgomp.texi (nvptx): Document when cuMemcpy2D/cuMemcpy3D is used.
* oacc-host.c (memcpy2d_func, memcpy3d_func): Init with NULL.
* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
cuMemcpy3D): Invoke via CUDA_ONE_CALL.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New.
* target.c (omp_target_memcpy_rect_worker, omp_target_memcpy_rect_check,
omp_target_memcpy_rect_copy): Permit all device-to-device copies; invoke
the new plugin functions for 2D and 3D copying when available.
(gomp_load_plugin_for_device): DLSYM the new plugin functions.
* testsuite/libgomp.c/target-12.c: Fix dimension bug.
* testsuite/libgomp.fortran/target-12.f90: Likewise.
* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding a GOMP_target_ext call made in
device code back to the host by setting values in a struct on the device and
querying it on the host, then invoking gomp_target_rev on the result.
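For illustration, the kind of construct this machinery serves (a minimal
sketch; the host-side execution itself is only completed by later patches):

  #include <stdio.h>

  #pragma omp requires reverse_offload

  int
  main (void)
  {
    #pragma omp target
    {
      int i = 42;
      /* The nested region runs back on the host: the device fills in the
         GOMP_REV_OFFLOAD_VAR struct, which the nvptx plugin polls during
         GOMP_OFFLOAD_run and hands to gomp_target_rev.  */
      #pragma omp target device (ancestor : 1) map(to : i)
        printf ("reverse offload saw %d\n", i);
    }
    return 0;
  }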
include/ChangeLog:
* cuda/cuda.h (enum CUdevice_attribute): Add
CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
(CU_MEMHOSTALLOC_DEVICEMAP): Define.
(cuMemHostAlloc): Add prototype.
libgomp/ChangeLog:
* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
'static' for this variable.
* config/nvptx/libgomp-nvptx.h: New file.
* config/nvptx/target.c: Include it.
(GOMP_ADDITIONAL_ICVS): Declare extern var.
(GOMP_REV_OFFLOAD_VAR): Declare var.
(GOMP_target_ext): Handle reverse offload.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
* target.c (gomp_target_rev): ... this new stub function.
* libgomp.h (gomp_target_rev): Declare.
* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
* plugin/cuda-lib.def (cuMemHostAlloc): Add.
* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
(struct ptx_device): Add rev_data member.
(nvptx_open_device): Remove async_engines query, last used in
r10-304-g1f4c5b9b; add unified-address assert check.
(GOMP_OFFLOAD_get_num_devices): Claim unified address
support.
(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
offload functions exist. Make offload var available
on host and device.
(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
(GOMP_OFFLOAD_run): Handle reverse offload.
... so that it may be used by other projects that inherit GCC's 'include'
directory.
include/
* cuda/cuda.h: New file.
libgomp/
* plugin/cuda/cuda.h: Remove file.
* plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
"cuda/cuda.h" instead of <cuda.h>.
* plugin/configfrag.ac <PLUGIN_NVPTX_DYNAMIC>: Don't set
'PLUGIN_NVPTX_CPPFLAGS'.
* configure: Regenerate.