16 Commits

Author SHA1 Message Date
Andrew Stubbs
a53c5c42c6 libgomp, amdgcn, nvptx: Improve omp_target_is_accessible [PR121813]
This patch extends omp_target_is_accessible to check the actual device status
for the memory region, on amdgcn and nvptx devices (rather than just checking
if shared memory is enabled).

In both cases, we check the status of each 4k region within the given memory
range (assuming 4k pages should be safe for all the currently supported hosts)
and returns true if all of the pages report accessible.

The testcases have been modified to check that allocations marked accessible
actually are accessible (inaccessibility can't be checked without invoking
memory faults), and to understand that some parts of an array can be accessible
but other parts not (I have observed this intermittently for the stack memory
on amdgcn using the Fortran testcase, which can have the allocation span pages).

There's also new testcases for the various other memory modes, and for managed
memory.

include/ChangeLog:

	* cuda/cuda.h (CUpointer_attribute): New enum.
	(cuPointerGetAttribute): New prototype.

libgomp/ChangeLog:

	PR libgomp/121813
	PR libgomp/113213
	* libgomp-plugin.h (GOMP_OFFLOAD_is_accessible_ptr): New prototype.
	* libgomp.h
	(struct gomp_device_descr): Add GOMP_OFFLOAD_is_accessible_ptr.
	* libgomp.texi: Update omp_target_is_accessible docs.
	* plugin/cuda-lib.def (cuPointerGetAttribute): New entry.
	* plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add
	hsa_amd_svm_attributes_get_fn and hsa_amd_pointer_info_fn.
	(init_hsa_runtime_functions): Add hsa_amd_svm_attributes_get and
	hsa_amd_pointer_info.
	(enum accessible): New enum type.
	(host_memory_is_accessible): New function.
	(device_memory_is_accessible): New function.
	(GOMP_OFFLOAD_is_accessible_ptr): New function.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_is_accessible_ptr): Likewise.
	* target.c (omp_target_is_accessible): Call is_accessible_ptr_func.
	(gomp_load_plugin_for_device): Add is_accessible_ptr.
	* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Rework
	to match more details of the GPU implementation.
	* testsuite/libgomp.fortran/target-is-accessible-1.f90: Likewise.
	* testsuite/libgomp.c-c++-common/target-is-accessible-2.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-3.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-4.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-5.c: New test.
2026-02-03 13:17:38 +00:00
Jakub Jelinek
254a858ae7 Update copyright years. 2026-01-02 09:56:11 +01:00
Andrew Stubbs
62174ec27b openmp, nvptx: ompx_gnu_managed_mem_alloc
This adds support for using Cuda Managed Memory with omp_alloc.  AMD support
will be added in a future patch.

There is one new predefined allocator, "ompx_gnu_managed_mem_alloc", plus a
corresponding memory space, which can be used to allocate memory in the
"managed" space.

The nvptx plugin is modified to make the necessary Cuda calls, via two new
(optional) plugin interfaces.

gcc/fortran/ChangeLog:

	* openmp.cc (is_predefined_allocator): Use GOMP_OMP_PREDEF_ALLOC_MAX
	and GOMP_OMPX_PREDEF_ALLOC_MIN/MAX instead of hardcoded values in the
	comment.

include/ChangeLog:

	* cuda/cuda.h (cuMemAllocManaged): Add declaration and related
	CU_MEM_ATTACH_GLOBAL flag.
	* gomp-constants.h (GOMP_OMPX_PREDEF_ALLOC_MAX): Update to 201.
	(GOMP_OMP_PREDEF_MEMSPACE_MAX): New constant.
	(GOMP_OMPX_PREDEF_MEMSPACE_MIN): New constant.
	(GOMP_OMPX_PREDEF_MEMSPACE_MAX): New constant.

libgomp/ChangeLog:

	* allocator.c (ompx_gnu_max_predefined_alloc): Update to
	ompx_gnu_managed_mem_alloc.
	(_Static_assert): Fix assertion messages for allocators and add
	new assertions for memspace constants.
	(omp_max_predefined_mem_space): New define.
	(ompx_gnu_min_predefined_mem_space): New define.
	(ompx_gnu_max_predefined_mem_space): New define.
	(MEMSPACE_ALLOC): Add check for non-standard memspaces.
	(MEMSPACE_CALLOC): Likewise.
	(MEMSPACE_REALLOC): Likewise.
	(MEMSPACE_VALIDATE): Likewise.
	(predefined_ompx_gnu_alloc_mapping): Add ompx_gnu_managed_mem_space.
	(omp_init_allocator): Add ompx_gnu_managed_mem_space validation.
	* config/gcn/allocator.c (gcn_memspace_alloc): Add check for
	non-standard memspaces.
	(gcn_memspace_calloc): Likewise.
	(gcn_memspace_realloc): Likewise.
	(gcn_memspace_validate): Update to validate standard vs non-standard
	memspaces.
	* config/linux/allocator.c (linux_memspace_alloc): Add managed
	memory space handling.
	(linux_memspace_calloc): Likewise.
	(linux_memspace_free): Likewise.
	(linux_memspace_realloc): Likewise (returns NULL for fallback).
	* config/nvptx/allocator.c (nvptx_memspace_alloc): Add check for
	non-standard memspaces.
	(nvptx_memspace_calloc): Likewise.
	(nvptx_memspace_realloc): Likewise.
	(nvptx_memspace_validate): Update to validate standard vs non-standard
	memspaces.
	* env.c (parse_allocator): Add ompx_gnu_managed_mem_alloc,
	ompx_gnu_managed_mem_space, and some static asserts so I don't forget
	them again.
	* libgomp-plugin.h (GOMP_OFFLOAD_managed_alloc): New declaration.
	(GOMP_OFFLOAD_managed_free): New declaration.
	* libgomp.h (gomp_managed_alloc): New declaration.
	(gomp_managed_free): New declaration.
	(struct gomp_device_descr): Add managed_alloc_func and
	managed_free_func fields.
	* libgomp.texi: Document ompx_gnu_managed_mem_alloc and
	ompx_gnu_managed_mem_space, add C++ template documentation, and
	describe NVPTX and AMD support.
	* omp.h.in: Add ompx_gnu_managed_mem_space and
	ompx_gnu_managed_mem_alloc enumerators, and gnu_managed_mem C++
	allocator template.
	* omp_lib.f90.in: Add Fortran bindings for new allocator and
	memory space.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuMemAllocManaged.
	* plugin/plugin-nvptx.c (nvptx_alloc): Add managed parameter to
	support cuMemAllocManaged.
	(GOMP_OFFLOAD_alloc): Move contents to ...
	(cleanup_and_alloc): ... this new function, and add managed support.
	(GOMP_OFFLOAD_managed_alloc): New function.
	(GOMP_OFFLOAD_managed_free): New function.
	* target.c (gomp_managed_alloc): New function.
	(gomp_managed_free): New function.
	(gomp_load_plugin_for_device): Load optional managed_alloc
	and managed_free plugin APIs.
	* testsuite/lib/libgomp.exp: Add check_effective_target_omp_managedmem.
	* testsuite/libgomp.c++/alloc-managed-1.C: New test.
	* testsuite/libgomp.c/alloc-managed-1.c: New test.
	* testsuite/libgomp.c/alloc-managed-2.c: New test.
	* testsuite/libgomp.c/alloc-managed-3.c: New test.
	* testsuite/libgomp.c/alloc-managed-4.c: New test.
	* testsuite/libgomp.fortran/alloc-managed-1.f90: New test.

Co-authored-by: Kwok Cheung Yeung <kcyeung@baylibre.com>
Co-authored-by: Thomas Schwinge <tschwinge@baylibre.com>
2025-11-13 14:16:09 +00:00
Tobias Burnus
4e47e2f833 libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async
PR libgomp/120444

include/ChangeLog:

	* cuda/cuda.h (cuMemsetD8, cuMemsetD8Async): Declare.

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_memset): Declare.
	* libgomp.h (struct gomp_device_descr): Add memset_func.
	* libgomp.map (GOMP_6.0.1): Add omp_target_memset{,_async}.
	* libgomp.texi (Device Memory Routines): Document them.
	* omp.h.in (omp_target_memset, omp_target_memset_async): Declare.
	* omp_lib.f90.in (omp_target_memset, omp_target_memset_async):
	Add interfaces.
	* omp_lib.h.in (omp_target_memset, omp_target_memset_async): Likewise.
	* plugin/cuda-lib.def: Add cuMemsetD8.
	* plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add
	hsa_amd_memory_fill_fn.
	(init_hsa_runtime_functions): DLSYM_OPT_FN load it.
	(GOMP_OFFLOAD_memset): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memset): New.
	* target.c (omp_target_memset_int, omp_target_memset,
	omp_target_memset_async_helper, omp_target_memset_async): New.
	(gomp_load_plugin_for_device): Add DLSYM (memset).
	* testsuite/libgomp.c-c++-common/omp_target_memset.c: New test.
	* testsuite/libgomp.c-c++-common/omp_target_memset-2.c: New test.
	* testsuite/libgomp.c-c++-common/omp_target_memset-3.c: New test.
	* testsuite/libgomp.fortran/omp_target_memset.f90: New test.
	* testsuite/libgomp.fortran/omp_target_memset-2.f90: New test.
2025-06-02 17:43:57 +02:00
Jakub Jelinek
6441eb6dc0 Update copyright years. 2025-01-02 11:59:57 +01:00
Tobias Burnus
bf4a5efa80 OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines
Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and omp_get_uid_from_device routines.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): New functions.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, includind the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.
2024-09-20 09:25:33 +02:00
Tobias Burnus
4ccb3366ad libgomp: Enable USM for some nvptx devices
A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those -
if all installed nvptx devices have this feature (as the capabilities
are per device type).

This exposes a bug in gomp_copy_back_icvs as it did before use
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in cased of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the
	devices supports USM.
2024-05-29 15:14:38 +02:00
Thomas Schwinge
a02d7f0edc GCN, nvptx: Errors during device probing are fatal
Currently, we silently disable libgomp GCN and nvptx plugins/devices in
presence of certain error conditions during device probing, thus typically
silently resorting to host-fallback execution.  Make such errors fatal, similar
as for any other device access later on, so that we early and reliably notice
when things go wrong.  (Keep just two cases non-fatal: (a) libgomp GCN or nvptx
plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not,
and (b) those are available, but the corresponding devices are not.)

This resolves the issue that we've got execution test cases unexpectedly
PASSing, despite:

    libgomp: GCN fatal error: Run-time could not be initialized
    Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

..., and therefore they were not offloaded to the GCN device, but ran in
host-fallback execution mode.  What happend in that scenario is that in
'init_hsa_context' during the initial 'GOMP_OFFLOAD_get_num_devices' we ran
into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', but it wasn't fatal, but just
silently disabled the libgomp plugin/device.

Especially "entertaining" were cases where such unintended host-fallback
execution happened during effective-target checks like
'offload_device_available' (host-fallback execution there meaning: no offload
device available), but actual test cases then were running with an offload
device available, and therefore mis-configured.

	include/
	* cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'.
	libgomp/
	* plugin/plugin-gcn.c (init_hsa_context): Add and handle
	'bool probe' parameter.  Adjust all users; errors during device
	probing are fatal.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from
	'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal.
2024-04-08 22:08:00 +02:00
Jakub Jelinek
a945c346f5 Update copyright years. 2024-01-03 12:19:35 +01:00
Tobias Burnus
8b9e559fe7 libgomp: cuda.h and omp_target_memcpy_rect cleanup
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894
"OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect",
namely:

In that commit, the code was changed to handle shared-memory devices;
however, as pointed out, omp_target_memcpy_check already set the pointer
to NULL in that case.  Hence, this commit reverts to the prior version.

In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry for cuMemcpy3DPeer
(all currently unused) and in three structs, fixes reserved-member names
and remove a bogus 'const' in three structs.

And it changes a DLSYM to DLSYM_OPT as not all plugins support the new
functions, yet.

include/ChangeLog:

	* cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER):
	Remove bogus 'const' from 'const void *dst' and fix reserved-name
	name in those structs.
	(cuMemcpyPeer, cuMemcpyPeerAsync): Add.

libgomp/ChangeLog:

	* target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for
	GOMP_OFFLOAD_CAP_SHARED_MEM.
	(omp_target_memcpy_rect_copy): Likewise for lock condition.
	(gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for
	memcpy3d/memcpy2d.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and
	unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
2023-07-29 13:25:03 +02:00
Tobias Burnus
25072a477a OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect
When copying a 2D or 3D rectangular memmory block, the performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data one by one. That's what this commit does.

Additionally, it permits device-to-device copies, if neccessary using a
temporary variable on the host.

include/ChangeLog:

	* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
	CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
	(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
	CUDA_MEMCPY3D_PEER): New typdefs.
	(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
	cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
	cuMemcpy3DPeerAsync): New prototypes.

libgomp/ChangeLog:

	* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New prototypes.
	* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
	and memcpy3d_func.
	* libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used.
	* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
	* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
	cuMemcpy3D): Invoke via CUDA_ONE_CALL.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
	GOMP_OFFLOAD_memcpy3d): New.
	* target.c (omp_target_memcpy_rect_worker):
	(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
	Permit all device-to-device copyies; invoke new plugins for
	2D and 3D copying when available.
	(gomp_load_plugin_for_device): DLSYM the new plugin functions.
	* testsuite/libgomp.c/target-12.c: Fix dimension bug.
	* testsuite/libgomp.fortran/target-12.f90: Likewise.
	* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
2023-07-26 16:22:35 +02:00
Jakub Jelinek
83ffe9cde7 Update copyright years. 2023-01-16 11:52:17 +01:00
Tobias Burnus
131d18e928 libgomp/nvptx: Prepare for reverse-offload callback handling
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
later handle the reverse offload.
For nvptx, it adds support for forwarding the offload gomp_target_ext call
to the host by setting values in a struct on the device and querying it on
the host - invoking gomp_target_rev on the result.

include/ChangeLog:

	* cuda/cuda.h (enum CUdevice_attribute): Add
	CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
	(CU_MEMHOSTALLOC_DEVICEMAP): Define.
	(cuMemHostAlloc): Add prototype.

libgomp/ChangeLog:

	* config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
	'static' for this variable.
	* config/nvptx/libgomp-nvptx.h: New file.
	* config/nvptx/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare extern var.
	(GOMP_REV_OFFLOAD_VAR): Declare var.
	(GOMP_target_ext): Handle reverse offload.
	* libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
	* libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
	* target.c (gomp_target_rev): ... this new stub function.
	* libgomp.h (gomp_target_rev): Declare.
	* libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
	* plugin/cuda-lib.def (cuMemHostAlloc): Add.
	* plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
	(struct ptx_device): Add rev_data member.
	(nvptx_open_device): Remove async_engines query, last used in
	r10-304-g1f4c5b9b; add unified-address assert check.
	(GOMP_OFFLOAD_get_num_devices): Claim unified address
	support.
	(GOMP_OFFLOAD_load_image): Free rev_fn_table if no
	offload functions exist. Make offload var available
	on host and device.
	(rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
	(GOMP_OFFLOAD_run): Handle reverse offload.
2022-10-24 17:04:08 +02:00
Thomas Schwinge
86f64400a5 'include/cuda/cuda.h': Add parts necessary for nvptx-tools 'nvptx-run'
include/
	* cuda/cuda.h (enum CUjit_option): Add
	'CU_JIT_GENERATE_DEBUG_INFO', 'CU_JIT_GENERATE_LINE_INFO'.
	(enum CUlimit): Add 'CU_LIMIT_STACK_SIZE',
	'CU_LIMIT_MALLOC_HEAP_SIZE'.
	(cuCtxSetLimit, cuGetErrorName): Add.
2022-05-18 12:06:20 +02:00
Thomas Schwinge
bdd1dc1bfb 'include/cuda/cuda.h': For C++, wrap in 'extern "C"'
include/
	* cuda/cuda.h: For C++, wrap in 'extern "C"'.
2022-05-18 12:06:20 +02:00
Thomas Schwinge
5e431ae4cc Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h'
... so that it may be used by other projects that inherit GCC's 'include'
directory.

	include/
	* cuda/cuda.h: New file.
	libgomp/
	* plugin/cuda/cuda.h: Remove file.
	* plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include
	"cuda/cuda.h" instead of <cuda.h>.
	* plugin/configfrag.ac <PLUGIN_NVPTX_DYNAMIC>: Don't set
	'PLUGIN_NVPTX_CPPFLAGS'.
	* configure: Regenerate.
2022-04-06 22:30:14 +02:00