Diffstat (limited to 'libgomp/doc/implementation-status-and-implementation-defined-behavior.rst')
-rw-r--r-- libgomp/doc/implementation-status-and-implementation-defined-behavior.rst 281
1 files changed, 281 insertions, 0 deletions
diff --git a/libgomp/doc/implementation-status-and-implementation-defined-behavior.rst b/libgomp/doc/implementation-status-and-implementation-defined-behavior.rst
new file mode 100644
index 0000000..2c65c71
--- /dev/null
+++ b/libgomp/doc/implementation-status-and-implementation-defined-behavior.rst
@@ -0,0 +1,281 @@
+..
+ Copyright 1988-2022 Free Software Foundation, Inc.
+ This is part of the GCC manual.
+ For copying conditions, see the copyright.rst file.
+
+Implementation Status and Implementation-Defined Behavior
+*********************************************************
+
+We're implementing the OpenACC Profiling Interface as defined by the
+OpenACC 2.6 specification. We're clarifying some aspects here as
+*implementation-defined behavior*, while they're still under
+discussion within the OpenACC Technical Committee.
+
+This implementation is tuned to keep the performance impact as low as
+possible for the (very common) case that the Profiling Interface is
+not enabled. This is relevant, as the Profiling Interface affects all
+the *hot* code paths (in the target code, not in the offloaded
+code). Users of the OpenACC Profiling Interface can be expected to
+understand that performance will be impacted to some degree once the
+Profiling Interface has been enabled: for example, because of the
+*runtime* (libgomp) calling into a third-party *library* for
+every event that has been registered.
+
+We're not yet accounting for the fact that OpenACC events may
+occur during event processing.
+We just handle one case specially, as required by CUDA 9.0
+:command:`nvprof`, namely that ``acc_get_device_type``
+(:ref:`acc_get_device_type`) may be called from
+``acc_ev_device_init_start`` and ``acc_ev_device_init_end``
+callbacks.
+
+We're not yet implementing initialization via an
+``acc_register_library`` function that is either statically linked
+in or dynamically loaded via :envvar:`LD_PRELOAD`.
+Initialization via ``acc_register_library`` functions dynamically
+loaded via the :envvar:`ACC_PROFLIB` environment variable does work, as
+does directly calling ``acc_prof_register``,
+``acc_prof_unregister``, and ``acc_prof_lookup``.
+
+As there are currently no inquiry functions defined, calls to
+``acc_prof_lookup`` will always return ``NULL``.
+
+There aren't separate *start* and *stop* events defined for the
+event types ``acc_ev_create``, ``acc_ev_delete``,
+``acc_ev_alloc``, ``acc_ev_free``. It's not clear if these
+should be triggered before or after the actual device-specific call is
+made. We trigger them after.
+
+Remarks about data provided to callbacks:
+
+acc_prof_info.event_type
+ It's not clear if for *nested* event callbacks (for example,
+ ``acc_ev_enqueue_launch_start`` as part of a parent compute
+ construct), this should be set for the nested event
+ (``acc_ev_enqueue_launch_start``), or if the value of the parent
+ construct should remain (``acc_ev_compute_construct_start``). In
+ this implementation, the value will generally correspond to the
+ innermost nested event type.
+
+acc_prof_info.device_type
+ * For ``acc_ev_compute_construct_start``, and in presence of an
+ ``if`` clause with *false* argument, this will still refer to
+ the offloading device type.
+ It's not clear if that's the expected behavior.
+
+ * Complementary to the item before, for
+ ``acc_ev_compute_construct_end``, this is set to
+ ``acc_device_host`` in presence of an ``if`` clause with
+ *false* argument.
+ It's not clear if that's the expected behavior.
+
+acc_prof_info.thread_id
+ Always ``-1``; not yet implemented.
+
+acc_prof_info.async
+ * Not yet implemented correctly for
+ ``acc_ev_compute_construct_start``.
+
+ * In a compute construct, for host-fallback execution or
+ ``acc_device_host``, it will always be
+ ``acc_async_sync``.
+ It's not clear if that's the expected behavior.
+
+ * For ``acc_ev_device_init_start`` and ``acc_ev_device_init_end``,
+ it will always be ``acc_async_sync``.
+ It's not clear if that's the expected behavior.
+
+acc_prof_info.async_queue
+ The number of asynchronous queues in libgomp is not limited.
+ This will always have the same value as ``acc_prof_info.async``.
+
+acc_prof_info.src_file
+ Always ``NULL``; not yet implemented.
+
+acc_prof_info.func_name
+ Always ``NULL``; not yet implemented.
+
+acc_prof_info.line_no
+ Always ``-1``; not yet implemented.
+
+acc_prof_info.end_line_no
+ Always ``-1``; not yet implemented.
+
+acc_prof_info.func_line_no
+ Always ``-1``; not yet implemented.
+
+acc_prof_info.func_end_line_no
+ Always ``-1``; not yet implemented.
+
+acc_event_info.event_type, acc_event_info.*.event_type
+ Relating to ``acc_prof_info.event_type`` discussed above, in this
+ implementation, this will always be the same value as
+ ``acc_prof_info.event_type``.
+
+acc_event_info.\*.parent_construct
+ * Will be ``acc_construct_parallel`` for all OpenACC compute
+ constructs as well as many OpenACC Runtime API calls; should be the
+ one matching the actual construct, or
+ ``acc_construct_runtime_api``, respectively.
+
+ * Will be ``acc_construct_enter_data`` or
+ ``acc_construct_exit_data`` when processing variable mappings
+ specified in OpenACC *declare* directives; should be
+ ``acc_construct_declare``.
+
+ * For implicit ``acc_ev_device_init_start``,
+ ``acc_ev_device_init_end``, and explicit as well as implicit
+ ``acc_ev_alloc``, ``acc_ev_free``,
+ ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``,
+ ``acc_ev_enqueue_download_start``, and
+ ``acc_ev_enqueue_download_end``, will be
+ ``acc_construct_parallel``; should reflect the real parent
+ construct.
+
+acc_event_info.\*.implicit
+ For ``acc_ev_alloc``, ``acc_ev_free``,
+ ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``,
+ ``acc_ev_enqueue_download_start``, and
+ ``acc_ev_enqueue_download_end``, this currently will be ``1``
+ also for explicit usage.
+
+acc_event_info.data_event.var_name
+ Always ``NULL``; not yet implemented.
+
+acc_event_info.data_event.host_ptr
+ For ``acc_ev_alloc``, and ``acc_ev_free``, this is always
+ ``NULL``.
+
+typedef union acc_api_info
+ This is how the type is printed in section 5.2.3,
+ *Third Argument: API-Specific Information*. This should be
+ ``typedef struct acc_api_info``.
+
+acc_api_info.device_api
+ Possibly not yet implemented correctly for
+ ``acc_ev_compute_construct_start``,
+ ``acc_ev_device_init_start``, and ``acc_ev_device_init_end``:
+ will always be ``acc_device_api_none`` for these event types.
+ For ``acc_ev_enter_data_start``, it will be
+ ``acc_device_api_none`` in some cases.
+
+acc_api_info.device_type
+ Always the same as ``acc_prof_info.device_type``.
+
+acc_api_info.vendor
+ Always ``-1``; not yet implemented.
+
+acc_api_info.device_handle
+ Always ``NULL``; not yet implemented.
+
+acc_api_info.context_handle
+ Always ``NULL``; not yet implemented.
+
+acc_api_info.async_handle
+ Always ``NULL``; not yet implemented.
+
+Remarks about certain event types:
+
+acc_ev_device_init_start, acc_ev_device_init_end
+ *
+ .. See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
+ 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
+ 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
+
+ When a compute construct triggers implicit
+ ``acc_ev_device_init_start`` and ``acc_ev_device_init_end``
+ events, they currently aren't *nested within* the corresponding
+ ``acc_ev_compute_construct_start`` and
+ ``acc_ev_compute_construct_end``, but they're currently observed
+ *before* ``acc_ev_compute_construct_start``.
+ It's not clear what to do: the standard asks us to provide a lot of
+ details to the ``acc_ev_compute_construct_start`` callback, without
+ (implicitly) initializing a device before?
+
+ * Callbacks for these event types will not be invoked for calls to the
+ ``acc_set_device_type`` and ``acc_set_device_num`` functions.
+ It's not clear if they should be.
+
+acc_ev_enter_data_start, acc_ev_enter_data_end, acc_ev_exit_data_start, acc_ev_exit_data_end
+ * Callbacks for these event types will also be invoked for OpenACC
+ *host_data* constructs.
+ It's not clear if they should be.
+
+ * Callbacks for these event types will also be invoked when processing
+ variable mappings specified in OpenACC *declare* directives.
+ It's not clear if they should be.
+
+Callbacks for the following event types will be invoked, but dispatch
+and information provided therein have not yet been thoroughly reviewed:
+
+* ``acc_ev_alloc``
+
+* ``acc_ev_free``
+
+* ``acc_ev_update_start``, ``acc_ev_update_end``
+
+* ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``
+
+* ``acc_ev_enqueue_download_start``, ``acc_ev_enqueue_download_end``
+
+During device initialization and finalization, respectively,
+callbacks for the following event types will not yet be invoked:
+
+* ``acc_ev_alloc``
+
+* ``acc_ev_free``
+
+Callbacks for the following event types have not yet been implemented,
+so currently won't be invoked:
+
+* ``acc_ev_device_shutdown_start``, ``acc_ev_device_shutdown_end``
+
+* ``acc_ev_runtime_shutdown``
+
+* ``acc_ev_create``, ``acc_ev_delete``
+
+* ``acc_ev_wait_start``, ``acc_ev_wait_end``
+
+For the following runtime library functions, not all expected
+callbacks will be invoked (mostly concerning implicit device
+initialization):
+
+* ``acc_get_num_devices``
+
+* ``acc_set_device_type``
+
+* ``acc_get_device_type``
+
+* ``acc_set_device_num``
+
+* ``acc_get_device_num``
+
+* ``acc_init``
+
+* ``acc_shutdown``
+
+Aside from implicit device initialization, for the following runtime
+library functions, no callbacks will be invoked for shared-memory
+offloading devices (it's not clear if they should be):
+
+* ``acc_malloc``
+
+* ``acc_free``
+
+* ``acc_copyin``, ``acc_present_or_copyin``, ``acc_copyin_async``
+
+* ``acc_create``, ``acc_present_or_create``, ``acc_create_async``
+
+* ``acc_copyout``, ``acc_copyout_async``, ``acc_copyout_finalize``, ``acc_copyout_finalize_async``
+
+* ``acc_delete``, ``acc_delete_async``, ``acc_delete_finalize``, ``acc_delete_finalize_async``
+
+* ``acc_update_device``, ``acc_update_device_async``
+
+* ``acc_update_self``, ``acc_update_self_async``
+
+* ``acc_map_data``, ``acc_unmap_data``
+
+* ``acc_memcpy_to_device``, ``acc_memcpy_to_device_async``
+
+* ``acc_memcpy_from_device``, ``acc_memcpy_from_device_async``
\ No newline at end of file
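
The remarks on ``acc_prof_info`` in the file above imply that a profiling library cannot rely on source-location fields being populated: ``src_file`` and ``func_name`` are documented as always ``NULL``, and ``line_no`` as always ``-1``. The following is a minimal sketch of how a callback might format a location defensively under those conditions. The helper name ``format_location`` and its parameter list are hypothetical and not part of libgomp or the OpenACC API; it takes plain values instead of the real ``acc_prof_info`` structure so that it compiles without ``acc_prof.h``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libgomp or the OpenACC API.
   The arguments stand in for acc_prof_info.src_file, .func_name and
   .line_no, which the documented implementation currently sets to
   NULL, NULL and -1.  Writes a printable location into buf and
   returns buf.  */
static const char *
format_location (char *buf, size_t size, const char *src_file,
                 const char *func_name, int line_no)
{
  assert (buf != NULL && size > 0);

  if (src_file == NULL)
    {
      /* No file name available: fall back to a fixed placeholder.  */
      snprintf (buf, size, "<unknown location>");
      return buf;
    }

  if (line_no < 0)
    /* File name known, line number not provided.  */
    snprintf (buf, size, "%s", src_file);
  else
    snprintf (buf, size, "%s:%d", src_file, line_no);

  if (func_name != NULL)
    {
      /* Append the function name in whatever space is left.  */
      size_t used = strlen (buf);
      snprintf (buf + used, size - used, " (%s)", func_name);
    }

  return buf;
}
```

With the values documented above, ``format_location (buf, sizeof buf, NULL, NULL, -1)`` yields ``<unknown location>``. A callback written this way should keep working unchanged if the fields are populated in a later version.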