diff options
Diffstat (limited to 'libgomp/doc/implementation-status-and-implementation-defined-behavior.rst')
-rw-r--r-- | libgomp/doc/implementation-status-and-implementation-defined-behavior.rst | 281 |
1 files changed, 281 insertions, 0 deletions
diff --git a/libgomp/doc/implementation-status-and-implementation-defined-behavior.rst b/libgomp/doc/implementation-status-and-implementation-defined-behavior.rst new file mode 100644 index 0000000..2c65c71 --- /dev/null +++ b/libgomp/doc/implementation-status-and-implementation-defined-behavior.rst @@ -0,0 +1,281 @@ +.. + Copyright 1988-2022 Free Software Foundation, Inc. + This is part of the GCC manual. + For copying conditions, see the copyright.rst file. + +Implementation Status and Implementation-Defined Behavior +********************************************************* + +We're implementing the OpenACC Profiling Interface as defined by the +OpenACC 2.6 specification. We're clarifying some aspects here as +*implementation-defined behavior*, while they're still under +discussion within the OpenACC Technical Committee. + +This implementation is tuned to keep the performance impact as low as +possible for the (very common) case that the Profiling Interface is +not enabled. This is relevant, as the Profiling Interface affects all +the *hot* code paths (in the target code, not in the offloaded +code). Users of the OpenACC Profiling Interface can be expected to +understand that performance will be impacted to some degree once the +Profiling Interface has gotten enabled: for example, because of the +*runtime* (libgomp) calling into a third-party *library* for +every event that has been registered. + +We're not yet accounting for the fact that OpenACC events may +occur during event processing. +We just handle one case specially, as required by CUDA 9.0 +:command:`nvprof`, that ``acc_get_device_type`` +(:ref:`acc_get_device_type`)) may be called from +``acc_ev_device_init_start``, ``acc_ev_device_init_end`` +callbacks. + +We're not yet implementing initialization via a +``acc_register_library`` function that is either statically linked +in, or dynamically via :envvar:`LD_PRELOAD`. +Initialization via ``acc_register_library`` functions dynamically +loaded via the :envvar:`ACC_PROFLIB` environment variable does work, as +does directly calling ``acc_prof_register``, +``acc_prof_unregister``, ``acc_prof_lookup``. + +As currently there are no inquiry functions defined, calls to +``acc_prof_lookup`` will always return ``NULL``. + +There aren't separate *start*, *stop* events defined for the +event types ``acc_ev_create``, ``acc_ev_delete``, +``acc_ev_alloc``, ``acc_ev_free``. It's not clear if these +should be triggered before or after the actual device-specific call is +made. We trigger them after. + +Remarks about data provided to callbacks: + +acc_prof_info.event_type + It's not clear if for *nested* event callbacks (for example, + ``acc_ev_enqueue_launch_start`` as part of a parent compute + construct), this should be set for the nested event + (``acc_ev_enqueue_launch_start``), or if the value of the parent + construct should remain (``acc_ev_compute_construct_start``). In + this implementation, the value will generally correspond to the + innermost nested event type. + +acc_prof_info.device_type + * For ``acc_ev_compute_construct_start``, and in presence of an + ``if`` clause with *false* argument, this will still refer to + the offloading device type. + It's not clear if that's the expected behavior. + + * Complementary to the item before, for + ``acc_ev_compute_construct_end``, this is set to + ``acc_device_host`` in presence of an ``if`` clause with + *false* argument. + It's not clear if that's the expected behavior. + +acc_prof_info.thread_id + Always ``-1`` ; not yet implemented. + +acc_prof_info.async + * Not yet implemented correctly for + ``acc_ev_compute_construct_start``. + + * In a compute construct, for host-fallback + execution/ ``acc_device_host`` it will always be + ``acc_async_sync``. + It's not clear if that's the expected behavior. + + * For ``acc_ev_device_init_start`` and ``acc_ev_device_init_end``, + it will always be ``acc_async_sync``. + It's not clear if that's the expected behavior. + +acc_prof_info.async_queue + There is no limited number of asynchronous queues in libgomp. + This will always have the same value as ``acc_prof_info.async``. + +acc_prof_info.src_file + Always ``NULL`` ; not yet implemented. + +acc_prof_info.func_name + Always ``NULL`` ; not yet implemented. + +acc_prof_info.line_no + Always ``-1`` ; not yet implemented. + +acc_prof_info.end_line_no + Always ``-1`` ; not yet implemented. + +acc_prof_info.func_line_no + Always ``-1`` ; not yet implemented. + +acc_prof_info.func_end_line_no + Always ``-1`` ; not yet implemented. + +acc_event_info.event_type, acc_event_info.*.event_type + Relating to ``acc_prof_info.event_type`` discussed above, in this + implementation, this will always be the same value as + ``acc_prof_info.event_type``. + +acc_event_info.\*.parent_construct + * Will be ``acc_construct_parallel`` for all OpenACC compute + constructs as well as many OpenACC Runtime API calls; should be the + one matching the actual construct, or + ``acc_construct_runtime_api``, respectively. + + * Will be ``acc_construct_enter_data`` or + ``acc_construct_exit_data`` when processing variable mappings + specified in OpenACC *declare* directives; should be + ``acc_construct_declare``. + + * For implicit ``acc_ev_device_init_start``, + ``acc_ev_device_init_end``, and explicit as well as implicit + ``acc_ev_alloc``, ``acc_ev_free``, + ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``, + ``acc_ev_enqueue_download_start``, and + ``acc_ev_enqueue_download_end``, will be + ``acc_construct_parallel`` ; should reflect the real parent + construct. + +acc_event_info.\*.implicit + For ``acc_ev_alloc``, ``acc_ev_free``, + ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``, + ``acc_ev_enqueue_download_start``, and + ``acc_ev_enqueue_download_end``, this currently will be ``1`` + also for explicit usage. + +acc_event_info.data_event.var_name + Always ``NULL`` ; not yet implemented. + +acc_event_info.data_event.host_ptr + For ``acc_ev_alloc``, and ``acc_ev_free``, this is always + ``NULL``. + +typedef union acc_api_info + ... as printed in 5.2.3. Third Argument: API-Specific + Information. This should obviously be ``typedef struct + acc_api_info``. + +acc_api_info.device_api + Possibly not yet implemented correctly for + ``acc_ev_compute_construct_start``, + ``acc_ev_device_init_start``, ``acc_ev_device_init_end`` : + will always be ``acc_device_api_none`` for these event types. + For ``acc_ev_enter_data_start``, it will be + ``acc_device_api_none`` in some cases. + +acc_api_info.device_type + Always the same as ``acc_prof_info.device_type``. + +acc_api_info.vendor + Always ``-1`` ; not yet implemented. + +acc_api_info.device_handle + Always ``NULL`` ; not yet implemented. + +acc_api_info.context_handle + Always ``NULL`` ; not yet implemented. + +acc_api_info.async_handle + Always ``NULL`` ; not yet implemented. + +Remarks about certain event types: + +acc_ev_device_init_start, acc_ev_device_init_end + * + .. See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in + 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c', + 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'. + + When a compute construct triggers implicit + ``acc_ev_device_init_start`` and ``acc_ev_device_init_end`` + events, they currently aren't *nested within* the corresponding + ``acc_ev_compute_construct_start`` and + ``acc_ev_compute_construct_end``, but they're currently observed + *before* ``acc_ev_compute_construct_start``. + It's not clear what to do: the standard asks us provide a lot of + details to the ``acc_ev_compute_construct_start`` callback, without + (implicitly) initializing a device before? + + * Callbacks for these event types will not be invoked for calls to the + ``acc_set_device_type`` and ``acc_set_device_num`` functions. + It's not clear if they should be. + +acc_ev_enter_data_start, acc_ev_enter_data_end, acc_ev_exit_data_start, acc_ev_exit_data_end + * Callbacks for these event types will also be invoked for OpenACC + *host_data* constructs. + It's not clear if they should be. + + * Callbacks for these event types will also be invoked when processing + variable mappings specified in OpenACC *declare* directives. + It's not clear if they should be. + +Callbacks for the following event types will be invoked, but dispatch +and information provided therein has not yet been thoroughly reviewed: + +* ``acc_ev_alloc`` + +* ``acc_ev_free`` + +* ``acc_ev_update_start``, ``acc_ev_update_end`` + +* ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end`` + +* ``acc_ev_enqueue_download_start``, ``acc_ev_enqueue_download_end`` + +During device initialization, and finalization, respectively, +callbacks for the following event types will not yet be invoked: + +* ``acc_ev_alloc`` + +* ``acc_ev_free`` + +Callbacks for the following event types have not yet been implemented, +so currently won't be invoked: + +* ``acc_ev_device_shutdown_start``, ``acc_ev_device_shutdown_end`` + +* ``acc_ev_runtime_shutdown`` + +* ``acc_ev_create``, ``acc_ev_delete`` + +* ``acc_ev_wait_start``, ``acc_ev_wait_end`` + +For the following runtime library functions, not all expected +callbacks will be invoked (mostly concerning implicit device +initialization): + +* ``acc_get_num_devices`` + +* ``acc_set_device_type`` + +* ``acc_get_device_type`` + +* ``acc_set_device_num`` + +* ``acc_get_device_num`` + +* ``acc_init`` + +* ``acc_shutdown`` + +Aside from implicit device initialization, for the following runtime +library functions, no callbacks will be invoked for shared-memory +offloading devices (it's not clear if they should be): + +* ``acc_malloc`` + +* ``acc_free`` + +* ``acc_copyin``, ``acc_present_or_copyin``, ``acc_copyin_async`` + +* ``acc_create``, ``acc_present_or_create``, ``acc_create_async`` + +* ``acc_copyout``, ``acc_copyout_async``, ``acc_copyout_finalize``, ``acc_copyout_finalize_async`` + +* ``acc_delete``, ``acc_delete_async``, ``acc_delete_finalize``, ``acc_delete_finalize_async`` + +* ``acc_update_device``, ``acc_update_device_async`` + +* ``acc_update_self``, ``acc_update_self_async`` + +* ``acc_map_data``, ``acc_unmap_data`` + +* ``acc_memcpy_to_device``, ``acc_memcpy_to_device_async`` + +* ``acc_memcpy_from_device``, ``acc_memcpy_from_device_async``
\ No newline at end of file |