author | Benjamin Kosnik <bkoz@redhat.com> | 2008-03-20 14:20:49 +0000 |
---|---|---|
committer | Benjamin Kosnik <bkoz@gcc.gnu.org> | 2008-03-20 14:20:49 +0000 |
commit | 1285e2a25db39ca03eb0c0474a5d03c5a12782b4 (patch) | |
tree | f8420e783a074cdd0190903ad3f7a9c2aa2df111 /libstdc++-v3/doc | |
parent | 6fd85d214441ab1760f2d650399433fbcb7681d2 (diff) | |
re PR libstdc++/35256 (Bad link on http://gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html)
2008-03-19 Benjamin Kosnik <bkoz@redhat.com>
PR libstdc++/35256
* doc/xml/manual/parallel_mode.xml: Correct configuration documentation.
* doc/html/manual/bk01pt12ch31s04.html: Regenerate.
From-SVN: r133378
Diffstat (limited to 'libstdc++-v3/doc')
-rw-r--r-- | libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html | 139 |
-rw-r--r-- | libstdc++-v3/doc/xml/manual/parallel_mode.xml | 290 |
2 files changed, 328 insertions, 101 deletions
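A note on the OpenMP example that this patch adds under "Setting up the OpenMP Environment": it calls omp_set_num_threads() and then compares omp_get_num_threads() against the requested count from sequential code. Per the OpenMP specification, omp_get_num_threads() reports the size of the current team, which is 1 outside a parallel region, while omp_get_max_threads() reports the upper bound for the next parallel region. The sketch below is not part of the patch. The names ThreadCounts and request_threads are hypothetical, and the non-OpenMP fallback is an assumption added so the block also builds without -fopenmp.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper type: both thread counts as seen from sequential code.
struct ThreadCounts
{
  int current_team;     // size of the team executing right now
  int next_team_limit;  // upper bound for the next parallel region
};

// Hypothetical helper: request a team size, then report both counts.
ThreadCounts request_threads(int wanted)
{
  assert(wanted > 0);
#ifdef _OPENMP
  omp_set_dynamic(0);           // ask the runtime not to adjust the team size
  omp_set_num_threads(wanted);  // applies to subsequent parallel regions
  return { omp_get_num_threads(), omp_get_max_threads() };
#else
  // Assumption: without OpenMP support everything runs on a single thread.
  (void)wanted;
  return { 1, 1 };
#endif
}
```

Called from main() before any parallel region, request_threads(20).current_team is 1, so a check written as in the patch would reach abort(). Comparing against next_team_limit, or calling omp_get_num_threads() inside a parallel region, is likely closer to what the example intends.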
diff --git a/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html b/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html index 99c1356..3db7d91 100644 --- a/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html +++ b/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html @@ -1,9 +1,10 @@ <?xml version="1.0" encoding="UTF-8" standalone="no"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Design</title><meta name="generator" content="DocBook XSL Stylesheets V1.73.2" /><meta name="keywords" content=" C++ , library , parallel " /><meta name="keywords" content=" ISO C++ , library " /><link rel="start" href="../spine.html" title="The GNU C++ Library Documentation" /><link rel="up" href="parallel_mode.html" title="Chapter 31. Parallel Mode" /><link rel="prev" href="bk01pt12ch31s03.html" title="Using" /><link rel="next" href="bk01pt12ch31s05.html" title="Testing" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Design</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="bk01pt12ch31s03.html">Prev</a> </td><th width="60%" align="center">Chapter 31. 
Parallel Mode</th><td width="20%" align="right"> <a accesskey="n" href="bk01pt12ch31s05.html">Next</a></td></tr></table><hr /></div><div class="sect1" lang="en" xml:lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="manual.ext.parallel_mode.design"></a>Design</h2></div></div></div><p> - </p><div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.parallel_mode.design.intro"></a>Interface Basics</h3></div></div></div><p>All parallel algorithms are intended to have signatures that are + </p><div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.parallel_mode.design.intro"></a>Interface Basics</h3></div></div></div><p> +All parallel algorithms are intended to have signatures that are equivalent to the ISO C++ algorithms replaced. For instance, the -<code class="code">std::adjacent_find</code> function is declared as: +<code class="function">std::adjacent_find</code> function is declared as: </p><pre class="programlisting"> namespace std { @@ -57,36 +58,124 @@ parallel algorithms look like this: ISO C++ signature to the correct parallel version. Also, some of the algorithms do not have support for run-time conditions, so the last overload is therefore missing. -</p></div><div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.parallel_mode.design.tuning"></a>Configuration and Tuning</h3></div></div></div><p> Some algorithm variants can be enabled/disabled/selected at compile-time. -See <a class="ulink" href="latest-doxygen/compiletime__settings_8h.html" target="_top"> -<code class="code"><compiletime_settings.h></code></a> and -See <a class="ulink" href="latest-doxygen/compiletime__settings_8h.html" target="_top"> -<code class="code"><features.h></code></a> for details. 
+</p></div><div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.parallel_mode.design.tuning"></a>Configuration and Tuning</h3></div></div></div><div class="sect3" lang="en" xml:lang="en"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.omp"></a>Setting up the OpenMP Environment</h4></div></div></div><p> +Several aspects of the overall runtime environment can be manipulated +by standard OpenMP function calls. </p><p> -To specify the number of threads to be used for an algorithm, -use <code class="code">omp_set_num_threads</code>. -To force a function to execute sequentially, -even though parallelism is switched on in general, -add <code class="code">__gnu_parallel::sequential_tag()</code> -to the end of the argument list. +To specify the number of threads to be used for an algorithm, use the +function <code class="function">omp_set_num_threads</code>. An example: +</p><pre class="programlisting"> +#include <stdlib.h> +#include <omp.h> + +int main() +{ + // Explicitly set number of threads. + const int threads_wanted = 20; + omp_set_dynamic(false); + omp_set_num_threads(threads_wanted); + if (omp_get_num_threads() != threads_wanted) + abort(); + + // Do work. + + return 0; +} +</pre><p> +Other parts of the runtime environment able to be manipulated include +nested parallelism (<code class="function">omp_set_nested</code>), schedule kind +(<code class="function">omp_set_schedule</code>), and others. See the OpenMP +documentation for more information. 
+</p></div><div class="sect3" lang="en" xml:lang="en"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.compile"></a>Compile Time Switches</h4></div></div></div><p> +To force an algorithm to execute sequentially, even though parallelism +is switched on in general via the macro <code class="constant">_GLIBCXX_PARALLEL</code>, +add <code class="classname">__gnu_parallel::sequential_tag()</code> to the end +of the algorithm's argument list, or explicitly qualify the algorithm +with the <code class="code">__gnu_parallel::</code> namespace. +</p><p> +Like so: +</p><pre class="programlisting"> +std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); +</pre><p> +or +</p><pre class="programlisting"> +__gnu_serial::sort(v.begin(), v.end()); +</pre><p> +In addition, some parallel algorithm variants can be enabled/disabled/selected +at compile-time. +</p><p> +See <a class="ulink" href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a00446.html" target="_top"><code class="filename">compiletime_settings.h</code></a> and +See <a class="ulink" href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a00505.html" target="_top"><code class="filename">features.h</code></a> for details. +</p></div><div class="sect3" lang="en" xml:lang="en"><div class="titlepage"><div><div><h4 class="title"><a id="parallel_mode.design.tuning.settings"></a>Run Time Settings and Defaults</h4></div></div></div><p> +The default parallization strategy, the choice of specific algorithm +strategy, the minimum threshold limits for individual parallel +algorithms, and aspects of the underlying hardware can be specified as +desired via manipulation +of <code class="classname">__gnu_parallel::_Settings</code> member data. </p><p> -Parallelism always incurs some overhead. Thus, it is not -helpful to parallelize operations on very small sets of data. -There are measures to avoid parallelizing stuff that is not worth it. 
-For each algorithm, a minimum problem size can be stated, -usually using the variable -<code class="code">__gnu_parallel::Settings::[algorithm]_minimal_n</code>. -Please see <a class="ulink" href="latest-doxygen/settings_8h.html" target="_top"> -<code class="code"><settings.h></code></a> for details.</p></div><div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.parallel_mode.design.impl"></a>Implementation Namespaces</h3></div></div></div><p> One namespace contain versions of code that are explicitly sequential: +First off, the choice of parallelization strategy: serial, parallel, +or implementation-deduced. This corresponds +to <code class="code">__gnu_parallel::_Settings::algorithm_strategy</code> and is a +value of enum <span class="type">__gnu_parallel::_AlgorithmStrategy</span> +type. Choices +include: <span class="type">heuristic</span>, <span class="type">force_sequential</span>, +and <span class="type">force_parallel</span>. The default is +implementation-deduced, ie <span class="type">heuristic</span>. +</p><p> +Next, the sub-choices for algorithm implementation. Specific +algorithms like <code class="function">find</code> or <code class="function">sort</code> +can be implemented in multiple ways: when this is the case, +a <code class="classname">__gnu_parallel::_Settings</code> member exists to +pick the default strategy. For +example, <code class="code">__gnu_parallel::_Settings::sort_algorithm</code> can +have any values of +enum <span class="type">__gnu_parallel::_SortAlgorithm</span>: <span class="type">MWMS</span>, <span class="type">QS</span>, +or <span class="type">QS_BALANCED</span>. +</p><p> +Likewise for setting the minimal threshold for algorithm +paralleization. Parallelism always incurs some overhead. Thus, it is +not helpful to parallelize operations on very small sets of +data. Because of this, measures are taken to avoid parallelizing below +a certain, pre-determined threshold. 
For each algorithm, a minimum +problem size is encoded as a variable in the +active <code class="classname">__gnu_parallel::_Settings</code> object. This +threshold variable follows the following naming scheme: +<code class="code">__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So, +for <code class="function">fill</code>, the threshold variable +is <code class="code">__gnu_parallel::_Settings::fill_minimal_n</code> +</p><p> +Finally, hardware details like L1/L2 cache size can be hardwired +via <code class="code">__gnu_parallel::_Settings::L1_cache_size</code> and friends. +</p><p> +All these configuration variables can be changed by the user, if +desired. Please +see <a class="ulink" href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a00640.html" target="_top"><code class="filename">settings.h</code></a> +for complete details. +</p><p> +A small example of tuning the default: +</p><pre class="programlisting"> +#include <parallel/algorithm> +#include <parallel/settings.h> + +int main() +{ + __gnu_parallel::_Settings s; + s.algorithm_strategy = __gnu_parallel::force_parallel; + __gnu_parallel::_Settings::set(s); + + // Do work... all algorithms will be parallelized, always. + + return 0; +} +</pre></div></div><div class="sect2" lang="en" xml:lang="en"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.parallel_mode.design.impl"></a>Implementation Namespaces</h3></div></div></div><p> One namespace contain versions of code that are always +explicitly sequential: <code class="code">__gnu_serial</code>. </p><p> Two namespaces contain the parallel mode: <code class="code">std::__parallel</code> and <code class="code">__gnu_parallel</code>. </p><p> Parallel implementations of standard components, including template helpers to select parallelism, are defined in <code class="code">namespace -std::__parallel</code>. 
For instance, <code class="code">std::transform</code> from -<algorithm> has a parallel counterpart in -<code class="code">std::__parallel::transform</code> from -<parallel/algorithm>. In addition, these parallel +std::__parallel</code>. For instance, <code class="function">std::transform</code> from <code class="filename">algorithm</code> has a parallel counterpart in +<code class="function">std::__parallel::transform</code> from <code class="filename">parallel/algorithm</code>. In addition, these parallel implementations are injected into <code class="code">namespace __gnu_parallel</code> with using declarations. </p><p> Support and general infrastructure is in <code class="code">namespace diff --git a/libstdc++-v3/doc/xml/manual/parallel_mode.xml b/libstdc++-v3/doc/xml/manual/parallel_mode.xml index 4236f63..0bcbbca 100644 --- a/libstdc++-v3/doc/xml/manual/parallel_mode.xml +++ b/libstdc++-v3/doc/xml/manual/parallel_mode.xml @@ -28,7 +28,7 @@ implementation of many algorithms the C++ Standard Library. <para> Several of the standard algorithms, for instance -<code>std::sort</code>, are made parallel using OpenMP +<function>std::sort</function>, are made parallel using OpenMP annotations. These parallel mode constructs and can be invoked by explicit source declaration or by compiling existing sources with a specific compiler flag. @@ -39,52 +39,52 @@ specific compiler flag. 
<title>Intro</title> <para>The following library components in the include -<code><numeric></code> are included in the parallel mode:</para> +<filename class="headerfile">numeric</filename> are included in the parallel mode:</para> <itemizedlist> - <listitem><para><code>std::accumulate</code></para></listitem> - <listitem><para><code>std::adjacent_difference</code></para></listitem> - <listitem><para><code>std::inner_product</code></para></listitem> - <listitem><para><code>std::partial_sum</code></para></listitem> + <listitem><para><function>std::accumulate</function></para></listitem> + <listitem><para><function>std::adjacent_difference</function></para></listitem> + <listitem><para><function>std::inner_product</function></para></listitem> + <listitem><para><function>std::partial_sum</function></para></listitem> </itemizedlist> <para>The following library components in the include -<code><algorithm></code> are included in the parallel mode:</para> +<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para> <itemizedlist> - <listitem><para><code>std::adjacent_find</code></para></listitem> - <listitem><para><code>std::count</code></para></listitem> - <listitem><para><code>std::count_if</code></para></listitem> - <listitem><para><code>std::equal</code></para></listitem> - <listitem><para><code>std::find</code></para></listitem> - <listitem><para><code>std::find_if</code></para></listitem> - <listitem><para><code>std::find_first_of</code></para></listitem> - <listitem><para><code>std::for_each</code></para></listitem> - <listitem><para><code>std::generate</code></para></listitem> - <listitem><para><code>std::generate_n</code></para></listitem> - <listitem><para><code>std::lexicographical_compare</code></para></listitem> - <listitem><para><code>std::mismatch</code></para></listitem> - <listitem><para><code>std::search</code></para></listitem> - <listitem><para><code>std::search_n</code></para></listitem> - 
<listitem><para><code>std::transform</code></para></listitem> - <listitem><para><code>std::replace</code></para></listitem> - <listitem><para><code>std::replace_if</code></para></listitem> - <listitem><para><code>std::max_element</code></para></listitem> - <listitem><para><code>std::merge</code></para></listitem> - <listitem><para><code>std::min_element</code></para></listitem> - <listitem><para><code>std::nth_element</code></para></listitem> - <listitem><para><code>std::partial_sort</code></para></listitem> - <listitem><para><code>std::partition</code></para></listitem> - <listitem><para><code>std::random_shuffle</code></para></listitem> - <listitem><para><code>std::set_union</code></para></listitem> - <listitem><para><code>std::set_intersection</code></para></listitem> - <listitem><para><code>std::set_symmetric_difference</code></para></listitem> - <listitem><para><code>std::set_difference</code></para></listitem> - <listitem><para><code>std::sort</code></para></listitem> - <listitem><para><code>std::stable_sort</code></para></listitem> - <listitem><para><code>std::unique_copy</code></para></listitem> + <listitem><para><function>std::adjacent_find</function></para></listitem> + <listitem><para><function>std::count</function></para></listitem> + <listitem><para><function>std::count_if</function></para></listitem> + <listitem><para><function>std::equal</function></para></listitem> + <listitem><para><function>std::find</function></para></listitem> + <listitem><para><function>std::find_if</function></para></listitem> + <listitem><para><function>std::find_first_of</function></para></listitem> + <listitem><para><function>std::for_each</function></para></listitem> + <listitem><para><function>std::generate</function></para></listitem> + <listitem><para><function>std::generate_n</function></para></listitem> + <listitem><para><function>std::lexicographical_compare</function></para></listitem> + <listitem><para><function>std::mismatch</function></para></listitem> + 
<listitem><para><function>std::search</function></para></listitem> + <listitem><para><function>std::search_n</function></para></listitem> + <listitem><para><function>std::transform</function></para></listitem> + <listitem><para><function>std::replace</function></para></listitem> + <listitem><para><function>std::replace_if</function></para></listitem> + <listitem><para><function>std::max_element</function></para></listitem> + <listitem><para><function>std::merge</function></para></listitem> + <listitem><para><function>std::min_element</function></para></listitem> + <listitem><para><function>std::nth_element</function></para></listitem> + <listitem><para><function>std::partial_sort</function></para></listitem> + <listitem><para><function>std::partition</function></para></listitem> + <listitem><para><function>std::random_shuffle</function></para></listitem> + <listitem><para><function>std::set_union</function></para></listitem> + <listitem><para><function>std::set_intersection</function></para></listitem> + <listitem><para><function>std::set_symmetric_difference</function></para></listitem> + <listitem><para><function>std::set_difference</function></para></listitem> + <listitem><para><function>std::sort</function></para></listitem> + <listitem><para><function>std::stable_sort</function></para></listitem> + <listitem><para><function>std::unique_copy</function></para></listitem> </itemizedlist> <para>The following library components in the includes -<code><set></code> and <code><map></code> are included in the parallel mode:</para> +<filename class="headerfile">set</filename> and <filename class="headerfile">map</filename> are included in the parallel mode:</para> <itemizedlist> <listitem><para><code>std::(multi_)map/set<T>::(multi_)map/set(Iterator begin, Iterator end)</code> (bulk construction)</para></listitem> <listitem><para><code>std::(multi_)map/set<T>::insert(Iterator begin, Iterator end)</code> (bulk insertion)</para></listitem> @@ -113,23 +113,25 @@ It might 
work with other compilers, though.</para> <sect2 id="parallel_mode.using.parallel_mode" xreflabel="using.parallel_mode"> <title>Using Parallel Mode</title> -<para>To use the libstdc++ parallel mode, compile your application with - the compiler flag <code>-D_GLIBCXX_PARALLEL -fopenmp</code>. This +<para> + To use the libstdc++ parallel mode, compile your application with + the compiler flag <constant>-D_GLIBCXX_PARALLEL -fopenmp</constant>. This will link in <code>libgomp</code>, the GNU OpenMP <ulink url="http://gcc.gnu.org/onlinedocs/libgomp">implementation</ulink>, whose presence is mandatory. In addition, hardware capable of atomic operations is mandatory. Actually activating these atomic operations may require explicit compiler flags on some targets - (like sparc and x86), such as <code>-march=i686</code>, - <code>-march=native</code> or <code>-mcpu=v9</code>. + (like sparc and x86), such as <literal>-march=i686</literal>, + <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. </para> -<para>Note that the <code>_GLIBCXX_PARALLEL</code> define may change the +<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the sizes and behavior of standard class templates such as - <code>std::search</code>, and therefore one can only link code + <function>std::search</function>, and therefore one can only link code compiled with parallel mode and code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage, - and cannot be confused with normal mode symbols.</para> + and cannot be confused with normal mode symbols. 
+</para> </sect2> <sect2 id="manual.ext.parallel_mode.usings" xreflabel="using.specific"> @@ -420,9 +422,10 @@ It might work with other compilers, though.</para> <title>Interface Basics</title> -<para>All parallel algorithms are intended to have signatures that are +<para> +All parallel algorithms are intended to have signatures that are equivalent to the ISO C++ algorithms replaced. For instance, the -<code>std::adjacent_find</code> function is declared as: +<function>std::adjacent_find</function> function is declared as: </para> <programlisting> namespace std @@ -506,39 +509,176 @@ overload is therefore missing. <sect2 id="manual.ext.parallel_mode.design.tuning" xreflabel="Tuning"> <title>Configuration and Tuning</title> -<para> Some algorithm variants can be enabled/disabled/selected at compile-time. -See <ulink url="latest-doxygen/compiletime__settings_8h.html"> -<code><compiletime_settings.h></code></ulink> and -See <ulink url="latest-doxygen/compiletime__settings_8h.html"> -<code><features.h></code></ulink> for details. + +<sect3 id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"> + <title>Setting up the OpenMP Environment</title> + +<para> +Several aspects of the overall runtime environment can be manipulated +by standard OpenMP function calls. +</para> + +<para> +To specify the number of threads to be used for an algorithm, use the +function <function>omp_set_num_threads</function>. An example: +</para> + +<programlisting> +#include <stdlib.h> +#include <omp.h> + +int main() +{ + // Explicitly set number of threads. + const int threads_wanted = 20; + omp_set_dynamic(false); + omp_set_num_threads(threads_wanted); + if (omp_get_num_threads() != threads_wanted) + abort(); + + // Do work. + + return 0; +} +</programlisting> + +<para> +Other parts of the runtime environment able to be manipulated include +nested parallelism (<function>omp_set_nested</function>), schedule kind +(<function>omp_set_schedule</function>), and others. 
See the OpenMP +documentation for more information. +</para> + +</sect3> + +<sect3 id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"> + <title>Compile Time Switches</title> + +<para> +To force an algorithm to execute sequentially, even though parallelism +is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>, +add <classname>__gnu_parallel::sequential_tag()</classname> to the end +of the algorithm's argument list, or explicitly qualify the algorithm +with the <code>__gnu_parallel::</code> namespace. +</para> + +<para> +Like so: +</para> + +<programlisting> +std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); +</programlisting> + +<para> +or +</para> + +<programlisting> +__gnu_serial::sort(v.begin(), v.end()); +</programlisting> + +<para> +In addition, some parallel algorithm variants can be enabled/disabled/selected +at compile-time. +</para> + +<para> +See <ulink url="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a00446.html"><filename class="headerfile">compiletime_settings.h</filename></ulink> and +See <ulink url="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a00505.html"><filename class="headerfile">features.h</filename></ulink> for details. +</para> +</sect3> + +<sect3 id="parallel_mode.design.tuning.settings" xreflabel="_Settings"> + <title>Run Time Settings and Defaults</title> + +<para> +The default parallization strategy, the choice of specific algorithm +strategy, the minimum threshold limits for individual parallel +algorithms, and aspects of the underlying hardware can be specified as +desired via manipulation +of <classname>__gnu_parallel::_Settings</classname> member data. +</para> + +<para> +First off, the choice of parallelization strategy: serial, parallel, +or implementation-deduced. This corresponds +to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a +value of enum <type>__gnu_parallel::_AlgorithmStrategy</type> +type. 
Choices +include: <type>heuristic</type>, <type>force_sequential</type>, +and <type>force_parallel</type>. The default is +implementation-deduced, ie <type>heuristic</type>. +</para> + + +<para> +Next, the sub-choices for algorithm implementation. Specific +algorithms like <function>find</function> or <function>sort</function> +can be implemented in multiple ways: when this is the case, +a <classname>__gnu_parallel::_Settings</classname> member exists to +pick the default strategy. For +example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can +have any values of +enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>, +or <type>QS_BALANCED</type>. +</para> + +<para> +Likewise for setting the minimal threshold for algorithm +paralleization. Parallelism always incurs some overhead. Thus, it is +not helpful to parallelize operations on very small sets of +data. Because of this, measures are taken to avoid parallelizing below +a certain, pre-determined threshold. For each algorithm, a minimum +problem size is encoded as a variable in the +active <classname>__gnu_parallel::_Settings</classname> object. This +threshold variable follows the following naming scheme: +<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So, +for <function>fill</function>, the threshold variable +is <code>__gnu_parallel::_Settings::fill_minimal_n</code> </para> <para> -To specify the number of threads to be used for an algorithm, -use <code>omp_set_num_threads</code>. -To force a function to execute sequentially, -even though parallelism is switched on in general, -add <code>__gnu_parallel::sequential_tag()</code> -to the end of the argument list. +Finally, hardware details like L1/L2 cache size can be hardwired +via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends. </para> <para> -Parallelism always incurs some overhead. Thus, it is not -helpful to parallelize operations on very small sets of data. 
-There are measures to avoid parallelizing stuff that is not worth it. -For each algorithm, a minimum problem size can be stated, -usually using the variable -<code>__gnu_parallel::Settings::[algorithm]_minimal_n</code>. -Please see <ulink url="latest-doxygen/settings_8h.html"> -<code><settings.h></code></ulink> for details.</para> +All these configuration variables can be changed by the user, if +desired. Please +see <ulink url="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a00640.html"><filename class="headerfile">settings.h</filename></ulink> +for complete details. +</para> + +<para> +A small example of tuning the default: +</para> + +<programlisting> +#include <parallel/algorithm> +#include <parallel/settings.h> + +int main() +{ + __gnu_parallel::_Settings s; + s.algorithm_strategy = __gnu_parallel::force_parallel; + __gnu_parallel::_Settings::set(s); + + // Do work... all algorithms will be parallelized, always. + + return 0; +} +</programlisting> +</sect3> </sect2> <sect2 id="manual.ext.parallel_mode.design.impl" xreflabel="Impl"> <title>Implementation Namespaces</title> -<para> One namespace contain versions of code that are explicitly sequential: +<para> One namespace contain versions of code that are always +explicitly sequential: <code>__gnu_serial</code>. </para> @@ -548,10 +688,8 @@ Please see <ulink url="latest-doxygen/settings_8h.html"> <para> Parallel implementations of standard components, including template helpers to select parallelism, are defined in <code>namespace -std::__parallel</code>. For instance, <code>std::transform</code> from -<algorithm> has a parallel counterpart in -<code>std::__parallel::transform</code> from -<parallel/algorithm>. In addition, these parallel +std::__parallel</code>. 
For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in +<function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel implementations are injected into <code>namespace __gnu_parallel</code> with using declarations. </para> @@ -588,7 +726,7 @@ the generated source documentation. <para> The log and summary files for conformance testing are in the - <code>testsuite/parallel</code> directory. + <filename class="directory">testsuite/parallel</filename> directory. </para> <para> @@ -596,13 +734,13 @@ the generated source documentation. </para> <screen> - <userinput>check-performance-parallel</userinput> + <userinput>make check-performance-parallel</userinput> </screen> <para> The result file for performance testing are in the - <code>testsuite</code> directory, in the file - <code>libstdc++_performance.sum</code>. In addition, the + <filename class="directory">testsuite</filename> directory, in the file + <filename>libstdc++_performance.sum</filename>. In addition, the policy-based containers have their own visualizations, which have additional software dependencies than the usual bare-boned text file, and can be generated by using the <code>make |