aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/CodeGen/ParallelCG.cpp
AgeCommit message (Collapse)AuthorFilesLines
2021-01-29[LTO] Update splitCodeGen to take a reference to the module. (NFC)Florian Hahn1-8/+6
splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is an ~4 year old fixme to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222
2020-11-21[llvm][clang][mlir] Add checks for the return values from Target::createXXX ↵Ella Ma1-0/+2
to prevent protential null deref All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`. The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer. Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr. Reviewed By: tejohnson, MaskRay, jpienaar Differential Revision: https://reviews.llvm.org/D91410
2020-02-14[Support] On Windows, ensure hardware_concurrency() extends to all CPU ↵Alexandre Ganea1-1/+1
sockets and all NUMA groups The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket. == Background == Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads. By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to. This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market. == The problem == The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std::thread::hardware_concurrency() -- which can only return processors from the current "processor group". == The changes in this patch == To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO). When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead. The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used. When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once. Differential Revision: https://reviews.llvm.org/D71775
2019-11-13Move CodeGenFileType enum to Support/CodeGen.hReid Kleckner1-2/+2
Avoids the need to include TargetMachine.h from various places just for an enum. Various other enums live here, such as the optimization level, TLS model, etc. Data suggests that this change probably doesn't matter, but it seems nice to have anyway.
2019-01-19Update the file headers across all of the LLVM projects in the monorepoChandler Carruth1-4/+3
to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
2018-05-21CodeGen: Add a dwo output file argument to addPassesToEmitFile and hook it ↵Peter Collingbourne1-1/+1
up to dwo output. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47089 llvm-svn: 332881
2018-02-14Pass a reference to a module to the bitcode writer.Rafael Espindola1-2/+2
This simplifies most callers as they are already using references or std::unique_ptr. llvm-svn: 325155
2017-12-13Remove redundant includes from lib/CodeGen.Michael Zolotukhin1-1/+0
llvm-svn: 320619
2016-11-13Bitcode: Change module reader functions to return an llvm::Expected.Peter Collingbourne1-1/+1
Differential Revision: https://reviews.llvm.org/D26562 llvm-svn: 286752
2016-11-11Split Bitcode/ReaderWriter.h into separate reader and writer headersTeresa Johnson1-1/+2
Summary: Split ReaderWriter.h which contains the APIs into both the BitReader and BitWriter libraries into BitcodeReader.h and BitcodeWriter.h. This is to address Chandler's concern about sharing the same API header between multiple libraries (BitReader and BitWriter). That concern is why we create a single bitcode library in our downstream build of clang, which led to r286297 being reverted as it added a dependency that created a cycle only when there is a single bitcode library (not two as in upstream). Reviewers: mehdi_amini Subscribers: dlj, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D26502 llvm-svn: 286566
2016-06-17Apply another batch of fixes from clang-tidy's ↵Benjamin Kramer1-4/+3
performance-unnecessary-value-param. Contains some manual fixes. No functionality change intended. llvm-svn: 273047
2016-04-17[ParallelCG] SmallVector<char> -> SmallString.Davide Italiano1-2/+2
llvm-svn: 266568
2016-04-17Keep only the splitCodegen version that takes a factory.Rafael Espindola1-19/+0
This makes it much easier to see that all created TargetMachines are equivalent. llvm-svn: 266564
2016-04-15[ParallelCG] Add a new splitCodeGen() API which takes a TargetMachineFactory.Davide Italiano1-21/+27
This is a recommit of r266390 with a fix that will allow tests to pass (hopefully). Before we got a StringRef to M->getTargetTriple() and right after we moved the Module so we were referencing a dangling object. llvm-svn: 266456
2016-04-15Revert "[LTO] Add a new splitCodeGen() API which takes a TargetMachineFactory."Davide Italiano1-26/+20
This reverts commits r266390 and r266396 as they broke some bots. llvm-svn: 266408
2016-04-15[LTO] Add a new splitCodeGen() API which takes a TargetMachineFactory.Davide Italiano1-20/+26
This will be used in lld to avoid creating TargetMachine in two different places. See D18999 for a more detailed discussion. Differential Revision: http://reviews.llvm.org/D19139 llvm-svn: 266390
2016-04-06[gold] Save bitcode for module partitions (save-temps + split codegen).Evgeniy Stepanov1-9/+17
llvm-svn: 265583
2016-03-04Change split code gen to use ThreadPoolTeresa Johnson1-32/+40
Part of D15390. llvm-svn: 262719
2016-01-18 Add to the split module utility an SCC based method which allows not to ↵Sergei Larin1-2/+3
globalize any local variables. Summary: Currently llvm::SplitModule as the first step globalizes all local objects, which might not be desirable in some scenarios. This change adds a new flag to llvm::SplitModule that uses SCC approach to search for a balanced partition without the need to externalize symbols. Such partition might not be possible or fully balanced for a given number of partitions, and is a function of the module properties (global/local dependencies within the module). Joint development Tobias Edler von Koch (tobias@codeaurora.org) and Sergei Larin (slarin@codeaurora.org) Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D16124 llvm-svn: 258083
2015-11-19[LTO] Add option to emit assembly from LTOCodeGeneratorTobias Edler von Koch1-7/+8
This adds a new API, LTOCodeGenerator::setFileType, to choose the output file format for LTO CodeGen. A corresponding change to use this new API from llvm-lto and a test case is coming in a separate commit. Differential Revision: http://reviews.llvm.org/D14554 llvm-svn: 253622
2015-08-31Support: Support LLVM_ENABLE_THREADS=0 in llvm/Support/thread.h.Peter Collingbourne1-2/+2
Specifically, the header now provides llvm::thread, which is either a typedef of std::thread or a replacement that calls the function synchronously depending on the value of LLVM_ENABLE_THREADS. llvm-svn: 246402
2015-08-27CodeGen: Introduce splitCodeGen and teach LTOCodeGenerator to use it.Peter Collingbourne1-0/+95
llvm::splitCodeGen is a function that implements the core of parallel LTO code generation. It uses llvm::SplitModule to split the module into linkable partitions and spawning one code generation thread per partition. The function produces multiple object files which can be linked in the usual way. This has been threaded through to LTOCodeGenerator (and llvm-lto for testing purposes). Separate patches will add parallel LTO support to the gold plugin and lld. Differential Revision: http://reviews.llvm.org/D12260 llvm-svn: 246236