author | yabinc <yabinc@google.com> | 2024-09-24 19:06:20 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-09-24 19:06:20 -0700 |
commit | 7a086e1b2dc05f54afae3591614feede727601fa (patch) | |
tree | e15bd1089946e0352c8a2d9034c7c17ee3632002 /clang/lib/CodeGen/CodeGenModule.h | |
parent | 0a42c7c6679bcc6f7be4b3d103670197acac96a9 (diff) | |
[clang][CodeGen] Zero init unspecified fields in initializers in C (#97121)
When a variable has an initializer, the Linux kernel relies on the
compiler to zero-initialize unspecified fields, as clarified in
https://www.spinics.net/lists/netdev/msg1007244.html.
But clang doesn't guarantee this:
1. For a union type, if an empty initializer is given, clang only
initializes the bytes of the first field; the remaining bytes of other
(larger) fields are marked as undef. Accessing those undef bytes can
lead to undefined behavior.
2. For a union type, if an initializer explicitly sets a field, the
remaining bytes of other (larger) fields are marked as undef.
3. When an initializer is given, clang doesn't zero-initialize padding.
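The three cases above can be sketched in C as follows. This is an illustrative sketch, not code from the patch: the types `U` and `S` and the helpers `union_tail_bytes`, `struct_padding_bytes`, `count_nonzero_bytes`, and `inspect_cases` are hypothetical names. The helpers only locate and inspect the bytes in question; what those bytes contain depends on the compiler and its version, so no result is claimed for the initializer cases.

```c
#include <stddef.h>

/* Hypothetical types for illustration only. */
union U { int a; long long b; };    /* first member is smaller than the union */
struct S { char c; long long b; };  /* most ABIs insert padding after c */

/* Bytes of U that its first member does not cover (cases 1 and 2). */
static size_t union_tail_bytes(void) {
    return sizeof(union U) - sizeof(int);
}

/* Padding bytes between S.c and S.b (case 3). */
static size_t struct_padding_bytes(void) {
    return offsetof(struct S, b) - sizeof(char);
}

/* Count the nonzero bytes in an object's representation. */
static size_t count_nonzero_bytes(const void *p, size_t n) {
    const unsigned char *bytes = p;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}

/* Build one object per case and report how many of its bytes are nonzero.
 * The counts are compiler-dependent: bytes that the initializer leaves
 * unspecified may or may not be zero. */
static void inspect_cases(size_t out[3]) {
    union U u1 = {};          /* case 1: empty initializer (C23 / GNU extension) */
    union U u2 = { .a = 1 };  /* case 2: explicitly sets the smaller field */
    struct S s = { 'c' };     /* case 3: padding follows c */

    out[0] = count_nonzero_bytes(&u1, sizeof u1);
    out[1] = count_nonzero_bytes(&u2, sizeof u2);
    out[2] = count_nonzero_bytes(&s, sizeof s);
}
```

Calling `inspect_cases` from a small `main` and printing the three counts is one way to observe what a given compiler does with the unspecified bytes.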
So this patch makes the following changes:
1. In C, when an initializer is provided for a variable, zero-initialize
the undef bytes and padding in the initializer.
2. Document the change in LanguageExtensions.rst.
As suggested in
https://github.com/llvm/llvm-project/issues/78034#issuecomment-2183437928,
the change isn't required by C23, but doing so is standards-conforming.
Fixes: https://github.com/llvm/llvm-project/issues/97459
Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.h')
-rw-r--r-- | clang/lib/CodeGen/CodeGenModule.h | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index c58bb88..fcdfef0 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1676,6 +1676,57 @@ public:
     MustTailCallUndefinedGlobals.insert(Global);
   }
 
+  bool shouldZeroInitPadding() const {
+    // In C23 (N3096) $6.7.10:
+    // """
+    // If any object is initialized with an empty iniitializer, then it is
+    // subject to default initialization:
+    // - if it is an aggregate, every member is initialized (recursively)
+    // according to these rules, and any padding is initialized to zero bits;
+    // - if it is a union, the first named member is initialized (recursively)
+    // according to these rules, and any padding is initialized to zero bits.
+    //
+    // If the aggregate or union contains elements or members that are
+    // aggregates or unions, these rules apply recursively to the subaggregates
+    // or contained unions.
+    //
+    // If there are fewer initializers in a brace-enclosed list than there are
+    // elements or members of an aggregate, or fewer characters in a string
+    // literal used to initialize an array of known size than there are elements
+    // in the array, the remainder of the aggregate is subject to default
+    // initialization.
+    // """
+    //
+    // From my understanding, the standard is ambiguous in the following two
+    // areas:
+    // 1. For a union type with empty initializer, if the first named member is
+    // not the largest member, then the bytes comes after the first named member
+    // but before padding are left unspecified. An example is:
+    //     union U { int a; long long b;};
+    //     union U u = {};  // The first 4 bytes are 0, but 4-8 bytes are left
+    //     unspecified.
+    //
+    // 2. It only mentions padding for empty initializer, but doesn't mention
+    // padding for a non empty initialization list. And if the aggregation or
+    // union contains elements or members that are aggregates or unions, and
+    // some are non empty initializers, while others are empty initiailizers,
+    // the padding initialization is unclear. An example is:
+    //     struct S1 { int a; long long b; };
+    //     struct S2 { char c; struct S1 s1; };
+    //     // The values for paddings between s2.c and s2.s1.a, between s2.s1.a
+    //     and s2.s1.b are unclear.
+    //     struct S2 s2 = { 'c' };
+    //
+    // Here we choose to zero initiailize left bytes of a union type. Because
+    // projects like the Linux kernel are relying on this behavior. If we don't
+    // explicitly zero initialize them, the undef values can be optimized to
+    // return gabage data. We also choose to zero initialize paddings for
+    // aggregates and unions, no matter they are initialized by empty
+    // initializers or non empty initializers. This can provide a consistent
+    // behavior. So projects like the Linux kernel can rely on it.
+    return !getLangOpts().CPlusPlus;
+  }
+
 private:
   bool shouldDropDLLAttribute(const Decl *D,
                               const llvm::GlobalValue *GV) const;
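The padding gaps in the `S1`/`S2` example from the comment above can be located with `offsetof`. The following is a minimal sketch under stated assumptions: the struct definitions are copied from the comment, while the helpers `gap_c_to_s1_a` and `gap_s1_a_to_b` are hypothetical names introduced here. The gap sizes are ABI-dependent, so the code computes them rather than assuming fixed values.

```c
#include <stddef.h>

/* Types from the comment in shouldZeroInitPadding(). */
struct S1 { int a; long long b; };
struct S2 { char c; struct S1 s1; };

/* Padding bytes between s2.c and s2.s1.a. */
static size_t gap_c_to_s1_a(void) {
    size_t end_of_c = offsetof(struct S2, c) + sizeof(char);
    size_t start_of_a = offsetof(struct S2, s1) + offsetof(struct S1, a);
    return start_of_a - end_of_c;
}

/* Padding bytes between s2.s1.a and s2.s1.b. */
static size_t gap_s1_a_to_b(void) {
    size_t end_of_a = offsetof(struct S1, a) + sizeof(int);
    return offsetof(struct S1, b) - end_of_a;
}
```

With `struct S2 s2 = { 'c' };`, these two gaps are the bytes whose values the comment describes as unclear under C23, and which this change zero-initializes when compiling C.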