aboutsummaryrefslogtreecommitdiff
path: root/libctf/ctf-dedup.c
AgeCommit message (Collapse)AuthorFilesLines
13 dayslibctf: dedup: fix datasec entry populationNick Alcock1-1/+1
We were failing to increment component_idx when acquiring datasec info, meaning that all our variable population got the datasec info from the first entry in that datasec.
13 dayslibctf: dedup: unify extern and non-extern variables and functions (DO NOT ↵Nick Alcock1-23/+178
SQUASH) If one TU has an extern definition of a variable or function, and another has a global definition that is otherwise identical (same type, same name), we'd like to unify the two: they are not conflicting types. But a similar static definition *is* conflicting. This is quite simple for functions: introduce a new cd_linkages that contains entries giving the "linkage we'd like to emit" for a given hash: bump it from extern to global-allocated when one is seen, and use that as the linkage at emission time if present. When hashing, promote extern to non-extern always. But for variables this is more complex, because non-extern variables come with datasec info which is hashed in as well: so externs and the non-externs we'd like to emit instead have *different hashes* and are seen as conflicting types. To fix this, introduce a new cd_replacing_hashes which maps from hashes we'd like to ignore if seen to hashes that replace them: populate it when a global-allocated variable is seen, using the initial portion of the hash (shared with extern vars) as the hash value to replace. Use this to bump up name counts at conflict-marking time; avoid marking the replaced hash as conflicting if we are also avoiding marking the hash it replaces as conflicting.
13 dayslibctf: dedup: fix indentationNick Alcock1-39/+39
13 dayslibctf: dedup: conflicting types are non-root-visibleNick Alcock1-1/+4
When we encounter a conflicting type in cu-mapped mode, it's not enough to just add it as root-visible if its source type was and then mark it conflicting later: it might have a conflicting name, in which case addition will fail (if it's one of those kinds for which conflicting names are a cause for failure, which is most of them.) So mark such types non-root-visible explicitly as well.
2025-04-02libctf: dedup: hash: fix stub detectionNick Alcock1-1/+1
When calling a pure function, it helps to assign the result to something!
2025-04-01libctf: more output format controlNick Alcock1-4/+8
The new API function ctf_link_output_is_btf lets you determine whether the result of a ctf_link is likely to be written out as CTF or BTF before the write takes place: the new is_btf argument to ctf_link_write lets you find out whether it actually was (things like compression that only ctf_link_write is told about may cause last-minute changes in the decision). This requires us to split preserialization in two, with the portion that determines whether a serialized dict is BTF-compatible or not moving into a new internal function ctf_serialize_output_format, called by ctf_link_output_is_btf. We move the state used to communicate between serialization passes into a new sub-struct at the same time.
2025-03-27libctf: dedup: don't use low-level mechanisms to get the type kindNick Alcock1-1/+1
We should be looking at the visible kind, not the format-internal kind: in particular, we want forwards to enums to appear to be CTF_K_FORWARDs.
2025-03-27libctf: dedup: drop debugging left in by mistakeNick Alcock1-2/+0
2025-03-27libctf: variables need not be in datasecsNick Alcock1-44/+46
Datasecs are only used for non-extern variables (those in ELF sections). Variables added by ctf_add_variable() should not be in any datasec at all. (This means we have to drop a minor optimization where variables not mentioned in ctf_var_datasecs were assumed to be in the most-numerous one, since they might be in none at all; not a great loss.)
2025-03-25libctf: dedup: don't emit decl tags to structure members twiceNick Alcock1-0/+2
An accidental fallthrough.
2025-03-25libctf: suppress external strings when serializing as BTFNick Alcock1-2/+2
BTF doesn't have a concept of external strings in the ELF strtab, so even if the ELF linker has reported symbols, we must ignore them when doing BTF serialization. (We keep the strings around, so that if a later attempt is made to serialize as CTF we can still use them and generate external string refs as before, though this will be of limited effectiveness because the duplicated strings will already have been baked into the strtab. Serializing a dict in an ELF executable as BTF and then CTF right afterwards is rare in any case: normally such things get serialized by the linker, which does it only once per dict.)
2025-03-24libctf: fix CTF_K_FUNC_LINKAGENick Alcock1-0/+2
This has its linkage not in the vlen region, but *stuffed into the actual vlen in the info word* (why this is inconsistent with the way CTF_K_VAR works, I have no idea).
2025-03-24libctf: dedup: fix datasec trackingNick Alcock1-4/+5
CTF_DEDUP_GID() takes input *numbers*, not inputs (which are pointers).
2025-03-24libctf: dedup: initialize all the decorated-names arraysNick Alcock1-2/+2
2025-03-24libctf: dedup: comment improvementsNick Alcock1-6/+7
2025-03-24libctf: dedup: support cu-mapped emission into the parent dict (no children)Nick Alcock1-3/+6
This lets a CU added via ctf_link_add_cu_mapping (fp, foo, "") get emitted directly into the shared dict, with conflicting types going into hidden members (like conflicting types in modules in the cu-mapped portions do). This is the layout in-kernel BTF and pahole want.
2025-03-20libctf: lots and lots of compilation error fixesNick Alcock1-35/+59
More to come.
2025-03-20libctf: ctf-dedup: changes for CTFv4/BTFNick Alcock1-94/+820
Basically this consists of small tweaks to emit the new stuff, plus two nasty cases: - datasecs are skipped: instead, we compute a backmapping from var to datasec component in cd_var_datasec and chase this to hash the datasec name and info for each specific variable into the hash for that variable, then use that to emit both at once when the var is emitted (always into the same dict). TODO: this probably gets the offsets wrong: ctf_add_section_variable() and/or serialization must sort them after all. Dammit. - decl tags to struct/union members (and to structs/unions in general) have two separate nasty problems. The first is that we have to mark them conflicting if their associated struct is conflicting, but traversal from types to their parents halts at tagged structs and unions, because the type graph is sharded via stubs at those points and conflictedness ceases. But we don't want to do that here: a decl_tag to member 10 of some struct is only valid if that struct *has* ten members, and if the struct is conflicted, some may have only one. The decl tag is only valid for the specific struct-with-ten-members it was originally pointing at, anyway: other structs-with-ten-members may have entirely different members there, which are not tagged or which are tagged with something else.) So we track this by keeping track of the only thing that is knowable about struct/union stubs: their decorated name. The citers graph gains mappings from decorated SoU names to decl tags, and conflictedness marking chases that and marks accordingly. - The second problem is that we have to emit decl tags to struct members after the members are emitted, but the members are emitted late because they might refer to any types in the dict, including types added after the struct was added. So we need to accumulate decl tags to struct members in cd_emission_struct_decl_tags and add yet *another* pass that traverses that and emits all the decl tags in it. ctf_link is not yet revised: at the very least it should stop trying to emit variables itself -- but it does need to add data symbols if the variable section is not omitted, in addition to figuring out whether we want to delete existing variables if it *is* omitted. This stuff can partly wait until later: the variables section is never omitted for kernel links.
2025-03-20libctf: dedup: delete a wildly obsolete commentNick Alcock1-2/+1
ctf_dedup_rhash_type always hashes the type it's called for: the possibility to return some random other hash via an override has been gone for so long that it never even made it upstream.
2025-03-20libctf: last bit of ctf-types.cNick Alcock1-1/+1
ctf_tag, ctf_tag_next; dropping type/decl tag lookup from ctf_lookup_by_name (we can't get it for free because the name tables are unusually structured, and with multiple IDs mapping to a given tag it's not clear what we could return in any case); supporting void * (almost no changes, just a tweak to ctf_type_compat() to note that void * is assignment-compatible with itself); plus a tiny tweak in the deduplicator's error handling spotted while checking uses of ctf_dynhash_insert. (Will all be squashed together and split up in different directions in future anyway.)
2025-03-16Tiny stylistic spacing and comment tweaksNick Alcock1-1/+1
2025-02-28libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILDNick Alcock1-2/+2
Parent/child determination is about to become rather more complex, making a macro impractical. Use the ctf_type_isparent/ischild function calls everywhere and remove the macro. Make them more const-correct too, to make them more widely usable. While we're about it, change several places that hand-implemented ctf_get_dict() to call it instead, and armour several functions against the null returns that were always possible in this case (but previously unprotected-against).
2025-02-28libctf, string: delete separate movable ref storage againNick Alcock1-12/+5
This was added last year to let us maintain a backpointer to the movable refs dynhash in movable ref atoms without spending space for the backpointer on the majority of (non-movable) refs and also without causing an atom which had some refs movable and some refs not movable to dereference unallocated storage when freed. The backpointer's only purpose was to let us locate the ctf_str_movable_refs dynhash during item freeing, when we had nothing but a pointer to the atom being freed. Now we have a proper freeing arg, we don't need the backpointer at all: we can just pass a pointer to the dict in to the atoms dynhash as a freeing arg for the atom freeing functions, and throw the whole backpointer and separate movable ref list complexity away.
2025-02-28libctf: dedup: describe 'citer'Nick Alcock1-0/+2
The distinction between the citer and citers variables in ctf_dedup_rhash_type is somewhat opaque (it's a micro-optimization to avoid having to allocate entire sets when we know in advance that we'll only have to store one value). Add a comment. libctf/ * ctf-dedup.c (ctf_dedup_rhash_type): Comment on citers variables.
2025-02-28libctf: dedup: add strtab deduplicatorNick Alcock1-9/+164
This is a pretty simple two-phase process (count duplicates that are actually going to end up in the strtab and aren't e.g. strings without refs, strings with external refs etc, and move them into the parent) with one wrinkle: we sorta-abuse the csa_external_offset field in the deduplicated child atom (normally used to indicate that this string is located in the ELF strtab) to indicate that this atom is in the *parent*. If you think of "external" as meaning simply "is in some other strtab, we don't care which one", this still makes enough sense to not need to change the name, I hope. This is still not called from anywhere, so strings are (still!) not deduplicated, and none of the dedup machinery added in earlier commits does anything yet. libctf/ * ctf-dedup.c (ctf_dedup_emit_struct_members): Note that strtab dedup happens (well) after struct member emission. (ctf_dedup_strings): New. * ctf-impl.h (ctf_dedup_strings): Declare.
2025-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
2024-07-31libctf, include: add ctf_dict_set_flag: less enum dup checking by defaultNick Alcock1-0/+1
The recent change to detect duplicate enum values and return ECTF_DUPLICATE when found turns out to perturb a great many callers. In particular, the pahole-created kernel BTF has the same problem we historically did, and gleefully emits duplicated enum constants in profusion. Handling the resulting duplicate errors from BTF -> CTF converters reasonably is unreasonably difficult (it amounts to forcing them to skip some types or reimplement the deduplicator). So let's step back a bit. What we care about mostly is that the deduplicator treat enums with conflicting enumeration constants as conflicting types: programs that want to look up enumeration constant -> value mappings using the new APIs to do so might well want the same checks to apply to any ctf_add_* operations they carry out (and since they're *using* the new APIs, added at the same time as this restriction was imposed, there is likely to be no negative consequence of this). So we want some way to allow processes that know about duplicate detection to opt into it, while allowing everyone else to stay clear of it: but we want ctf_link to get this behaviour even if its caller has opted out. So add a new concept to the API: dict-wide CTF flags, set via ctf_dict_set_flag, obtained via ctf_dict_get_flag. They are not bitflags but simple arbitrary integers and an on/off value, stored in an unspecified manner (the one current flag, we translate into an LCTF_* flag value in the internal ctf_dict ctf_flags word). If you pass in an invalid flag or value you get a new ECTF_BADFLAG error, so the caller can easily tell whether flags added in future are valid with a particular libctf or not. We check this flag in ctf_add_enumerator, and set it around the link (including on child per-CU dicts). The newish enumerator-iteration test is souped up to check the semantics of the flag as well. The fact that the flag can be set and unset at any time has curious consequences. You can unset the flag, insert a pile of duplicates, then set it and expect the new duplicates to be detected, not only by ctf_add_enumerator but also by ctf_lookup_enumerator. This means we now have to maintain the ctf_names and conflicting_enums enum-duplication tracking as new enums are added, not purely as the dict is opened. Move that code out of init_static_types_internal and into a new ctf_track_enumerator function that addition can also call. (None of this affects the file format or serialization machinery, which has to be able to handle duplicate enumeration constants no matter what.) include/ * ctf-api.h (CTF_ERRORS) [ECTF_BADFLAG]: New. (ECTF_NERR): Update. (CTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_dict_set_flag): New function. (ctf_dict_get_flag): Likewise. libctf/ * ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_track_enumerator): Declare. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_create_per_cu): Likewise. (ctf_link_deduplicating_per_cu): Likewise. (ctf_link): Likewise. (ctf_link_write): Likewise. * ctf-subr.c (ctf_dict_set_flag): New function. (ctf_dict_get_flag): New function. * ctf-open.c (init_static_types_internal): Move enum tracking to... * ctf-create.c (ctf_track_enumerator): ... this new function. (ctf_add_enumerator): Call it. * libctf.ver: Add the new functions. * testsuite/libctf-lookup/enumerator-iteration.c: Test them.
2024-07-31libctf: dedup: tiny tweaksNick Alcock1-4/+3
Drop an unnecessary variable, and fix a buggy comment. No effect on generated code. libctf/ * ctf-dedup.c (ctf_dedup_detect_name_ambiguity): Drop unnecessary variable. (ctf_dedup_rwalk_output_mapping): Fix comment.
2024-07-31libctf: fix linking of non-root-visible typesNick Alcock1-2/+4
If you deduplicate non-root-visible types, the resulting type should still be non-root-visible! We were promoting all such types to root-visible, and re-demoting them only if their names collided (which might happen on cu-mapped links if multiple compilation units with conflicting types are fused into one child dict). This "worked" before now, in that linking at least didn't fail (if you don't mind having your non-root flag value destroyed if you're adding non-root-visible types), but now that conflicting enumerators cause their containing enums to become conflicted (enums which might have *different names*), this caused the linker to crash when it hit two enumerators with conflicting values. Not testable in ld because cu-mapped links are not exposed to ld, but can be tested via direct creation of libraries and calls to ctf_link directly. (This also tests the ctf_dump non-root type printout, which before now was untested.) libctf/ * ctf-dedup.c (ctf_dedup_emit_type): Non-root-visible input types should be emitted as non-root-visible output types. * testsuite/libctf-writable/ctf-nonroot-linking.c: New test. * testsuite/libctf-writable/ctf-nonroot-linking.lk: New test.
2024-07-31libctf, dedup: drop unnecessary arg from ctf_dedup()Nick Alcock1-32/+26
The PARENTS arg is carefully passed down through all the layers of hash functions and then never used for anything. (In the distant past it was used for cycle detection, but the algorithm eventually committed doesn't need to do cycle detection...) The PARENTS arg is still used by ctf_dedup_emit(), but even there we can loosen the requirements and state that you can just leave entries corresponding to dicts with no parents at zero (which will be useful in an upcoming commit). libctf/ * ctf-dedup.c (ctf_dedup_hash_type): Drop PARENTS arg. (ctf_dedup_rhash_type): Likewise. (ctf_dedup): Likewise. (ctf_dedup_emit_struct_members): Mention what you can do to PARENTS entries for parent dicts. * ctf-impl.h (ctf_dedup): Adjust accordingly. * ctf-link.c (ctf_link_deduplicating_per_cu): Likewise. (ctf_link_deduplicating): Likewise.
2024-06-18libctf: dedup: enums with overlapping enumerators are conflictingNick Alcock1-6/+36
The CTF deduplicator was not considering enumerators inside enum types to be things that caused type conflicts, so if the following two TUs were linked together, you would end up with the following in the resulting dict: 1.c: enum foo { A, B }; 2.c: enum bar { A, B }; linked: enum foo { A, B }; enum bar { A, B }; This does work -- but it's not something that's valid C, and the general point of the shared dict is that it is something that you could potentially get from any valid C TU. So consider such types to be conflicting, but obviously don't consider actually identical enums to be conflicting, even though they too have (all) their identifiers in common. This involves surprisingly little code. The deduplicator detects conflicting types by counting types in a hash table of hash tables: decorated identifier -> (type hash -> count) where the COUNT is the number of times a given hash has been observed: any name with more than one hash associated with it is considered conflicting (the count is used to identify the most common such name for promotion to the shared dict). Before now, those identifiers were all the identifiers of types (possibly decorated with their namespace on the front for enumerator identifiers), but we can equally well put *enumeration constant names* in there, undecorated like the identifiers of types in the global namespace, with the type hash being the hash of each enum containing that enumerator. The existing conflicting-type-detection code will then accurately identify distinct enums with enumeration constants in common. The enum that contains the most commonly-appearing enumerators will be promoted to the shared dict. libctf/ * ctf-impl.h (ctf_dedup_t) <cd_name_counts>: Extend comment. * ctf-dedup.c (ctf_dedup_count_name): New, split out of... (ctf_dedup_populate_mappings): ... here. Call it for all * enumeration constants in an enum as well as types. ld/ * testsuite/ld-ctf/enum-3.c: New test CTF. * testsuite/ld-ctf/enum-4.c: Likewise. * testsuite/ld-ctf/overlapping-enums.d: New test. * testsuite/ld-ctf/overlapping-enums-2.d: Likewise.
2024-04-19libctf: don't pass errno into ctf_err_warn so oftenNick Alcock1-4/+4
The libctf-internal warning function ctf_err_warn() can be passed a libctf errno as a parameter, and will add its textual errmsg form to the passed-in error message. But if there is an error on the fp already, and this is specifically an error and not a warning, ctf_err_warn() will print the error out regardless: there's no need to pass in anything but 0. There are still a lot of places where we do ctf_err_warn (fp, 0, EFOO, ...); return ctf_set_errno (fp, 0, EFOO); I've left all of those alone, because fixing it makes the code a bit longer: but fixing the cases where no return is involved and the error has just been set on the fp itself costs nothing and reduces redundancy a bit. libctf/ * ctf-dedup.c (ctf_dedup_walk_output_mapping): Drop the errno arg. (ctf_dedup_emit): Likewise. (ctf_dedup_type_mapping): Likewise. * ctf-link.c (ctf_create_per_cu): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating_one_symtypetab): Likewise. (ctf_link_deduplicating_per_cu): Likewise. * ctf-lookup.c (ctf_lookup_symbol_idx): Likewise. * ctf-subr.c (ctf_assert_fail_internal): Likewise.
2024-01-04Update year range in copyright notice of binutils filesAlan Modra1-1/+1
Adds two new external authors to etc/update-copyright.py to cover bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then updates copyright messages as follows: 1) Update cgen/utils.scm emitted copyrights. 2) Run "etc/update-copyright.py --this-year" with an extra external author I haven't committed, 'Kalray SA.', to cover gas testsuite files (which should have their copyright message removed). 3) Build with --enable-maintainer-mode --enable-cgen-maint=yes. 4) Check out */po/*.pot which we don't update frequently.
2023-10-17libctf: Sanitize error types for PR 30836Torbjörn SVENSSON1-16/+9
Made sure there is no implicit conversion between signed and unsigned return value for functions setting the ctf_errno value. An example of the problem is that in ctf_member_next, the "offset" value is either 0L or (ctf_id_t)-1L, but it should have been 0L or -1L. The issue was discovered while building a 64 bit ld binary to be executed on the Windows platform. Example object file that demonstrates the issue is attached in the PR. libctf/ Affected functions adjusted. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com> Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
2023-03-24libctf: fix a comment typoNick Alcock1-1/+1
ctf_dedup's intern() function does not return a dynamically allocated string, so I just spent ten minutes auditing for obvious memory leaks that couldn't actually happen. Update the comment to note what it actually returns (a pointer into an atoms table: i.e. possibly not a new string, and not so easily leakable). libctf/ * ctf-dedup.c (intern): Update comment.
2023-03-24libctf: fix assertion failure with no system qsort_rNick Alcock1-0/+4
If no suitable qsort_r is found in libc, we fall back to an implementation in ctf-qsort.c. But this implementation routinely calls the comparison function with two identical arguments. The comparison function that ensures that the order of output types is stable is not ready for this, misinterprets it as a type appearing more that once (a can-never-happen condition) and fails with an assertion failure. Fixed, audited for further instances of the same failure (none found) and added a no-qsort test to my regular testsuite run. libctf/: PR libctf/30013 * ctf-dedup.c (sort_output_mapping): Inputs are always equal to themselves.
2023-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
The newer update-copyright.py fixes file encoding too, removing cr/lf on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2022-06-21libctf: fix linking together multiple objects derived from the same sourceNick Alcock1-0/+2
Right now, if you compile the same .c input repeatedly with CTF enabled and different compilation flags, then arrange to link all of these together, then things misbehave in various ways. libctf may conflate either inputs (if the .o files have the same name, say if they are stored in different .a archives), or per-CU outputs when conflicting types are found: the latter can lead to entirely spurious errors when it tries to produce multiple per-CU outputs with the same name (discarding all but the last, but then looking for types in the earlier ones which have just been thrown away). Fixing this is multi-pronged. Both inputs and outputs need to be differentiated in the hashtables libctf keeps them in: inputs with the same cuname and filename need to be considered distinct as long as they have different associated CTF dicts, and per-CU outputs need to be considered distinct as long as they have different associated input dicts. Right now there is nothing tying the two together other than the CU name: fix this by introducing a new field in the ctf_dict_t named ctf_link_in_out, which (for input dicts) points to the associated per-CU output dict (if any), and for output dicts points to the associated input dict. At creation time the name used is completely arbitrary: it's only important that it be distinct if CTF dicts are distinct. So, when a clash is found, adjust the CU name by sticking the number of elements in the input on the end. At output time, the CU name will appear in the linked object, so it matters a little more that it look slightly less ugly: in conflicting cases, append an incrementing integer, starting at 0. This naming scheme is not very helpful, but it's hard to see what else we can do. The input .o name may be the same. The input .a name is not even visible to ctf_link, and even *that* might be the same, because .a's can contain many members with the same name, all of which participate in the link. All we really know is that the two have distinct dictionaries with distinct types in them, and at least this way they are all represented, any any symbols, variables etc referring to those types are accurately stored. (As a side-effect this also fixes a use-after-free and double-free when errors are found during variable or symbol emission.) Use the opportunity to prevent a couple of sources of problems, to wit changing the active CU mappings when a link has already been done (no effect on ld, which doesn't use CU mappings at all), and causing multiple consecutive ctf_link's to have the same net effect as just doing the last one (no effect on ld, which only ever does one ctf_link) rather than having the links be a sort of half-incremental not-really-intended mess. libctf/ChangeLog: PR libctf/29242 * ctf-impl.h (struct ctf_dict) [ctf_link_in_out]: New. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_link_add_ctf_internal): Set the input CU name uniquely when clashes are found. (ctf_link_add): Document what repeated additions do. (ctf_new_per_cu_name): New, come up with a consistent name for a new per-CU dict. (ctf_link_deduplicating): Use it. (ctf_create_per_cu): Use it, and ctf_link_in_out, and set ctf_link_in_out properly. Don't overwrite per-CU dicts with per-CU dicts relating to different inputs. (ctf_link_add_cu_mapping): Prevent per-CU mappings being set up if we already have per-CU outputs. (ctf_link_one_variable): Adjust ctf_link_per_cu call. (ctf_link_deduplicating_one_symtypetab): Likewise. (ctf_link_empty_outputs): New, delete all the ctf_link_outputs and blank out ctf_link_in_out on the corresponding inputs. (ctf_link): Clarify the effect of multiple ctf_link calls. Empty ctf_link_outputs if it already exists rather than having the old output leak into the new link. Fix a variable name. * testsuite/config/default.exp (AR): Add. (OBJDUMP): Likewise. * testsuite/libctf-regression/libctf-repeat-cu.exp: New test. * testsuite/libctf-regression/libctf-repeat-cu*: Main program, library, and expected results for the test.
2022-04-28libctf: impose an ordering on conflicting typesNick Alcock1-1/+20
When two types conflict and they are not types which can have forwards (say, two arrays of different sizes with the same name in two different TUs) the CTF deduplicator uses a popularity contest to decide what to do: the type cited by the most other types ends up put into the shared dict, while the others are relegated to per-CU child dicts. This works well as long as one type *is* most popular -- but what if there is a tie? If several types have the same popularity count, we end up picking the first we run across and promoting it, and unfortunately since we are working over a dynhash in essentially arbitrary order, this means we promote a random one. So multiple runs of ld with the same inputs can produce different outputs! All the outputs are valid, but this is still undesirable. Adjust things to use the same strategy used to sort types on the output: when there is a tie, always put the type that appears in a CU that appeared earlier on the link line (and if there is somehow still a tie, which should be impossible, pick the type with the lowest type ID). Add a testcase -- and since this emerged when trying out extern arrays, check that those work as well (this requires a newer GCC, but since all GCCs that can emit CTF at all are unreleased this is probably OK as well). Fix up one testcase that has slight type ordering changes as a result of this change. libctf/ChangeLog: * ctf-dedup.c (ctf_dedup_detect_name_ambiguity): Use cd_output_first_gid to break ties. ld/ChangeLog: * testsuite/ld-ctf/array-conflicted-ordering.d: New test, using... * testsuite/ld-ctf/array-char-conflicting-1.c: ... this... * testsuite/ld-ctf/array-char-conflicting-2.c: ... and this. * testsuite/ld-ctf/array-extern.d: New test, using... * testsuite/ld-ctf/array-extern.c: ... this. * testsuite/ld-ctf/conflicting-typedefs.d: Adjust for ordering changes.
2022-01-02Update year range in copyright notice of binutils filesAlan Modra1-1/+1
The result of running etc/update-copyright.py --this-year, fixing all the files whose mode is changed by the script, plus a build with --enable-maintainer-mode --enable-cgen-maint=yes, then checking out */po/*.pot which we don't update frequently. The copy of cgen was with commit d1dd5fcc38ead reverted as that commit breaks building of bfp opcodes files.
2021-05-09Use htab_eq_string in libctfAlan Modra1-7/+7
* ctf-impl.h (ctf_dynset_eq_string): Don't declare. * ctf-hash.c (ctf_dynset_eq_string): Delete function. * ctf-dedup.c (make_set_element): Use htab_eq_string. (ctf_dedup_atoms_init, ADD_CITER, ctf_dedup_init): Likewise. (ctf_dedup_conflictify_unshared): Likewise. (ctf_dedup_walk_output_mapping): Likewise.
2021-05-06libctf, include: support an alternative encoding for nonrepresentable typesNick Alcock1-9/+5
Before now, types that could not be encoded in CTF were represented as references to type ID 0, which does not itself appear in the dictionary. This choice is annoying in several ways, principally that it forces generators and consumers of CTF to grow special cases for types that are referenced in valid dicts but don't appear. Allow an alternative representation (which will become the only representation in format v4) whereby nonrepresentable types are encoded as actual types with kind CTF_K_UNKNOWN (an already-existing kind theoretically but not in practice used for padding, with value 0). This is backward-compatible, because CTF_K_UNKNOWN was not used anywhere before now: it was used in old-format function symtypetabs, but these were never emitted by any compiler and the code to handle them in libctf likely never worked and was removed last year, in favour of new-format symtypetabs that contain only type IDs, not type kinds. In order to link this type, we need an API addition to let us add types of unknown kind to the dict: we let them optionally have names so that GCC can emit many different unknown types and those types with identical names will be deduplicated together. There are also small tweaks to the deduplicator to actually dedup such types, to let opening of dicts with unknown types with names work, to return the ECTF_NONREPRESENTABLE error on resolution of such types (like ID 0), and to print their names as something useful but not a valid C identifier, mostly for the sake of the dumper. Tests added in the next commit. include/ChangeLog 2021-05-06 Nick Alcock <nick.alcock@oracle.com> * ctf.h (CTF_K_UNKNOWN): Document that it can be used for nonrepresentable types, not just padding. * ctf-api.h (ctf_add_unknown): New. libctf/ChangeLog 2021-05-06 Nick Alcock <nick.alcock@oracle.com> * ctf-open.c (init_types): Unknown types may have names. * ctf-types.c (ctf_type_resolve): CTF_K_UNKNOWN is as non-representable as type ID 0. (ctf_type_aname): Print unknown types. * ctf-dedup.c (ctf_dedup_hash_type): Do not early-exit for CTF_K_UNKNOWN types: they have real hash values now. (ctf_dedup_rwalk_one_output_mapping): Treat CTF_K_UNKNOWN types like other types with no referents: call the callback and do not skip them. (ctf_dedup_emit_type): Emit via... * ctf-create.c (ctf_add_unknown): ... this new function. * libctf.ver (LIBCTF_1.2): Add it.
2021-03-18libctf: a couple of small error-handling fixesNick Alcock1-9/+12
Out-of-memory errors initializing the string atoms table were disregarded (though they would have caused a segfault very shortly afterwards). Errors hashing types during deduplication were only reported if they happened on the output dict, which is almost never the case (most errors are going to be on the dict we're working over, which is going to be one of the inputs). (The error was detected in both cases, but the errno was extracted from the wrong dict.) libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-dedup.c (ctf_dedup_rhash_type): Report errors on the input dict properly. * ctf-open.c (ctf_bufopen_internal): Report errors initializing the atoms table.
2021-03-18libctf: eliminate dtd_u, part 1: int/float/sliceNick Alcock1-1/+1
This series eliminates a lot of special-case code to handle dynamic types (types added to writable dicts and not yet serialized). Historically, when such types have variable-length data in their final CTF representations, libctf has always worked by adding such types to a special union (ctf_dtdef_t.dtd_u) in the dynamic type definition structure, then picking the members out of this structure at serialization time and packing them into their final form. This has the advantage that the ctf_add_* code doesn't need to know anything about the final CTF representation, but the significant disadvantage that all code that looks up types in any way needs two code paths, one for dynamic types, one for all others. Historically libctf "handled" this by not supporting most type lookups on dynamic types at all until ctf_update was called to do a complete reserialization of the entire dict (it didn't emit an error, it just emitted wrong results). Since commit 676c3ecbad6e9c4, which eliminated ctf_update in favour of the internal-only ctf_serialize function, all the type-lookup paths grew an extra branch to handle dynamic types. We can eliminate this branch again by dropping the dtd_u stuff and simply writing out the vlen in (close to) its final form at ctf_add_* time: type lookup for types using this approach is then identical for types in writable dicts and types that are in read-only ones, and serialization is also simplified (we just need to write out the vlen we already created). The only complexity lies in type kinds for which multiple vlen representations are valid depending on properties of the type, e.g. structures. But we can start simple, adjusting ints, floats, and slices to work this way, and leaving everything else as is. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_enc>: Remove. <dtd_u.dtu_slice>: Likewise. <dtd_vlen>: New. * ctf-create.c (ctf_add_generic): Perhaps allocate it. All callers adjusted. (ctf_dtd_delete): Free it. (ctf_add_slice): Use the dtd_vlen, not dtu_enc. (ctf_add_encoded): Likewise. Assert that this must be an int or float. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-dedup.c (ctf_dedup_rhash_type): Use the dtd_vlen, not dtu_slice. * ctf-types.c (ctf_type_reference): Likewise. (ctf_type_encoding): Remove most dynamic-type-specific code: just get the vlen from the right place. Report failure to look up the underlying type's encoding.
2021-03-18libctf: fix GNU style for do {} whileNick Alcock1-32/+35
It's formatted like this: do { ... } while (...); Not like this: do { ... } while (...); or this: do { ... } while (...); We used both in various places in libctf. Fixing it necessitated some light reindentation. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-archive.c (ctf_archive_next): GNU style fix for do {} while. * ctf-dedup.c (ctf_dedup_rhash_type): Likewise. (ctf_dedup_rwalk_one_output_mapping): Likewise. * ctf-dump.c (ctf_dump_format_type): Likewise. * ctf-lookup.c (ctf_symbol_next): Likewise. * swap.h (swap_thing): Likewise.
2021-03-02libctf: minor error-handling fixesNick Alcock1-6/+11
A transient bug in the preceding change (fixed before commit) exposed a new failure, of ld/testsuite/ld-ctf/diag-parname.d. This attempts to ensure that if we link a dict with child type IDs but no attached parent, we get a suitable ECTF_NOPARENT error. This was happening before this commit, but only by chance, because ctf_variable_iter and ctf_variable_next check to see if the dict they're passed is a child dict without an associated parent. We forgot error-checking on the ctf_variable_next call, and as a result this was concealed -- and looking for the problem exposed a new bug. If any of the lookups beneath ctf_dedup_hash_type fail, the CTF link does *not* fail, but acts quite bizarrely, skipping the type but emitting an error to the CTF error/warning log -- so the linker will report an error, emit a partial CTF dict missing some types, and exit with exitcode 0 as if nothing went wrong. Since ctf_dedup_hash_type is never expected to fail in normal operation, this is surely wrong: failures at emission time do not emit partial CTF dicts, so failures at hashing time should not either. So propagate the error back up. Also fix a couple of smaller bugs where we fail to properly free things and/or propagate error codes on various rare link-time errors and out-of-memory conditions. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-dedup.c (ctf_dedup): Pass on errors from ctf_dedup_hash_type. Call ctf_dedup_fini properly on other errors. (ctf_dedup_emit_type): Set the errno on dynhash insertion failure. * ctf-link.c (ctf_link_deduplicating_per_cu): Close outputs beyond output 0 when asserting because >1 output is found. (ctf_link_deduplicating): Likewise, when asserting because the shared output is not the same as the passed-in fp.
2021-03-02libctf: add a deduplicator-specific type mapping tableNick Alcock1-101/+97
When CTF linking is done, the linker has to track the association between types in the inputs and types in the outputs. The deduplicator does this via the cd_output_emission_hashes, which maps from hashes of types (valid in both the input and output) to the IDs of types in the specific dict in which the cd_emission_hashes is held. However, the nondeduplicating linker and ctf_add_type used a different mechanism, a dedicated hashtab stored in the ctf_link_type_mapping, populated via ctf_add_type_mapping and queried via the ctf_type_mapping function. To allow the same functions to be used for variable and symbol population in both the deduplicating and nondeduplicating linker, the deduplicator carefully transferred all its input->output mappings into this hashtab before returning. This is *expensive*. The number of entries in this hashtab scales as the number of input types, and unlike the hashing machinery the type mapping machinery (the only other thing which scales that way) has not been much optimized. Now the nondeduplicating linker is gone, we can throw this out, move the existing type mapping machinery to ctf-create.c and dedicate it to ctf_add_type alone, and add a new function ctf_dedup_type_mapping which uses the deduplicator's built-in knowledge of type mappings directly, without requiring an expensive repopulation phase. This speeds up a test link of nouveau.ko (a good worst-case candidate with a lot of types in each of a lot of input files) from 9.11s to 7.15s in my testing, a speedup of over 20%. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dict_t) <ctf_link_type_mapping>: No longer used by the nondeduplicating linker. (ctf_add_type_mapping): Removed, now static. (ctf_type_mapping): Likewise. (ctf_dedup_type_mapping): New. (ctf_dedup_t) <cd_input_nums>: New. * ctf-dedup.c (ctf_dedup_init): Populate it. (ctf_dedup_fini): Free it again. Emphasise that this has to be the last thing called. (ctf_dedup): Populate it. (ctf_dedup_populate_type_mapping): Removed. (ctf_dedup_populate_type_mappings): Likewise. (ctf_dedup_emit): No longer call it. No longer call ctf_dedup_fini either. (ctf_dedup_type_mapping): New. * ctf-link.c (ctf_unnamed_cuname): New. (ctf_create_per_cu): Arguments must be non-null now. (ctf_in_member_cb_arg): Removed. (ctf_link): No longer populate it. No longer discard the mapping table. (ctf_link_deduplicating_one_symtypetab): Use ctf_dedup_type_mapping, not ctf_type_mapping. Use ctf_unnamed_cuname. (ctf_link_one_variable): Likewise. Pass in args individually: no longer a ctf_variable_iter callback. (empty_link_type_mapping): Removed. (ctf_link_deduplicating_variables): Use ctf_variable_next, not ctf_variable_iter. No longer pack arguments to ctf_link_one_variable into a struct. (ctf_link_deduplicating_per_cu): Call ctf_dedup_fini once all link phases are done. (ctf_link_deduplicating): Likewise. (ctf_link_intern_extern_string): Improve comment. (ctf_add_type_mapping): Migrate... (ctf_type_mapping): ... these functions... * ctf-create.c (ctf_add_type_mapping): ... here... (ctf_type_mapping): ... and make static, for the sole use of ctf_add_type.
2021-02-04libctf: always name nameless types "", never NULLNick Alcock1-4/+3
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the raw, unadorned name of CTF types, have one unfortunate wrinkle: they return NULL not only on error but when returning the name of types without a name in writable dicts. This was unintended: it not only makes it impossible to reliably tell if a given call to ctf_type_name_raw failed (due to a bad string offset say), but also complicates all its callers, who now have to check for both NULL and "". The written-out form of CTF has no concept of a NULL pointer instead of a string: all null strings are strtab offset 0, "". So the more we can do to remove this distinction from the writable form, the less complex the rest of our code needs to be. Armour against NULL in multiple places, arranging to return "" from ctf_type_name_raw if offset 0 is passed in, and removing a risky optimization from ctf_str_add* that avoided doing anything if a NULL was passed in: this added needless irregularity to the functions' API surface, since "" and NULL should be treated identically, and in the case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to the list of references to be updated no matter what the content of the string happens to be. This means we can simplify the deduplicator a tiny bit, also fixing a bug (latent when used by ld) where if the input dict was writable, we failed to realise when types were nameless and could end up creating deeply unhelpful synthetic forwards with no name, which we just banned a few commits ago, so the link failed. libctf/ChangeLog 2021-01-27 Nick Alcock <nick.alcock@oracle.com> * ctf-string.c (ctf_str_add): Treat adding a NULL as adding "". (ctf_str_add_ref): Likewise. (ctf_str_add_external): Likewise. * ctf-types.c (ctf_type_name_raw): Always return "" for offset 0. * ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour against NULL name. (ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-05libctf, include: support unnamed structure members betterNick Alcock1-3/+3
libctf has no intrinsic support for the GCC unnamed structure member extension. This principally means that you can't look up named members inside unnamed struct or union members via ctf_member_info: you have to tiresomely find out the type ID of the unnamed members via iteration, then look in each of these. This is ridiculous. Fix it by extending ctf_member_info so that it recurses into unnamed members for you: this is still unambiguous because GCC won't let you create ambiguously-named members even in the presence of this extension. For consistency, and because the release hasn't happened and we can still do this, break the ctf_member_next API and add flags: we specify one flag, CTF_MN_RECURSE, which if set causes ctf_member_next to automatically recurse into unnamed members for you, returning not only the members themselves but all their contained members, so that you can use ctf_member_next to identify every member that it would be valid to call ctf_member_info with. New lookup tests are added for all of this. include/ChangeLog 2021-01-05 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (CTF_MN_RECURSE): New. (ctf_member_next): Add flags argument. libctf/ChangeLog 2021-01-05 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (struct ctf_next) <u.ctn_next>: Move to... <ctn_next>: ... here. * ctf-util.c (ctf_next_destroy): Unconditionally destroy it. * ctf-lookup.c (ctf_symbol_next): Adjust accordingly. * ctf-types.c (ctf_member_iter): Reimplement in terms of... (ctf_member_next): ... this. Support recursive unnamed member iteration (off by default). (ctf_member_info): Look up members in unnamed sub-structs. * ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_member_next call. (ctf_dedup_emit_struct_members): Likewise. * testsuite/libctf-lookup/struct-iteration-ctf.c: Test empty unnamed members, and a normal member after the end. * testsuite/libctf-lookup/struct-iteration.c: Verify that ctf_member_count is consistent with the number of successful returns from a non-recursive ctf_member_next. * testsuite/libctf-lookup/struct-iteration-*: New, test iteration over struct members. * testsuite/libctf-lookup/struct-lookup.c: New test. * testsuite/libctf-lookup/struct-lookup.lk: New test.
2021-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1