aboutsummaryrefslogtreecommitdiff
path: root/libctf/ctf-dedup.c
AgeCommit message (Collapse)AuthorFilesLines
2025-06-26libctf, dedup: reclaim space wasted by duplicate hidden typesNick Alcock1-12/+30
In normal deduplicating links, we insert every type (identified by its unique hash) precisely once. But conflicting types appear in multiple dicts, so for those, we loop, inserting them into every target dict in turn (each corresponding to an input dict that type appears in). But in cu-mapped links, some of those dicts may have been merged into one: now that we are hiding duplicate conflicting types more aggressively in such links, we are getting duplicate identical hidden types turning up in large numbers. Fix this by eliminating them in cu-mapping phase 1 (the phase in which this merging takes place), by checking to see if a type with this hash has already been inserted in this dict and skipping it if so. This is redundant and a waste of time in other cu-mapping phases and in normal links, but in cu-mapped links it saves a few tens to hundreds of kilobytes in kernel-sized links. libctf/ PR libctf/33047 * ctf-dedup.c (ctf_dedup_emit_type): Check for already-emitted types in cu-mapping phase 1.
2025-06-26libctf: dedup: preserve non-root flag across normal linksNick Alcock1-59/+184
The previous commits dropped preservation of the non-root flag in ctf_link and arranged to use it somewhat differently to track conflicting types in cu-mapped CUs when doing cu-mapped links. This was necessary to prevent entirely spuriously hidden types from appearing on the output of such links. Bring it (and the test for it) back. The problem with the previous design was that it implicitly assumed that the non-root flag it saw on the input was always meant to be preserved (when in the final phase of cu-mapped links it merely means that conflicting types were found in intermediate links), and also that it could figure out what the non-root flag on the input was by sucking in the non-root flag of the input type corresponding to an output in the output mapping (which maps type hashes to a corresponding type on some input). This method of getting properties of the input type *does* work *if* that property was one of those hashed by the ctf_dedup_hash_type process. In that case, every type with a given hash will have the same value for all hashed-in properties, so it doesn't matter which one is consulted (the output mapping points at an arbitrary one of those input types). But the non-root flag is explicitly *not* hashed in: as a comment in ctf_dedup_rhash_type notes, being non-root is not a property of a type, and two types (one non-root, one not) can perfectly well be the same type even though one is visible and one isn't. So just copying the non-root flag from the output mapping's idea of the input type will copy in a value that is not stabilized by the hash, so is more-or-less random! So we cannot do that. We have to do something else, which means we have to decide what to do if two identical types with different nonroot flag values pop up. The most sensible thing to do is probably to say that if all instances of a type are non-root-visible, the linked output should also be non-root-visible: any root-visible types in that set, and the output type is root-visible again. We implement this with a new cd_nonroot_consistency dynhash, which maps type hashes to the value 0 ("all instances root-visible"), 1 ("all instances non-root-visible") or 2 ("inconsistent"). After hashing is over, we save a bit of memory by deleting everything from this hashtab that doesn't have a value of 1 ("non-root-visible"), then use this to decide whether to emit any given type as non-root-visible or not. However... that's not quite enough. In cu-mapped links, we want to disregard this whole thing because we just hide everything -- but in phase 2, when we take the smushed-together CUs resulting from phase 1 and deduplicate them against each other, we want to do what the previous commits implemented and ignore the non-root flag entirely, instead falling back to preventing clashes by hiding anything that would be considered conflicting. We extend the existing cu_mapped parameter to various bits of ctf_dedup so that it is now tristate: 0 means a normal link, 1 means the smush-it- together phase of cu-mapped links, and 2 means the final phase of cu-mapped links. We do the hide-conflicting stuff only in phase 2, meaning that normal links by GNU ld can always respect the value of the nonroot flag put on types in the input. (One extra thing added as part of this: you can now efficiently delete the last value returned by ctf_dynhash_next() by calling ctf_dynhash_next_remove.) We bring back the ctf-nonroot-linking test with one tweak: linking now works on mingw as long as you're using the ucrt libc, so re-enable it for better test coverage on that platform. libctf/ PR libctf/33047 * ctf-hash.c (ctf_dynhash_next_remove): New. * ctf-impl.h (struct ctf_dedup) [cd_nonroot_consistency]: New. * ctf-link.c (ctf_link_deduplicating): Differentiate between cu-mapped and non-cu-mapped links, even in the final phase. * ctf-dedup.c (ctf_dedup_hash_type): Callback prototype addition. Get the non-root flag and pass it down. (ctf_dedup_rhash_type): Callback prototype addition. Document restrictions on use of the nonroot flag. (ctf_dedup_populate_mappings): Populate cd_nonroot_consistency. (ctf_dedup_hash_type_fini): New function: delete now-unnecessary values from cd_nonroot_consistency. (ctf_dedup_init): Initialize it. (ctf_dedup_fini): Destroy it. (ctf_dedup): cu_mapping is now cu_mapping_phase. Call ctf_dedup_hash_type_fini. (ctf_dedup_emit_type): Use cu_mapping_phase and cd_nonroot_consistency to propagate the non-root flag into outputs for normal links, and to do name-based conflict checking only for phase 2 of cu-mapped links. (ctf_dedup_emit): cu_mapping is now cu_mapping_phase. Adjust assertion accordingly. * testsuite/libctf-writable/ctf-nonroot-linking.c: Bring back. * testsuite/libctf-writable/ctf-nonroot-linking.lk: Likewise.
2025-06-25libctf: dedup: improve hiding of conflicting types in the same dictNick Alcock1-10/+44
If types are conflicting, they are usually moved into separate child dicts -- but not always. If they are added to the same dict by the cu-mapping mechanism (as used e.g. for multi-TU kernel modules), we can easily end up adding multiple conflicting types with the same name to the same dict. The mechanism used for turning on the non-root-visible flag in order to do this had a kludge attached which always hid types with the same name, whether or not they were conflicting. This is unnecessary and can hide types that should not be hidden, as well as hiding bugs. Remove it, and replace it with two different approaches: - for everything but cu-mapped links (the in-memory first phase of a link with ctf_link_add_cu_mapping in force), check for duplicate names if types are conflicting, and mark them as hidden if the names are found. This will never happen in normal links (in an upcoming commit we will suppress doing even this much in such cases). - for cu-mapped links, the only case that merges multiple distinct target dicts into one, we apply a big hammer and simply hide everything! The non-root flag will be ignored in the next link phase anyway (which dedups the cu-mapped pieces against each other), and this way we can be sure that merging multiple types cannot incur name clashes at this stage. The result seems to work: the only annoyance is that when enums with conflicting enumerators are found in a single cu-mapped child (so, really multiple merged children), you may end up with every instance of that enum being hidden for reasons of conflictingness. I don't see a real way to avoid that. libctf/ PR libctf/33047 * ctf-dedup.c (ctf_dedup_emit_type): Only consider non conflicting types. Improve type hiding in the presence of clashing enumerators. Hide everything when doing a cu-mapped link: they will be unhidden by the next link pass if nonconflicting.
2025-06-25Revert "libctf: fix linking of non-root-visible types"Nick Alcock1-8/+2
This reverts commit 87b2f673102884d7c69144c85a26ed5dbaa4f86a. It is based on a misconception, that hidden types in the deduplicator input should always be hidden in the output. For cu-mapped links, and final links following cu-mapped links, this is not true: we want to hide inputs if they were conflicting on the output and no more. We will reintroduce the testcase once a better fix is found. libctf/ PR libctf/33047 * ctf-dedup.c (ctf_dedup_emit_type): Don't respect the nonroot flag. * testsuite/libctf-writable/ctf-nonroot-linking.c: Removed. * testsuite/libctf-writable/ctf-nonroot-linking.lk: Removed.
2025-04-25libctf: dedup: conflicting CU names and merging into the parentNick Alcock1-4/+17
The last two dedup changes are, firstly, to use ctf_add_conflicting() to arrange that conflicting types that are hidden because they are added to the same dict as the types they conflict with (e.g. conflicting types in modules) are properly marked with the CU name that the type comes from. This could of course not be done with the old non-root flag, but now that we have proper prefix types, we can record it, and consumers can find out what CU any type comes from via ctf_type_conflicting (or, for non-kernel CTF generated by GNU ld, via the ctf_cuname of the per-cu dict). Secondly, we add a new kind of CU mapping for cu-mapped (two-stage) links (as a reminder, these carry out a second stage of dedupping in which they squash specific CUs down to a named set of child dicts, fusing named inputs into particular named outputs: the kernel linker uses this to make child dicts that represent modules rather than translation units). You can now map any CU name to "" (the null string). This indicates that types that would land in the CU in question should not be emitted into any sort of per-module dict but should instead just be emitted into the shared dict, possibly being marked conflicting as they do so. The usual popcount mechanism will be used to pick the type which is left unhidden. The usual forwarding stubs you would expect to find for conflicting structs and unions will not be emitted: instead, real structs and unions will take their place. Consumers must take care when chasing parent types that point to tagged structs to make sure that there isn't a correspondingly-named struct in the child they're looking at (but this is generally a problem with type chasing in children anyway, which I have a TODO open to find some sort of solution to: this should be being done automatically, and isn't).
2025-04-25libctf: dedup: decl tag support.Nick Alcock1-52/+397
Decl tags to types and to functions and function arguments are relatively straightforward, as are decl tags to structures as a whole or to members of untagged structures; but decl tags to specific members of tagged structs and unions have two separate nasty problems, entirely down to the use of tagged structures to break cycles in the type graph. The first is that we have to mark decl tags conflicting if their associated struct is conflicting, but traversal from types to their parents halts at tagged structs and unions, because the type graph is sharded via stubs at those points and conflictedness ceases. But we don't want to do that here: a decl_tag to member 10 of some struct is only valid if that struct *has* ten members, and if the struct is conflicted, some may have only one. The decl tag is only valid for the specific struct-with-ten-members it was originally pointing at, anyway: other structs-with-ten-members may have entirely different members there, which are not tagged or which are tagged with something else. So we track this by keeping track of the only thing that is knowable about struct/union stubs: their decorated name. The citers graph gains mappings from decorated SoU names to decl tags (where the decl tag has a component_idx), and conflictedness marking chases that and marks accordingly, via the new ctf_dedup_mark_conflicting_hash_citers. The second problem is that we have to emit decl tags to struct members of all kinds after the members are emitted, but the members are emitted later than core type deduplication because they might refer to any types in the dict, including types added after the struct was added. So we need to accumulate decl tags to struct members in a new hashtab (cd_emission_struct_decl_tags) and add yet *another* pass that traverses that and emits all the decl tags in it. (If it turns out that decl tags to other things can similarly appear before the type they refer to, we'll either have to sort them earlier or emit them at the end as well -- but this seems unlikely.) None of this complexity is properly tested, because we're not yet emitting decl tags (as far as I know). But at least it doesn't break anything else, and it's somewhere to start.
2025-04-25libctf: dedup: type tagsNick Alcock1-0/+15
Another trivial case: they're just like pointers except that they have a name (and we don't need to care about that, because names are hashed in, if present, anyway).
2025-04-25libctf: dedup: datasecs and varsNick Alcock1-17/+363
These are a bit trickier than previous things. Datasecs are unusual: the content they contain for a given variable is conceptually part of that variable, in that a variable can only appear in one datasec: so if two TUs have different datasec values for a variable, you'll want to emit two conflicting variables with different datasec entries. Equally, if they have entries in different datasecs, they're conflicting. But the *index* of a variable in a datasec has nothing to do with the variable: it's just a property of how many other variables are in the datasec. So we turn the type graph upside down for them. We track the variable -> datasec mappings for every variable we are dedupping, and use this to hash variables with datasec entries *twice*: firstly, as purely variable type, name, and promoted-to-non-extern linkage, and secondly with all of that plus the datasec name, offset and size: we indicate that the non-extern hash *replaces* the extern one, and use this later on. The datasec itself is not hashed at all! We skip it at both hashing and emission time (without breaking anything else, because nothing points at datasecs, so nothing will ever recurse down into one). The popcount code (used to find the "most popular" type, the one to put in the shared dict) changes to say that replaced types (extern vars) popcounts are added to the counts of the types that replace them (the corresponding non-extern vars). At emission time, replaced variables (extern variables) are skipped, ensuring that extern vars with non-conflicting non-extern counterparts are skipped in favour of the non-extern ones. ctf_add_section_variable then takes care of emitting both the var and its corresponding datasec for us.
2025-04-25libctf: dedup: structs with bitfields, BTF floatsNick Alcock1-3/+35
The last two trivial cases. Hash in the bitfieldness of structs and the bit-width of members (their bit-offset is already being hashed in), and emit them accordingly. BTF floats hardly have any state: emitting them is even easier.
2025-04-25libctf: dedup: enums, enum64s, functions, func linkageNick Alcock1-14/+187
These are all fairly simple and are handled together because some of the diffs are annoyingly entwined. enum and enum64 are trivial: it's just like enums used to be, except that we hash in the unsignedness value, and emit signed or unsigned enums or enum64s appropriately. (The signedness stuff on the emission side is fairly invisible: it's automatically handled for us by ctf_type_encoding and ctf_add_enum*_encoded, via the CTF_INT_SIGNED encoding.) Functions are also fairly simple: we hash in all the parameter names as well as the args, and emit them accordingly. Linkage is more difficult. We want to deduplicate extern and non-extern declarations together, while leaving static ones separate. We do this by promoting extern linkage to global at hashing time, and maintaining a cd_linkages hashmap which maps from type hash values of func linkages (and vars) to the best linkage known so far, then updating it if a better one ("less extern") comes along (relying on the fact that we are already unifying the hashes of otherwise-identical extern and non-extern types). At emission time, we use this hashtab to figure out what linkage to emit.
2025-04-25libctf: dedup: comment fixes, debug indentation changes, and a tiny leakNick Alcock1-41/+41
Getting these out of the way to avoid them wrecking the diffs for the next commits.
2025-04-25libctf: dedup: fix a broken error path in string dedupNick Alcock1-1/+1
If we run out of memory updating the string counts, set the right errno: ctf_dynhash_insert returns a *negative* error value, and we want a positive one in the ctf_errno.
2025-04-25libctf: dedup: chase API changes: use the public API moreNick Alcock1-25/+41
To get ready for the deduplicator changes, we chase the API changes to things like ctf_member_next, and add support for prefix types (using the suffix where appropriate, etc). We use the ctf-types API for things like forward lookup, using the private _tp functions to reduce overhead while centralizing knowledge of things like the encoding of enum forwards outside the deduplicator. No functional changes yet.
2025-04-25libctf: strings: no external strings in BTFNick Alcock1-4/+5
One of the things BTF doesn't have is the concept of external strings which can be shared with the ELF strtab. Therefore, even if the linker has reported strings which the dict is reusing, when we generate the strtab for a BTF dict we should emit those strings into it (and we should certainly not cause the presence of external strings to prevent BTF emission!) Note that since already-written strtab entries are never erased, writing a dict as BTF and then CTF will cause external strings to be emitted even for the CTF. This sort of repeated writing in different formats seems to be very rare: in any case, the problem can be avoided by simply doing the CTF writeout first (the following BTF writeout will spot the missing external- in-CTF strings and add them). We also throw away the internal-only function ctf_strraw_explicit(), which was used to add strings with a hardwired strtab: it was only ever used to write out the variable section, which is gone in v4.
2025-04-25libctf, serialize: preparatory stepsNick Alcock1-0/+3
The new serializer is quite a lot more customizable than the old, because it can write out BTF as well as CTF: you can ask to write out BTF or fail, write out CTF if required to avoid information loss, otherwise BTF, or always write out CTF. Callers often need to find out whether a dict could be written out as BTF before deciding how to write it out (because a dict can never be written out as BTF if it is compressed, a caller might well want to ask if there is anything else that prevents BTF writeout -- say, slices, conflicting types, or CTF_K_BIG -- before deciding whether to compress it). GNU ld will do this whenever it is passed only BTF sections on the input. Figuring out whether a dict can be written out as BTF is quite expensive: we have to traverse all the types and check them, including every member of every struct. So we'd rather do that work only once. This means making a lot of state once private to ctf_preserialize public enough that another function can initialize it; and since the whole API is available after calling this function and before serializing, we should probably arrange that if we do things we know will invalidate the results of all this checking, we are forced to do it again. This commit does that, moving all the existing serialization state into a new ctf_serialize_t and adding to it. Several functions grow force_ctf arguments that allow the caller to force CTF emission even if the type section looks BTFish: the writeout code and archive creation use this to force CTF emission if we are compressing, and archive creation uses it to force CTF emission if a CTF multi-member archive is in use, because BTF doesn't support archives at all so there's no point maintaining BTF compatibility in that case. The ctf_write* functions gain support for writing out BTF headers as well as CTF, depending on whether what was ultimately written out was actually BTF or not. Even more than most commits in this series, there is no way this is going to compile right now: we're in the middle of a major transition, completed in the next few commits.
2025-03-16Tiny stylistic spacing and comment tweaksNick Alcock1-1/+1
2025-02-28libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILDNick Alcock1-2/+2
Parent/child determination is about to become rather more complex, making a macro impractical. Use the ctf_type_isparent/ischild function calls everywhere and remove the macro. Make them more const-correct too, to make them more widely usable. While we're about it, change several places that hand-implemented ctf_get_dict() to call it instead, and armour several functions against the null returns that were always possible in this case (but previously unprotected-against).
2025-02-28libctf, string: delete separate movable ref storage againNick Alcock1-12/+5
This was added last year to let us maintain a backpointer to the movable refs dynhash in movable ref atoms without spending space for the backpointer on the majority of (non-movable) refs and also without causing an atom which had some refs movable and some refs not movable to dereference unallocated storage when freed. The backpointer's only purpose was to let us locate the ctf_str_movable_refs dynhash during item freeing, when we had nothing but a pointer to the atom being freed. Now we have a proper freeing arg, we don't need the backpointer at all: we can just pass a pointer to the dict in to the atoms dynhash as a freeing arg for the atom freeing functions, and throw the whole backpointer and separate movable ref list complexity away.
2025-02-28libctf: dedup: describe 'citer'Nick Alcock1-0/+2
The distinction between the citer and citers variables in ctf_dedup_rhash_type is somewhat opaque (it's a micro-optimization to avoid having to allocate entire sets when we know in advance that we'll only have to store one value). Add a comment. libctf/ * ctf-dedup.c (ctf_dedup_rhash_type): Comment on citers variables.
2025-02-28libctf: dedup: add strtab deduplicatorNick Alcock1-9/+164
This is a pretty simple two-phase process (count duplicates that are actually going to end up in the strtab and aren't e.g. strings without refs, strings with external refs etc, and move them into the parent) with one wrinkle: we sorta-abuse the csa_external_offset field in the deduplicated child atom (normally used to indicate that this string is located in the ELF strtab) to indicate that this atom is in the *parent*. If you think of "external" as meaning simply "is in some other strtab, we don't care which one", this still makes enough sense to not need to change the name, I hope. This is still not called from anywhere, so strings are (still!) not deduplicated, and none of the dedup machinery added in earlier commits does anything yet. libctf/ * ctf-dedup.c (ctf_dedup_emit_struct_members): Note that strtab dedup happens (well) after struct member emission. (ctf_dedup_strings): New. * ctf-impl.h (ctf_dedup_strings): Declare.
2025-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
2024-07-31libctf, include: add ctf_dict_set_flag: less enum dup checking by defaultNick Alcock1-0/+1
The recent change to detect duplicate enum values and return ECTF_DUPLICATE when found turns out to perturb a great many callers. In particular, the pahole-created kernel BTF has the same problem we historically did, and gleefully emits duplicated enum constants in profusion. Handling the resulting duplicate errors from BTF -> CTF converters reasonably is unreasonably difficult (it amounts to forcing them to skip some types or reimplement the deduplicator). So let's step back a bit. What we care about mostly is that the deduplicator treat enums with conflicting enumeration constants as conflicting types: programs that want to look up enumeration constant -> value mappings using the new APIs to do so might well want the same checks to apply to any ctf_add_* operations they carry out (and since they're *using* the new APIs, added at the same time as this restriction was imposed, there is likely to be no negative consequence of this). So we want some way to allow processes that know about duplicate detection to opt into it, while allowing everyone else to stay clear of it: but we want ctf_link to get this behaviour even if its caller has opted out. So add a new concept to the API: dict-wide CTF flags, set via ctf_dict_set_flag, obtained via ctf_dict_get_flag. They are not bitflags but simple arbitrary integers and an on/off value, stored in an unspecified manner (the one current flag, we translate into an LCTF_* flag value in the internal ctf_dict ctf_flags word). If you pass in an invalid flag or value you get a new ECTF_BADFLAG error, so the caller can easily tell whether flags added in future are valid with a particular libctf or not. We check this flag in ctf_add_enumerator, and set it around the link (including on child per-CU dicts). The newish enumerator-iteration test is souped up to check the semantics of the flag as well. The fact that the flag can be set and unset at any time has curious consequences. You can unset the flag, insert a pile of duplicates, then set it and expect the new duplicates to be detected, not only by ctf_add_enumerator but also by ctf_lookup_enumerator. This means we now have to maintain the ctf_names and conflicting_enums enum-duplication tracking as new enums are added, not purely as the dict is opened. Move that code out of init_static_types_internal and into a new ctf_track_enumerator function that addition can also call. (None of this affects the file format or serialization machinery, which has to be able to handle duplicate enumeration constants no matter what.) include/ * ctf-api.h (CTF_ERRORS) [ECTF_BADFLAG]: New. (ECTF_NERR): Update. (CTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_dict_set_flag): New function. (ctf_dict_get_flag): Likewise. libctf/ * ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_track_enumerator): Declare. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_create_per_cu): Likewise. (ctf_link_deduplicating_per_cu): Likewise. (ctf_link): Likewise. (ctf_link_write): Likewise. * ctf-subr.c (ctf_dict_set_flag): New function. (ctf_dict_get_flag): New function. * ctf-open.c (init_static_types_internal): Move enum tracking to... * ctf-create.c (ctf_track_enumerator): ... this new function. (ctf_add_enumerator): Call it. * libctf.ver: Add the new functions. * testsuite/libctf-lookup/enumerator-iteration.c: Test them.
2024-07-31libctf: dedup: tiny tweaksNick Alcock1-4/+3
Drop an unnecessary variable, and fix a buggy comment. No effect on generated code. libctf/ * ctf-dedup.c (ctf_dedup_detect_name_ambiguity): Drop unnecessary variable. (ctf_dedup_rwalk_output_mapping): Fix comment.
2024-07-31libctf: fix linking of non-root-visible typesNick Alcock1-2/+4
If you deduplicate non-root-visible types, the resulting type should still be non-root-visible! We were promoting all such types to root-visible, and re-demoting them only if their names collided (which might happen on cu-mapped links if multiple compilation units with conflicting types are fused into one child dict). This "worked" before now, in that linking at least didn't fail (if you don't mind having your non-root flag value destroyed if you're adding non-root-visible types), but now that conflicting enumerators cause their containing enums to become conflicted (enums which might have *different names*), this caused the linker to crash when it hit two enumerators with conflicting values. Not testable in ld because cu-mapped links are not exposed to ld, but can be tested via direct creation of libraries and calls to ctf_link directly. (This also tests the ctf_dump non-root type printout, which before now was untested.) libctf/ * ctf-dedup.c (ctf_dedup_emit_type): Non-root-visible input types should be emitted as non-root-visible output types. * testsuite/libctf-writable/ctf-nonroot-linking.c: New test. * testsuite/libctf-writable/ctf-nonroot-linking.lk: New test.
2024-07-31libctf, dedup: drop unnecessary arg from ctf_dedup()Nick Alcock1-32/+26
The PARENTS arg is carefully passed down through all the layers of hash functions and then never used for anything. (In the distant past it was used for cycle detection, but the algorithm eventually committed doesn't need to do cycle detection...) The PARENTS arg is still used by ctf_dedup_emit(), but even there we can loosen the requirements and state that you can just leave entries corresponding to dicts with no parents at zero (which will be useful in an upcoming commit). libctf/ * ctf-dedup.c (ctf_dedup_hash_type): Drop PARENTS arg. (ctf_dedup_rhash_type): Likewise. (ctf_dedup): Likewise. (ctf_dedup_emit_struct_members): Mention what you can do to PARENTS entries for parent dicts. * ctf-impl.h (ctf_dedup): Adjust accordingly. * ctf-link.c (ctf_link_deduplicating_per_cu): Likewise. (ctf_link_deduplicating): Likewise.
2024-06-18libctf: dedup: enums with overlapping enumerators are conflictingNick Alcock1-6/+36
The CTF deduplicator was not considering enumerators inside enum types to be things that caused type conflicts, so if the following two TUs were linked together, you would end up with the following in the resulting dict: 1.c: enum foo { A, B }; 2.c: enum bar { A, B }; linked: enum foo { A, B }; enum bar { A, B }; This does work -- but it's not something that's valid C, and the general point of the shared dict is that it is something that you could potentially get from any valid C TU. So consider such types to be conflicting, but obviously don't consider actually identical enums to be conflicting, even though they too have (all) their identifiers in common. This involves surprisingly little code. The deduplicator detects conflicting types by counting types in a hash table of hash tables: decorated identifier -> (type hash -> count) where the COUNT is the number of times a given hash has been observed: any name with more than one hash associated with it is considered conflicting (the count is used to identify the most common such name for promotion to the shared dict). Before now, those identifiers were all the identifiers of types (possibly decorated with their namespace on the front for enumerator identifiers), but we can equally well put *enumeration constant names* in there, undecorated like the identifiers of types in the global namespace, with the type hash being the hash of each enum containing that enumerator. The existing conflicting-type-detection code will then accurately identify distinct enums with enumeration constants in common. The enum that contains the most commonly-appearing enumerators will be promoted to the shared dict. libctf/ * ctf-impl.h (ctf_dedup_t) <cd_name_counts>: Extend comment. * ctf-dedup.c (ctf_dedup_count_name): New, split out of... (ctf_dedup_populate_mappings): ... here. Call it for all * enumeration constants in an enum as well as types. ld/ * testsuite/ld-ctf/enum-3.c: New test CTF. * testsuite/ld-ctf/enum-4.c: Likewise. * testsuite/ld-ctf/overlapping-enums.d: New test. * testsuite/ld-ctf/overlapping-enums-2.d: Likewise.
2024-04-19libctf: don't pass errno into ctf_err_warn so oftenNick Alcock1-4/+4
The libctf-internal warning function ctf_err_warn() can be passed a libctf errno as a parameter, and will add its textual errmsg form to the passed-in error message. But if there is an error on the fp already, and this is specifically an error and not a warning, ctf_err_warn() will print the error out regardless: there's no need to pass in anything but 0. There are still a lot of places where we do ctf_err_warn (fp, 0, EFOO, ...); return ctf_set_errno (fp, 0, EFOO); I've left all of those alone, because fixing it makes the code a bit longer: but fixing the cases where no return is involved and the error has just been set on the fp itself costs nothing and reduces redundancy a bit. libctf/ * ctf-dedup.c (ctf_dedup_walk_output_mapping): Drop the errno arg. (ctf_dedup_emit): Likewise. (ctf_dedup_type_mapping): Likewise. * ctf-link.c (ctf_create_per_cu): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating_one_symtypetab): Likewise. (ctf_link_deduplicating_per_cu): Likewise. * ctf-lookup.c (ctf_lookup_symbol_idx): Likewise. * ctf-subr.c (ctf_assert_fail_internal): Likewise.
2024-01-04Update year range in copyright notice of binutils filesAlan Modra1-1/+1
Adds two new external authors to etc/update-copyright.py to cover bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then updates copyright messages as follows: 1) Update cgen/utils.scm emitted copyrights. 2) Run "etc/update-copyright.py --this-year" with an extra external author I haven't committed, 'Kalray SA.', to cover gas testsuite files (which should have their copyright message removed). 3) Build with --enable-maintainer-mode --enable-cgen-maint=yes. 4) Check out */po/*.pot which we don't update frequently.
2023-10-17libctf: Sanitize error types for PR 30836Torbjörn SVENSSON1-16/+9
Made sure there is no implicit conversion between signed and unsigned return value for functions setting the ctf_errno value. An example of the problem is that in ctf_member_next, the "offset" value is either 0L or (ctf_id_t)-1L, but it should have been 0L or -1L. The issue was discovered while building a 64 bit ld binary to be executed on the Windows platform. Example object file that demonstrates the issue is attached in the PR. libctf/ Affected functions adjusted. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com> Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
2023-03-24libctf: fix a comment typoNick Alcock1-1/+1
ctf_dedup's intern() function does not return a dynamically allocated string, so I just spent ten minutes auditing for obvious memory leaks that couldn't actually happen. Update the comment to note what it actually returns (a pointer into an atoms table: i.e. possibly not a new string, and not so easily leakable). libctf/ * ctf-dedup.c (intern): Update comment.
2023-03-24libctf: fix assertion failure with no system qsort_rNick Alcock1-0/+4
If no suitable qsort_r is found in libc, we fall back to an implementation in ctf-qsort.c. But this implementation routinely calls the comparison function with two identical arguments. The comparison function that ensures that the order of output types is stable is not ready for this, misinterprets it as a type appearing more that once (a can-never-happen condition) and fails with an assertion failure. Fixed, audited for further instances of the same failure (none found) and added a no-qsort test to my regular testsuite run. libctf/: PR libctf/30013 * ctf-dedup.c (sort_output_mapping): Inputs are always equal to themselves.
2023-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
The newer update-copyright.py fixes file encoding too, removing cr/lf on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2022-06-21libctf: fix linking together multiple objects derived from the same sourceNick Alcock1-0/+2
Right now, if you compile the same .c input repeatedly with CTF enabled and different compilation flags, then arrange to link all of these together, then things misbehave in various ways. libctf may conflate either inputs (if the .o files have the same name, say if they are stored in different .a archives), or per-CU outputs when conflicting types are found: the latter can lead to entirely spurious errors when it tries to produce multiple per-CU outputs with the same name (discarding all but the last, but then looking for types in the earlier ones which have just been thrown away). Fixing this is multi-pronged. Both inputs and outputs need to be differentiated in the hashtables libctf keeps them in: inputs with the same cuname and filename need to be considered distinct as long as they have different associated CTF dicts, and per-CU outputs need to be considered distinct as long as they have different associated input dicts. Right now there is nothing tying the two together other than the CU name: fix this by introducing a new field in the ctf_dict_t named ctf_link_in_out, which (for input dicts) points to the associated per-CU output dict (if any), and for output dicts points to the associated input dict. At creation time the name used is completely arbitrary: it's only important that it be distinct if CTF dicts are distinct. So, when a clash is found, adjust the CU name by sticking the number of elements in the input on the end. At output time, the CU name will appear in the linked object, so it matters a little more that it look slightly less ugly: in conflicting cases, append an incrementing integer, starting at 0. This naming scheme is not very helpful, but it's hard to see what else we can do. The input .o name may be the same. The input .a name is not even visible to ctf_link, and even *that* might be the same, because .a's can contain many members with the same name, all of which participate in the link. All we really know is that the two have distinct dictionaries with distinct types in them, and at least this way they are all represented, any any symbols, variables etc referring to those types are accurately stored. (As a side-effect this also fixes a use-after-free and double-free when errors are found during variable or symbol emission.) Use the opportunity to prevent a couple of sources of problems, to wit changing the active CU mappings when a link has already been done (no effect on ld, which doesn't use CU mappings at all), and causing multiple consecutive ctf_link's to have the same net effect as just doing the last one (no effect on ld, which only ever does one ctf_link) rather than having the links be a sort of half-incremental not-really-intended mess. libctf/ChangeLog: PR libctf/29242 * ctf-impl.h (struct ctf_dict) [ctf_link_in_out]: New. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_link_add_ctf_internal): Set the input CU name uniquely when clashes are found. (ctf_link_add): Document what repeated additions do. (ctf_new_per_cu_name): New, come up with a consistent name for a new per-CU dict. (ctf_link_deduplicating): Use it. (ctf_create_per_cu): Use it, and ctf_link_in_out, and set ctf_link_in_out properly. Don't overwrite per-CU dicts with per-CU dicts relating to different inputs. (ctf_link_add_cu_mapping): Prevent per-CU mappings being set up if we already have per-CU outputs. (ctf_link_one_variable): Adjust ctf_link_per_cu call. (ctf_link_deduplicating_one_symtypetab): Likewise. (ctf_link_empty_outputs): New, delete all the ctf_link_outputs and blank out ctf_link_in_out on the corresponding inputs. (ctf_link): Clarify the effect of multiple ctf_link calls. Empty ctf_link_outputs if it already exists rather than having the old output leak into the new link. Fix a variable name. * testsuite/config/default.exp (AR): Add. (OBJDUMP): Likewise. * testsuite/libctf-regression/libctf-repeat-cu.exp: New test. * testsuite/libctf-regression/libctf-repeat-cu*: Main program, library, and expected results for the test.
2022-04-28libctf: impose an ordering on conflicting typesNick Alcock1-1/+20
When two types conflict and they are not types which can have forwards (say, two arrays of different sizes with the same name in two different TUs) the CTF deduplicator uses a popularity contest to decide what to do: the type cited by the most other types ends up put into the shared dict, while the others are relegated to per-CU child dicts. This works well as long as one type *is* most popular -- but what if there is a tie? If several types have the same popularity count, we end up picking the first we run across and promoting it, and unfortunately since we are working over a dynhash in essentially arbitrary order, this means we promote a random one. So multiple runs of ld with the same inputs can produce different outputs! All the outputs are valid, but this is still undesirable. Adjust things to use the same strategy used to sort types on the output: when there is a tie, always put the type that appears in a CU that appeared earlier on the link line (and if there is somehow still a tie, which should be impossible, pick the type with the lowest type ID). Add a testcase -- and since this emerged when trying out extern arrays, check that those work as well (this requires a newer GCC, but since all GCCs that can emit CTF at all are unreleased this is probably OK as well). Fix up one testcase that has slight type ordering changes as a result of this change. libctf/ChangeLog: * ctf-dedup.c (ctf_dedup_detect_name_ambiguity): Use cd_output_first_gid to break ties. ld/ChangeLog: * testsuite/ld-ctf/array-conflicted-ordering.d: New test, using... * testsuite/ld-ctf/array-char-conflicting-1.c: ... this... * testsuite/ld-ctf/array-char-conflicting-2.c: ... and this. * testsuite/ld-ctf/array-extern.d: New test, using... * testsuite/ld-ctf/array-extern.c: ... this. * testsuite/ld-ctf/conflicting-typedefs.d: Adjust for ordering changes.
2022-01-02Update year range in copyright notice of binutils filesAlan Modra1-1/+1
The result of running etc/update-copyright.py --this-year, fixing all the files whose mode is changed by the script, plus a build with --enable-maintainer-mode --enable-cgen-maint=yes, then checking out */po/*.pot which we don't update frequently. The copy of cgen was with commit d1dd5fcc38ead reverted as that commit breaks building of bfp opcodes files.
2021-05-09Use htab_eq_string in libctfAlan Modra1-7/+7
* ctf-impl.h (ctf_dynset_eq_string): Don't declare. * ctf-hash.c (ctf_dynset_eq_string): Delete function. * ctf-dedup.c (make_set_element): Use htab_eq_string. (ctf_dedup_atoms_init, ADD_CITER, ctf_dedup_init): Likewise. (ctf_dedup_conflictify_unshared): Likewise. (ctf_dedup_walk_output_mapping): Likewise.
2021-05-06libctf, include: support an alternative encoding for nonrepresentable typesNick Alcock1-9/+5
Before now, types that could not be encoded in CTF were represented as references to type ID 0, which does not itself appear in the dictionary. This choice is annoying in several ways, principally that it forces generators and consumers of CTF to grow special cases for types that are referenced in valid dicts but don't appear. Allow an alternative representation (which will become the only representation in format v4) whereby nonrepresentable types are encoded as actual types with kind CTF_K_UNKNOWN (an already-existing kind theoretically but not in practice used for padding, with value 0). This is backward-compatible, because CTF_K_UNKNOWN was not used anywhere before now: it was used in old-format function symtypetabs, but these were never emitted by any compiler and the code to handle them in libctf likely never worked and was removed last year, in favour of new-format symtypetabs that contain only type IDs, not type kinds. In order to link this type, we need an API addition to let us add types of unknown kind to the dict: we let them optionally have names so that GCC can emit many different unknown types and those types with identical names will be deduplicated together. There are also small tweaks to the deduplicator to actually dedup such types, to let opening of dicts with unknown types with names work, to return the ECTF_NONREPRESENTABLE error on resolution of such types (like ID 0), and to print their names as something useful but not a valid C identifier, mostly for the sake of the dumper. Tests added in the next commit. include/ChangeLog 2021-05-06 Nick Alcock <nick.alcock@oracle.com> * ctf.h (CTF_K_UNKNOWN): Document that it can be used for nonrepresentable types, not just padding. * ctf-api.h (ctf_add_unknown): New. libctf/ChangeLog 2021-05-06 Nick Alcock <nick.alcock@oracle.com> * ctf-open.c (init_types): Unknown types may have names. * ctf-types.c (ctf_type_resolve): CTF_K_UNKNOWN is as non-representable as type ID 0. (ctf_type_aname): Print unknown types. * ctf-dedup.c (ctf_dedup_hash_type): Do not early-exit for CTF_K_UNKNOWN types: they have real hash values now. (ctf_dedup_rwalk_one_output_mapping): Treat CTF_K_UNKNOWN types like other types with no referents: call the callback and do not skip them. (ctf_dedup_emit_type): Emit via... * ctf-create.c (ctf_add_unknown): ... this new function. * libctf.ver (LIBCTF_1.2): Add it.
2021-03-18libctf: a couple of small error-handling fixesNick Alcock1-9/+12
Out-of-memory errors initializing the string atoms table were disregarded (though they would have caused a segfault very shortly afterwards). Errors hashing types during deduplication were only reported if they happened on the output dict, which is almost never the case (most errors are going to be on the dict we're working over, which is going to be one of the inputs). (The error was detected in both cases, but the errno was extracted from the wrong dict.) libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-dedup.c (ctf_dedup_rhash_type): Report errors on the input dict properly. * ctf-open.c (ctf_bufopen_internal): Report errors initializing the atoms table.
2021-03-18libctf: eliminate dtd_u, part 1: int/float/sliceNick Alcock1-1/+1
This series eliminates a lot of special-case code to handle dynamic types (types added to writable dicts and not yet serialized). Historically, when such types have variable-length data in their final CTF representations, libctf has always worked by adding such types to a special union (ctf_dtdef_t.dtd_u) in the dynamic type definition structure, then picking the members out of this structure at serialization time and packing them into their final form. This has the advantage that the ctf_add_* code doesn't need to know anything about the final CTF representation, but the significant disadvantage that all code that looks up types in any way needs two code paths, one for dynamic types, one for all others. Historically libctf "handled" this by not supporting most type lookups on dynamic types at all until ctf_update was called to do a complete reserialization of the entire dict (it didn't emit an error, it just emitted wrong results). Since commit 676c3ecbad6e9c4, which eliminated ctf_update in favour of the internal-only ctf_serialize function, all the type-lookup paths grew an extra branch to handle dynamic types. We can eliminate this branch again by dropping the dtd_u stuff and simply writing out the vlen in (close to) its final form at ctf_add_* time: type lookup for types using this approach is then identical for types in writable dicts and types that are in read-only ones, and serialization is also simplified (we just need to write out the vlen we already created). The only complexity lies in type kinds for which multiple vlen representations are valid depending on properties of the type, e.g. structures. But we can start simple, adjusting ints, floats, and slices to work this way, and leaving everything else as is. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dtdef_t) <dtd_u.dtu_enc>: Remove. <dtd_u.dtu_slice>: Likewise. <dtd_vlen>: New. * ctf-create.c (ctf_add_generic): Perhaps allocate it. All callers adjusted. (ctf_dtd_delete): Free it. (ctf_add_slice): Use the dtd_vlen, not dtu_enc. (ctf_add_encoded): Likewise. Assert that this must be an int or float. * ctf-serialize.c (ctf_emit_type_sect): Just copy the dtd_vlen. * ctf-dedup.c (ctf_dedup_rhash_type): Use the dtd_vlen, not dtu_slice. * ctf-types.c (ctf_type_reference): Likewise. (ctf_type_encoding): Remove most dynamic-type-specific code: just get the vlen from the right place. Report failure to look up the underlying type's encoding.
2021-03-18libctf: fix GNU style for do {} whileNick Alcock1-32/+35
It's formatted like this: do { ... } while (...); Not like this: do { ... } while (...); or this: do { ... } while (...); We used both in various places in libctf. Fixing it necessitated some light reindentation. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-archive.c (ctf_archive_next): GNU style fix for do {} while. * ctf-dedup.c (ctf_dedup_rhash_type): Likewise. (ctf_dedup_rwalk_one_output_mapping): Likewise. * ctf-dump.c (ctf_dump_format_type): Likewise. * ctf-lookup.c (ctf_symbol_next): Likewise. * swap.h (swap_thing): Likewise.
2021-03-02libctf: minor error-handling fixesNick Alcock1-6/+11
A transient bug in the preceding change (fixed before commit) exposed a new failure, of ld/testsuite/ld-ctf/diag-parname.d. This attempts to ensure that if we link a dict with child type IDs but no attached parent, we get a suitable ECTF_NOPARENT error. This was happening before this commit, but only by chance, because ctf_variable_iter and ctf_variable_next check to see if the dict they're passed is a child dict without an associated parent. We forgot error-checking on the ctf_variable_next call, and as a result this was concealed -- and looking for the problem exposed a new bug. If any of the lookups beneath ctf_dedup_hash_type fail, the CTF link does *not* fail, but acts quite bizarrely, skipping the type but emitting an error to the CTF error/warning log -- so the linker will report an error, emit a partial CTF dict missing some types, and exit with exitcode 0 as if nothing went wrong. Since ctf_dedup_hash_type is never expected to fail in normal operation, this is surely wrong: failures at emission time do not emit partial CTF dicts, so failures at hashing time should not either. So propagate the error back up. Also fix a couple of smaller bugs where we fail to properly free things and/or propagate error codes on various rare link-time errors and out-of-memory conditions. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-dedup.c (ctf_dedup): Pass on errors from ctf_dedup_hash_type. Call ctf_dedup_fini properly on other errors. (ctf_dedup_emit_type): Set the errno on dynhash insertion failure. * ctf-link.c (ctf_link_deduplicating_per_cu): Close outputs beyond output 0 when asserting because >1 output is found. (ctf_link_deduplicating): Likewise, when asserting because the shared output is not the same as the passed-in fp.
2021-03-02libctf: add a deduplicator-specific type mapping tableNick Alcock1-101/+97
When CTF linking is done, the linker has to track the association between types in the inputs and types in the outputs. The deduplicator does this via the cd_output_emission_hashes, which maps from hashes of types (valid in both the input and output) to the IDs of types in the specific dict in which the cd_emission_hashes is held. However, the nondeduplicating linker and ctf_add_type used a different mechanism, a dedicated hashtab stored in the ctf_link_type_mapping, populated via ctf_add_type_mapping and queried via the ctf_type_mapping function. To allow the same functions to be used for variable and symbol population in both the deduplicating and nondeduplicating linker, the deduplicator carefully transferred all its input->output mappings into this hashtab before returning. This is *expensive*. The number of entries in this hashtab scales as the number of input types, and unlike the hashing machinery the type mapping machinery (the only other thing which scales that way) has not been much optimized. Now the nondeduplicating linker is gone, we can throw this out, move the existing type mapping machinery to ctf-create.c and dedicate it to ctf_add_type alone, and add a new function ctf_dedup_type_mapping which uses the deduplicator's built-in knowledge of type mappings directly, without requiring an expensive repopulation phase. This speeds up a test link of nouveau.ko (a good worst-case candidate with a lot of types in each of a lot of input files) from 9.11s to 7.15s in my testing, a speedup of over 20%. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dict_t) <ctf_link_type_mapping>: No longer used by the nondeduplicating linker. (ctf_add_type_mapping): Removed, now static. (ctf_type_mapping): Likewise. (ctf_dedup_type_mapping): New. (ctf_dedup_t) <cd_input_nums>: New. * ctf-dedup.c (ctf_dedup_init): Populate it. (ctf_dedup_fini): Free it again. Emphasise that this has to be the last thing called. (ctf_dedup): Populate it. (ctf_dedup_populate_type_mapping): Removed. (ctf_dedup_populate_type_mappings): Likewise. (ctf_dedup_emit): No longer call it. No longer call ctf_dedup_fini either. (ctf_dedup_type_mapping): New. * ctf-link.c (ctf_unnamed_cuname): New. (ctf_create_per_cu): Arguments must be non-null now. (ctf_in_member_cb_arg): Removed. (ctf_link): No longer populate it. No longer discard the mapping table. (ctf_link_deduplicating_one_symtypetab): Use ctf_dedup_type_mapping, not ctf_type_mapping. Use ctf_unnamed_cuname. (ctf_link_one_variable): Likewise. Pass in args individually: no longer a ctf_variable_iter callback. (empty_link_type_mapping): Removed. (ctf_link_deduplicating_variables): Use ctf_variable_next, not ctf_variable_iter. No longer pack arguments to ctf_link_one_variable into a struct. (ctf_link_deduplicating_per_cu): Call ctf_dedup_fini once all link phases are done. (ctf_link_deduplicating): Likewise. (ctf_link_intern_extern_string): Improve comment. (ctf_add_type_mapping): Migrate... (ctf_type_mapping): ... these functions... * ctf-create.c (ctf_add_type_mapping): ... here... (ctf_type_mapping): ... and make static, for the sole use of ctf_add_type.
2021-02-04libctf: always name nameless types "", never NULLNick Alcock1-4/+3
The ctf_type_name_raw and ctf_type_aname_raw functions, which return the raw, unadorned name of CTF types, have one unfortunate wrinkle: they return NULL not only on error but when returning the name of types without a name in writable dicts. This was unintended: it not only makes it impossible to reliably tell if a given call to ctf_type_name_raw failed (due to a bad string offset say), but also complicates all its callers, who now have to check for both NULL and "". The written-out form of CTF has no concept of a NULL pointer instead of a string: all null strings are strtab offset 0, "". So the more we can do to remove this distinction from the writable form, the less complex the rest of our code needs to be. Armour against NULL in multiple places, arranging to return "" from ctf_type_name_raw if offset 0 is passed in, and removing a risky optimization from ctf_str_add* that avoided doing anything if a NULL was passed in: this added needless irregularity to the functions' API surface, since "" and NULL should be treated identically, and in the case of ctf_str_add_ref, we shouldn't skip adding the passed-in REF to the list of references to be updated no matter what the content of the string happens to be. This means we can simplify the deduplicator a tiny bit, also fixing a bug (latent when used by ld) where if the input dict was writable, we failed to realise when types were nameless and could end up creating deeply unhelpful synthetic forwards with no name, which we just banned a few commits ago, so the link failed. libctf/ChangeLog 2021-01-27 Nick Alcock <nick.alcock@oracle.com> * ctf-string.c (ctf_str_add): Treat adding a NULL as adding "". (ctf_str_add_ref): Likewise. (ctf_str_add_external): Likewise. * ctf-types.c (ctf_type_name_raw): Always return "" for offset 0. * ctf-dedup.c (ctf_dedup_multiple_input_dicts): Don't armour against NULL name. (ctf_dedup_maybe_synthesize_forward): Likewise.
2021-01-05libctf, include: support unnamed structure members betterNick Alcock1-3/+3
libctf has no intrinsic support for the GCC unnamed structure member extension. This principally means that you can't look up named members inside unnamed struct or union members via ctf_member_info: you have to tiresomely find out the type ID of the unnamed members via iteration, then look in each of these. This is ridiculous. Fix it by extending ctf_member_info so that it recurses into unnamed members for you: this is still unambiguous because GCC won't let you create ambiguously-named members even in the presence of this extension. For consistency, and because the release hasn't happened and we can still do this, break the ctf_member_next API and add flags: we specify one flag, CTF_MN_RECURSE, which if set causes ctf_member_next to automatically recurse into unnamed members for you, returning not only the members themselves but all their contained members, so that you can use ctf_member_next to identify every member that it would be valid to call ctf_member_info with. New lookup tests are added for all of this. include/ChangeLog 2021-01-05 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (CTF_MN_RECURSE): New. (ctf_member_next): Add flags argument. libctf/ChangeLog 2021-01-05 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (struct ctf_next) <u.ctn_next>: Move to... <ctn_next>: ... here. * ctf-util.c (ctf_next_destroy): Unconditionally destroy it. * ctf-lookup.c (ctf_symbol_next): Adjust accordingly. * ctf-types.c (ctf_member_iter): Reimplement in terms of... (ctf_member_next): ... this. Support recursive unnamed member iteration (off by default). (ctf_member_info): Look up members in unnamed sub-structs. * ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_member_next call. (ctf_dedup_emit_struct_members): Likewise. * testsuite/libctf-lookup/struct-iteration-ctf.c: Test empty unnamed members, and a normal member after the end. * testsuite/libctf-lookup/struct-iteration.c: Verify that ctf_member_count is consistent with the number of successful returns from a non-recursive ctf_member_next. * testsuite/libctf-lookup/struct-iteration-*: New, test iteration over struct members. * testsuite/libctf-lookup/struct-lookup.c: New test. * testsuite/libctf-lookup/struct-lookup.lk: New test.
2021-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
2020-11-20libctf, ld: properly deduplicate function typesNick Alcock1-5/+21
Some type kinds in CTF (functions, arrays, pointers, slices, and cvr-quals) are intrinsically nameless: the ctt_name field in the CTF is always zero, and the libctf API provides no way to set a name. But the compiler can and does sometimes set names for some of these kinds: in particular, the name it sets on CTF_K_FUNCTION types is the means it uses to force the name of the function into the string table so that it can point at it from the function info section. So null out the name at hashing time so that the deduplicator can correctly detect that e.g. function types identical but for name should be considered truly identical, since they will not have a name when the deduplicator re-emits them into the output. ld/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * testsuite/ld-ctf/data-func-conflicted.d: Shrink the expected size of the type section now that function types are being deduplicated properly. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-dedup.c (ctf_dedup_rhash_type): Null out the names of nameless type kinds, just in case the input has named them.
2020-11-20libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_tNick Alcock1-87/+87
The naming of the ctf_file_t type in libctf is a historical curiosity. Back in the Solaris days, CTF dictionaries were originally generated as a separate file and then (sometimes) merged into objects: hence the datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw CTF is essentially never written to a file on its own, and the datatype changed name to a "CTF dictionary" years ago. So the term "CTF file" refers to something that is never a file! This is at best confusing. The type has also historically been known as a 'CTF container", which is even more confusing now that we have CTF archives which are *also* a sort of container (they contain CTF dictionaries), but which are never referred to as containers in the source code. So fix this by completing the renaming, renaming ctf_file_t to ctf_dict_t throughout, and renaming those few functions that refer to CTF files by name (keeping compatibility aliases) to refer to dicts instead. Old users who still refer to ctf_file_t will see (harmless) pointer-compatibility warnings at compile time, but the ABI is unchanged (since C doesn't mangle names, and ctf_file_t was always an opaque type) and things will still compile fine as long as -Werror is not specified. All references to CTF containers and CTF files in the source code are fixed to refer to CTF dicts instead. Further (smaller) renamings of annoyingly-named functions to come, as part of the process of souping up queries across whole archives at once (needed for the function info and data object sections). binutils/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t. (dump_ctf_archive_member): Likewise. (dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close. * readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t. (dump_ctf_archive_member): Likewise. (dump_section_as_ctf): Likewise. Use ctf_dict_close, not ctf_file_close. gdb/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctfread.c: Change uses of ctf_file_t to ctf_dict_t. (ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close. include/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_file_t): Rename to... (ctf_dict_t): ... this. Keep ctf_file_t around for compatibility. (struct ctf_file): Likewise rename to... (struct ctf_dict): ... this. (ctf_file_close): Rename to... (ctf_dict_close): ... this, keeping compatibility function. (ctf_parent_file): Rename to... (ctf_parent_dict): ... this, keeping compatibility function. All callers adjusted. * ctf.h: Rename references to ctf_file_t to ctf_dict_t. (struct ctf_archive) <ctfa_nfiles>: Rename to... <ctfa_ndicts>: ... this. ld/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ldlang.c (ctf_output): This is a ctf_dict_t now. (lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t. (ldlang_open_ctf): Adjust comment. (lang_merge_ctf): Use ctf_dict_close, not ctf_file_close. * ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to ctf_dict_t. Change opaque declaration accordingly. * ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust. * ldemul.h (examine_strtab_for_ctf): Likewise. (ldemul_examine_strtab_for_ctf): Likewise. * ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations adjusted. (ctf_fileops): Rename to... (ctf_dictops): ... this. (ctf_dedup_t) <cd_id_to_file_t>: Rename to... <cd_id_to_dict_t>: ... this. (ctf_file_t): Fix outdated comment. <ctf_fileops>: Rename to... <ctf_dictops>: ... this. (struct ctf_archive_internal) <ctfi_file>: Rename to... <ctfi_dict>: ... this. * ctf-archive.c: Rename ctf_file_t to ctf_dict_t. Rename ctf_archive.ctfa_nfiles to ctfa_ndicts. Rename ctf_file_close to ctf_dict_close. All users adjusted. * ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers. (ctf_bundle_t) <ctb_file>: Rename to... <ctb_dict): ... this. * ctf-decl.c: Rename ctf_file_t to ctf_dict_t. * ctf-dedup.c: Likewise. Rename ctf_file_close to ctf_dict_close. Refer to CTF dicts, not CTF containers. * ctf-dump.c: Likewise. * ctf-error.c: Likewise. * ctf-hash.c: Likewise. * ctf-inlines.h: Likewise. * ctf-labels.c: Likewise. * ctf-link.c: Likewise. * ctf-lookup.c: Likewise. * ctf-open-bfd.c: Likewise. * ctf-string.c: Likewise. * ctf-subr.c: Likewise. * ctf-types.c: Likewise. * ctf-util.c: Likewise. * ctf-open.c: Likewise. (ctf_file_close): Rename to... (ctf_dict_close): ...this. (ctf_file_close): New trivial wrapper around ctf_dict_close, for compatibility. (ctf_parent_file): Rename to... (ctf_parent_dict): ... this. (ctf_parent_file): New trivial wrapper around ctf_parent_dict, for compatibility. * libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-08-27libctf, binutils, include, ld: gettextize and improve error handlingNick Alcock1-146/+144
This commit follows on from the earlier commit "libctf, ld, binutils: add textual error/warning reporting for libctf" and converts every error in libctf that was reported using ctf_dprintf to use ctf_err_warn instead, gettextizing them in the process, using N_() where necessary to avoid doing gettext calls unless an error message is actually generated, and rephrasing some error messages for ease of translation. This requires a slight change in the ctf_errwarning_next API: this API is public but has not been in a release yet, so can still change freely. The problem is that many errors are emitted at open time (whether opening of a CTF dict, or opening of a CTF archive): the former of these throws away its incompletely-initialized ctf_file_t rather than return it, and the latter has no ctf_file_t at all. So errors and warnings emitted at open time cannot be stored in the ctf_file_t, and have to go elsewhere. We put them in a static local in ctf-subr.c (which is not very thread-safe: a later commit will improve things here): ctf_err_warn with a NULL fp adds to this list, and the public interface ctf_errwarning_next with a NULL fp retrieves from it. We need a slight exception from the usual iterator rules in this case: with a NULL fp, there is nowhere to store the ECTF_NEXT_END "error" which signifies the end of iteration, so we add a new err parameter to ctf_errwarning_next which is used to report such iteration-related errors. (If an fp is provided -- i.e., if not reporting open errors -- this is optional, but even if it's optional it's still an API change. This is actually useful from a usability POV as well, since ctf_errwarning_next is usually called when there's been an error, so overwriting the error code with ECTF_NEXT_END is not very helpful! So, unusually, ctf_errwarning_next now uses the passed fp for its error code *only* if no errp pointer is passed in, and leaves it untouched otherwise.) ld, objdump and readelf are adapted to call ctf_errwarning_next with a NULL fp to report open errors where appropriate. The ctf_err_warn API also has to change, gaining a new error-number parameter which is used to add the error message corresponding to that error number into the debug stream when LIBCTF_DEBUG is enabled: changing this API is easy at this point since we are already touching all existing calls to gettextize them. We need this because the debug stream should contain the errno's message, but the error reported in the error/warning stream should *not*, because the caller will probably report it themselves at failure time regardless, and reporting it in every error message that leads up to it leads to a ridiculous chattering on failure, which is likely to end up as ridiculous chattering on stderr (trimmed a bit): CTF error: `ld/testsuite/ld-ctf/A.c (0): lookup failure for type 3: flags 1: The parent CTF dictionary is unavailable' CTF error: `ld/testsuite/ld-ctf/A.c (0): struct/union member type hashing error during type hashing for type 80000001, kind 6: The parent CTF dictionary is unavailable' CTF error: `deduplicating link variable emission failed for ld/testsuite/ld-ctf/A.c: The parent CTF dictionary is unavailable' ld/.libs/lt-ld-new: warning: CTF linking failed; output will have no CTF section: `The parent CTF dictionary is unavailable' We only need to be told that the parent CTF dictionary is unavailable *once*, not over and over again! errmsgs are still emitted on warning generation, because warnings do not usually lead to a failure propagated up to the caller and reported there. Debug-stream messages are not translated. If translation is turned on, there will be a mixture of English and translated messages in the debug stream, but rather that than burden the translators with debug-only output. binutils/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * objdump.c (dump_ctf_archive_member): Move error- reporting... (dump_ctf_errs): ... into this separate function. (dump_ctf): Call it on open errors. * readelf.c (dump_ctf_archive_member): Move error- reporting... (dump_ctf_errs): ... into this separate function. Support calls with NULL fp. Adjust for new err parameter to ctf_errwarning_next. (dump_section_as_ctf): Call it on open errors. include/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_errwarning_next): New err parameter. ld/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ldlang.c (lang_ctf_errs_warnings): Support calls with NULL fp. Adjust for new err parameter to ctf_errwarning_next. Only check for assertion failures when fp is non-NULL. (ldlang_open_ctf): Call it on open errors. * testsuite/ld-ctf/ctf.exp: Always use the C locale to avoid breaking the diags tests. libctf/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ctf-subr.c (open_errors): New list. (ctf_err_warn): Calls with NULL fp append to open_errors. Add err parameter, and use it to decorate the debug stream with errmsgs. (ctf_err_warn_to_open): Splice errors from a CTF dict into the open_errors. (ctf_errwarning_next): Calls with NULL fp report from open_errors. New err param to report iteration errors (including end-of-iteration) when fp is NULL. (ctf_assert_fail_internal): Adjust ctf_err_warn call for new err parameter: gettextize. * ctf-impl.h (ctfo_get_vbytes): Add ctf_file_t parameter. (LCTF_VBYTES): Adjust. (ctf_err_warn_to_open): New. (ctf_err_warn): Adjust. (ctf_bundle): Used in only one place: move... * ctf-create.c: ... here. (enumcmp): Use ctf_err_warn, not ctf_dprintf, passing the err number down as needed. Don't emit the errmsg. Gettextize. (membcmp): Likewise. (ctf_add_type_internal): Likewise. (ctf_write_mem): Likewise. (ctf_compress_write): Likewise. Report errors writing the header or body. (ctf_write): Likewise. * ctf-archive.c (ctf_arc_write_fd): Use ctf_err_warn, not ctf_dprintf, and gettextize, as above. (ctf_arc_write): Likewise. (ctf_arc_bufopen): Likewise. (ctf_arc_open_internal): Likewise. * ctf-labels.c (ctf_label_iter): Likewise. * ctf-open-bfd.c (ctf_bfdclose): Likewise. (ctf_bfdopen): Likewise. (ctf_bfdopen_ctfsect): Likewise. (ctf_fdopen): Likewise. * ctf-string.c (ctf_str_write_strtab): Likewise. * ctf-types.c (ctf_type_resolve): Likewise. * ctf-open.c (get_vbytes_common): Likewise. Pass down the ctf dict. (get_vbytes_v1): Pass down the ctf dict. (get_vbytes_v2): Likewise. (flip_ctf): Likewise. (flip_types): Likewise. Use ctf_err_warn, not ctf_dprintf, and gettextize, as above. (upgrade_types_v1): Adjust calls. (init_types): Use ctf_err_warn, not ctf_dprintf, as above. (ctf_bufopen_internal): Likewise. Adjust calls. Transplant errors emitted into individual dicts into the open errors if this turns out to be a failed open in the end. * ctf-dump.c (ctf_dump_format_type): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_dump_funcs): Likewise. Collapse err label into its only case. (ctf_dump_type): Likewise. * ctf-link.c (ctf_create_per_cu): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_link_one_type): Likewise. (ctf_link_lazy_open): Likewise. (ctf_link_one_input_archive): Likewise. (ctf_link_deduplicating_count_inputs): Likewise. (ctf_link_deduplicating_open_inputs): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating): Likewise. (ctf_link): Likewise. (ctf_link_deduplicating_per_cu): Likewise. Add some missed ctf_set_errnos to obscure error cases. * ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_dedup_populate_mappings): Likewise. (ctf_dedup_detect_name_ambiguity): Likewise. (ctf_dedup_init): Likewise. (ctf_dedup_multiple_input_dicts): Likewise. (ctf_dedup_conflictify_unshared): Likewise. (ctf_dedup): Likewise. (ctf_dedup_rwalk_one_output_mapping): Likewise. (ctf_dedup_id_to_target): Likewise. (ctf_dedup_emit_type): Likewise. (ctf_dedup_emit_struct_members): Likewise. (ctf_dedup_populate_type_mapping): Likewise. (ctf_dedup_populate_type_mappings): Likewise. (ctf_dedup_emit): Likewise. (ctf_dedup_hash_type): Likewise. Fix a bit of messed-up error status setting. (ctf_dedup_rwalk_one_output_mapping): Likewise. Don't hide unknown-type-kind messages (which signify file corruption).
2020-07-22libctf, dedup: add deduplicatorNick Alcock1-0/+3155
This adds the core deduplicator that the ctf_link machinery calls (possibly repeatedly) to link the CTF sections: it takes an array of input ctf_file_t's and another array that indicates which entries in the input array are parents of which other entries, and returns an array of outputs. The first output is always the ctf_file_t on which ctf_link/ctf_dedup/etc was called: the other outputs are child dicts that have the first output as their parent. include/ * ctf-api.h (CTF_LINK_SHARE_DUPLICATED): No longer unimplemented. libctf/ * ctf-impl.h (ctf_type_id_key): New, the key in the cd_id_to_file_t. (ctf_dedup): New, core deduplicator state. (ctf_file_t) <ctf_dedup>: New. <ctf_dedup_atoms>: New. <ctf_dedup_atoms_alloc>: New. (ctf_hash_type_id_key): New prototype. (ctf_hash_eq_type_id_key): Likewise. (ctf_dedup_atoms_init): Likewise. * ctf-hash.c (ctf_hash_eq_type_id_key): New. (ctf_dedup_atoms_init): Likewise. * ctf-create.c (ctf_serialize): Adjusted. (ctf_add_encoded): No longer static. (ctf_add_reftype): Likewise. * ctf-open.c (ctf_file_close): Destroy the ctf_dedup_atoms_alloc. * ctf-dedup.c: New file. * ctf-decls.h [!HAVE_DECL_STPCPY]: Add prototype. * configure.ac: Check for stpcpy. * Makefile.am: Add it. * Makefile.in: Regenerate. * config.h.in: Regenerate. * configure: Regenerate.