path: root/libctf
Age | Commit message | Author | Files | Lines
2025-06-27 | libctf: add root-visibility-addition test [users/nalcock/road-to-ctfv4] | Nick Alcock | 2 | -0/+39
libctf/ * testsuite/libctf-writable/ctf-nonroot-addition.*: New test.
2025-06-27 | libctf: create: check the right root-visible flag when adding enumerands | Nick Alcock | 1 | -1/+1
The root-visible flag we're dealing with here is directly out of the dict, not a flag passed in to the API, so it does not have the values CTF_ADD_ROOT or CTF_ADD_NONROOT: instead it's simply zero for non-root-visible, nonzero otherwise. Fix the test. libctf/ * ctf-create.c (ctf_add_enumerator): Fix root-visibility test.
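The distinction between an API-level flag and a stored flag can be sketched as follows. This is a minimal illustration, not libctf's code: every name and value here (`SKETCH_ADD_ROOT`, `SKETCH_ROOT_BIT`, the packed info word) is hypothetical. It only shows why a flag extracted from stored data should be tested for zero/nonzero rather than compared against an API constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for flags that callers pass in to an API.  */
enum { SKETCH_ADD_NONROOT = 0, SKETCH_ADD_ROOT = 1 };

/* Hypothetical bit position of the stored visibility flag in a packed
   info word.  Masking yields zero or SKETCH_ROOT_BIT, never 1.  */
#define SKETCH_ROOT_BIT 0x0400u

unsigned int sketch_stored_root_flag (uint32_t info)
{
  return info & SKETCH_ROOT_BIT;
}

/* Wrong: compares a stored flag against an API-level constant.  */
int sketch_is_root_wrong (uint32_t info)
{
  return sketch_stored_root_flag (info) == SKETCH_ADD_ROOT;
}

/* Right: the stored flag is simply zero or nonzero.  */
int sketch_is_root (uint32_t info)
{
  return sketch_stored_root_flag (info) != 0;
}
```

With these assumed values, the wrong test reports a root-visible type as hidden, which is the shape of bug the commit describes.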
2025-06-27 | libctf: create: addition of non-root types should not return root types | Nick Alcock | 1 | -3/+3
If you add a non-root type to a dict, you should always get a new, unique type ID back, even if a root-visible type with the same name already exists. Unfortunately, if the root-visible type is a forward, and you're adding a non-root-visible struct, union, or enum, the machinery to detect forwards and promote them to the concrete type fires in this case and returns the root-visible type! If this is an enum being inserted hidden because its enumerands conflict with some other enum, this will lead to failure later on: in any case, it's seriously counterintuitive to add a non-root-visible type and get a root-visible one instead. Fix this by checking the root-visible flag properly and only checking for forwards if this type is root-visible. (This may lead to a certain degree of proliferation of non-root-visible forwards: we can add a cleanup pass for those later if needed.) libctf/ * ctf-create.c (ctf_add_sou_sized): Check the root-visible flag when doing forward promotion. (ctf_add_enum_internal): Likewise. (ctf_add_enum_encoded_internal): Likewise.
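The intended behaviour can be modelled with a toy type table. This is a sketch under stated assumptions, not the libctf implementation: the `sketch_` types and functions are hypothetical, and type IDs are just array indexes. Forward promotion is attempted only when the type being added is root-visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TYPES 8

/* Hypothetical toy model of a dict's type table.  */
struct sketch_type { const char *name; int is_forward; int is_root; };
struct sketch_types { struct sketch_type t[SKETCH_MAX_TYPES]; size_t n; };

/* Append a type; return its ID (index), or -1 if the table is full.  */
long sketch_append (struct sketch_types *d, const char *name,
                    int is_forward, int is_root)
{
  if (d->n == SKETCH_MAX_TYPES)
    return -1;
  d->t[d->n].name = name;
  d->t[d->n].is_forward = is_forward;
  d->t[d->n].is_root = is_root;
  return (long) d->n++;
}

/* Add a struct.  Only a root-visible addition may promote an existing
   root-visible forward of the same name; a non-root addition always
   gets a fresh ID.  */
long sketch_add_struct (struct sketch_types *d, const char *name, int is_root)
{
  size_t i;

  if (is_root)
    for (i = 0; i < d->n; i++)
      if (d->t[i].is_root && d->t[i].is_forward
          && strcmp (d->t[i].name, name) == 0)
        {
          d->t[i].is_forward = 0;	/* Promote forward to concrete type.  */
          return (long) i;
        }
  return sketch_append (d, name, 0, is_root);
}
```

In this model, adding a hidden `foo` after a root-visible forward `foo` yields a new ID, while a later root-visible `foo` promotes the forward.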
2025-06-26 | libctf: use __attribute__((__gnu_printf__)) where appropriate | Nick Alcock | 1 | -0/+5
We don't use any GNU-specific printf args, but this prevents warnings about %z, observed on MinGW even though every libc anyone is likely to use there supports %z perfectly well, and we're not stopping using it just because MinGW complains. Doing this means we stand more chance of seeing *actual* problems on such platforms without them being drowned in noise. We turn this off on clang, which doesn't support __gnu_printf__. Suggested by Eli Zaretskii. libctf/ PR libctf/31863 * ctf-impl.h (_libctf_printflike_): Use __gnu_printf__.
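A macro of this general shape might look like the following. This is a hedged sketch, not the contents of ctf-impl.h: the macro name `SKETCH_PRINTFLIKE` and the function `sketch_format` are hypothetical, and falling back to the plain `__printf__` archetype on clang is an assumption of this example. The `__format__` attribute and its `__gnu_printf__` archetype are standard GCC features.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macro: ask GCC to check format strings using GNU printf
   rules (so %z is accepted everywhere); use plain printf rules on clang,
   and nothing on other compilers.  */
#if defined (__clang__)
# define SKETCH_PRINTFLIKE(fmt, first) \
    __attribute__ ((__format__ (__printf__, fmt, first)))
#elif defined (__GNUC__)
# define SKETCH_PRINTFLIKE(fmt, first) \
    __attribute__ ((__format__ (__gnu_printf__, fmt, first)))
#else
# define SKETCH_PRINTFLIKE(fmt, first)
#endif

/* Argument 3 is the format string; variadic arguments start at 4.  */
int sketch_format (char *buf, size_t len, const char *fmt, ...)
  SKETCH_PRINTFLIKE (3, 4);

int sketch_format (char *buf, size_t len, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert (buf != NULL && fmt != NULL);
  va_start (ap, fmt);
  n = vsnprintf (buf, len, fmt, ap);
  va_end (ap);
  return n;
}
```

A call such as `sketch_format (buf, sizeof buf, "%zu items", (size_t) 42)` is then format-checked at compile time without a spurious %z warning.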
2025-06-26 | libctf, dedup: reclaim space wasted by duplicate hidden types | Nick Alcock | 1 | -12/+30
In normal deduplicating links, we insert every type (identified by its unique hash) precisely once. But conflicting types appear in multiple dicts, so for those, we loop, inserting them into every target dict in turn (each corresponding to an input dict that type appears in). But in cu-mapped links, some of those dicts may have been merged into one: now that we are hiding duplicate conflicting types more aggressively in such links, we are getting duplicate identical hidden types turning up in large numbers. Fix this by eliminating them in cu-mapping phase 1 (the phase in which this merging takes place), by checking to see if a type with this hash has already been inserted in this dict and skipping it if so. This is redundant and a waste of time in other cu-mapping phases and in normal links, but in cu-mapped links it saves a few tens to hundreds of kilobytes in kernel-sized links. libctf/ PR libctf/33047 * ctf-dedup.c (ctf_dedup_emit_type): Check for already-emitted types in cu-mapping phase 1.
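The skip-if-already-emitted check can be sketched with a toy per-dict set of hashes. This is an illustration only: the `sketch_` names, the fixed-size array, and the linear search are assumptions of this example (the real code uses libctf's own hash tables inside ctf_dedup_emit_type).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_EMITTED 16

/* Hypothetical record of which type hashes a target dict already holds.  */
struct sketch_target
{
  const char *emitted[SKETCH_MAX_EMITTED];
  size_t n;
};

/* Emit the type with this hash into the target.  Returns 1 if emitted,
   0 if skipped as a duplicate, -1 if the toy table is full.  The
   duplicate check runs only in cu-mapping phase 1, where several input
   dicts may have been merged into one target.  */
int sketch_emit_type (struct sketch_target *d, const char *hash,
                      int cu_mapping_phase)
{
  size_t i;

  assert (d != NULL && hash != NULL);
  if (cu_mapping_phase == 1)
    for (i = 0; i < d->n; i++)
      if (strcmp (d->emitted[i], hash) == 0)
        return 0;

  if (d->n == SKETCH_MAX_EMITTED)
    return -1;
  d->emitted[d->n++] = hash;
  return 1;
}
```

In other phases the check is omitted, mirroring the commit's note that it would be redundant there.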
2025-06-26 | libctf: dedup: preserve non-root flag across normal links | Nick Alcock | 6 | -68/+353
The previous commits dropped preservation of the non-root flag in ctf_link and arranged to use it somewhat differently to track conflicting types in cu-mapped CUs when doing cu-mapped links. This was necessary to prevent entirely spuriously hidden types from appearing on the output of such links. Bring it (and the test for it) back. The problem with the previous design was that it implicitly assumed that the non-root flag it saw on the input was always meant to be preserved (when in the final phase of cu-mapped links it merely means that conflicting types were found in intermediate links), and also that it could figure out what the non-root flag on the input was by sucking in the non-root flag of the input type corresponding to an output in the output mapping (which maps type hashes to a corresponding type on some input). This method of getting properties of the input type *does* work *if* that property was one of those hashed by the ctf_dedup_hash_type process. In that case, every type with a given hash will have the same value for all hashed-in properties, so it doesn't matter which one is consulted (the output mapping points at an arbitrary one of those input types). But the non-root flag is explicitly *not* hashed in: as a comment in ctf_dedup_rhash_type notes, being non-root is not a property of a type, and two types (one non-root, one not) can perfectly well be the same type even though one is visible and one isn't. So just copying the non-root flag from the output mapping's idea of the input type will copy in a value that is not stabilized by the hash, so is more-or-less random! So we cannot do that. We have to do something else, which means we have to decide what to do if two identical types with different nonroot flag values pop up. 
The most sensible thing to do is probably to say that if all instances of a type are non-root-visible, the linked output should also be non-root-visible: if any instance in that set is root-visible, the output type is root-visible again. We implement this with a new cd_nonroot_consistency dynhash, which maps type hashes to the value 0 ("all instances root-visible"), 1 ("all instances non-root-visible") or 2 ("inconsistent"). After hashing is over, we save a bit of memory by deleting everything from this hashtab that doesn't have a value of 1 ("non-root-visible"), then use this to decide whether to emit any given type as non-root-visible or not. However... that's not quite enough. In cu-mapped links, we want to disregard this whole thing because we just hide everything -- but in phase 2, when we take the smushed-together CUs resulting from phase 1 and deduplicate them against each other, we want to do what the previous commits implemented and ignore the non-root flag entirely, instead falling back to preventing clashes by hiding anything that would be considered conflicting. We extend the existing cu_mapped parameter to various bits of ctf_dedup so that it is now tristate: 0 means a normal link, 1 means the smush-it-together phase of cu-mapped links, and 2 means the final phase of cu-mapped links. We do the hide-conflicting stuff only in phase 2, meaning that normal links by GNU ld can always respect the value of the nonroot flag put on types in the input. (One extra thing added as part of this: you can now efficiently delete the last value returned by ctf_dynhash_next() by calling ctf_dynhash_next_remove.) We bring back the ctf-nonroot-linking test with one tweak: linking now works on mingw as long as you're using the ucrt libc, so re-enable it for better test coverage on that platform. libctf/ PR libctf/33047 * ctf-hash.c (ctf_dynhash_next_remove): New. * ctf-impl.h (struct ctf_dedup) [cd_nonroot_consistency]: New.
* ctf-link.c (ctf_link_deduplicating): Differentiate between cu-mapped and non-cu-mapped links, even in the final phase. * ctf-dedup.c (ctf_dedup_hash_type): Callback prototype addition. Get the non-root flag and pass it down. (ctf_dedup_rhash_type): Callback prototype addition. Document restrictions on use of the nonroot flag. (ctf_dedup_populate_mappings): Populate cd_nonroot_consistency. (ctf_dedup_hash_type_fini): New function: delete now-unnecessary values from cd_nonroot_consistency. (ctf_dedup_init): Initialize it. (ctf_dedup_fini): Destroy it. (ctf_dedup): cu_mapping is now cu_mapping_phase. Call ctf_dedup_hash_type_fini. (ctf_dedup_emit_type): Use cu_mapping_phase and cd_nonroot_consistency to propagate the non-root flag into outputs for normal links, and to do name-based conflict checking only for phase 2 of cu-mapped links. (ctf_dedup_emit): cu_mapping is now cu_mapping_phase. Adjust assertion accordingly. * testsuite/libctf-writable/ctf-nonroot-linking.c: Bring back. * testsuite/libctf-writable/ctf-nonroot-linking.lk: Likewise.
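The three-valued consistency tracking described in this commit can be sketched as a small merge function. This is a model of the idea only, not the code in ctf-dedup.c: the `sketch_` names are hypothetical, and the real state lives in the cd_nonroot_consistency dynhash keyed by type hash.

```c
#include <assert.h>

/* Values as described in the commit message.  */
enum
{
  SKETCH_ALL_ROOT = 0,		/* All instances root-visible.  */
  SKETCH_ALL_NONROOT = 1,	/* All instances non-root-visible.  */
  SKETCH_INCONSISTENT = 2	/* A mixture of both was seen.  */
};

/* Fold one more instance of a type into the state recorded for its hash.
   SEEN_BEFORE is zero if no state has been recorded for this hash yet.  */
int sketch_merge_nonroot (int seen_before, int state, int instance_is_nonroot)
{
  int instance = instance_is_nonroot ? SKETCH_ALL_NONROOT : SKETCH_ALL_ROOT;

  assert (state >= SKETCH_ALL_ROOT && state <= SKETCH_INCONSISTENT);
  if (!seen_before)
    return instance;
  if (state == SKETCH_INCONSISTENT || state != instance)
    return SKETCH_INCONSISTENT;
  return state;
}

/* After hashing, only hashes whose state is "all non-root" are emitted
   as non-root-visible; everything else is emitted root-visible.  */
int sketch_emit_as_nonroot (int state)
{
  return state == SKETCH_ALL_NONROOT;
}
```

Because only the value 1 affects emission, every other entry can be dropped once hashing is complete, which is the memory saving the commit mentions.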
2025-06-25 | libctf: dedup: improve hiding of conflicting types in the same dict | Nick Alcock | 1 | -10/+44
If types are conflicting, they are usually moved into separate child dicts -- but not always. If they are added to the same dict by the cu-mapping mechanism (as used e.g. for multi-TU kernel modules), we can easily end up adding multiple conflicting types with the same name to the same dict. The mechanism used for turning on the non-root-visible flag in order to do this had a kludge attached which always hid types with the same name, whether or not they were conflicting. This is unnecessary and can hide types that should not be hidden, as well as hiding bugs. Remove it, and replace it with two different approaches:
 - for everything but cu-mapped links (the in-memory first phase of a link with ctf_link_add_cu_mapping in force), check for duplicate names if types are conflicting, and mark them as hidden if the names are found. This will never happen in normal links (in an upcoming commit we will suppress doing even this much in such cases).
 - for cu-mapped links, the only case that merges multiple distinct target dicts into one, we apply a big hammer and simply hide everything! The non-root flag will be ignored in the next link phase anyway (which dedups the cu-mapped pieces against each other), and this way we can be sure that merging multiple types cannot incur name clashes at this stage.
The result seems to work: the only annoyance is that when enums with conflicting enumerators are found in a single cu-mapped child (so, really multiple merged children), you may end up with every instance of that enum being hidden for reasons of conflictingness. I don't see a real way to avoid that. libctf/ PR libctf/33047 * ctf-dedup.c (ctf_dedup_emit_type): Only consider non conflicting types. Improve type hiding in the presence of clashing enumerators. Hide everything when doing a cu-mapped link: they will be unhidden by the next link pass if nonconflicting.
2025-06-25 | Revert "libctf: fix linking of non-root-visible types" | Nick Alcock | 3 | -136/+2
This reverts commit 87b2f673102884d7c69144c85a26ed5dbaa4f86a. It is based on a misconception, that hidden types in the deduplicator input should always be hidden in the output. For cu-mapped links, and final links following cu-mapped links, this is not true: we want to hide inputs if they were conflicting on the output and no more. We will reintroduce the testcase once a better fix is found. libctf/ PR libctf/33047 * ctf-dedup.c (ctf_dedup_emit_type): Don't respect the nonroot flag. * testsuite/libctf-writable/ctf-nonroot-linking.c: Removed. * testsuite/libctf-writable/ctf-nonroot-linking.lk: Removed.
2025-05-20 | libctf: testsuite fixes for datasec size changes | Nick Alcock | 13 | -15/+15
2025-05-20 | libctf: archive, open: when opening, always set errp to something | Nick Alcock | 4 | -2/+28
ctf_arc_import_parent, called by the cached-opening machinery used by ctf_archive_next and archive-wide lookup functions like ctf_arc_lookup_symbol, has an err-pointer parameter like all other opening functions. Unfortunately it unconditionally initializes it whenever provided, even if there was no error, which can lead to its being initialized to an uninitialized value. This is not technically an API-contract violation, since we don't define what happens to the error value except when an error happens, but it is still unpleasant. Initialize it only when there is an actual error, so we never initialize it to an uninitialized value. While we're at it, improve all the opening pathways: on success, set errp to 0, rather than leaving it what it was, reducing the likelihood of uninitialized error param returns in callers too. (This is inconsistent with the treatment of ctf_errno(), but the err value being a parameter passed in from outside makes the divergence acceptable: in open functions, you're never going to be overwriting some old error value someone might want to keep around across multiple calls, some of which are successful and some of which are not.) Soup up existing tests to verify all this. Thanks to Bruce McCulloch for the original patch, and Stephen Brennan for the report. libctf/ PR libctf/32903 * ctf-archive.c (ctf_arc_open_internal): Zero errp on success. (ctf_dict_open_sections): Zero errp at the start. (ctf_arc_import_parent): Initialize err. * ctf-open.c (ctf_bufopen): Zero errp at the start. * testsuite/libctf-lookup/add-to-opened.c: Make sure one-element archive opens update errp. * testsuite/libctf-writable/ctf-compressed.c: Make sure real archive opens update errp.
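The errp convention described here (zero on success, a real error code on failure, tolerate a null pointer) can be sketched like this. The function and types are hypothetical stand-ins rather than libctf's opening functions, and `ENOENT` is used only as a convenient example error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opaque handle standing in for an opened dict.  */
struct sketch_handle { int unused; };
static struct sketch_handle sketch_the_handle;

/* Open something by name.  ERRP may be NULL.  When it is provided it is
   always left in a defined state: 0 on success, an error code on failure.  */
struct sketch_handle *sketch_open (const char *name, int *errp)
{
  if (errp != NULL)
    *errp = 0;			/* Zero at the start: success leaves 0.  */

  if (name == NULL || name[0] == '\0')
    {
      if (errp != NULL)
        *errp = ENOENT;		/* Only overwritten by a real error.  */
      return NULL;
    }
  return &sketch_the_handle;
}
```

A caller can therefore pass in an uninitialized `int err` and read it unconditionally after the call.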
2025-04-25 | libctf: spec: be more specific about Solaris CTF versions | Nick Alcock | 1 | -2/+2
Solaris has a CTFv3 now, modelled on FreeBSD's: be explicit that we are derived from Solaris CTFv2, not v3. (The spec is not updated for CTFv4/BTF at all yet.)
2025-04-25 | libctf: API change documentation (NOT FOR UPSTREAMING) | Nick Alcock | 1 | -0/+148
These probably need to be turned into libctf/NEWS content once we decide (if we decide) that these changes are good. (I do hope we don't make too many changes because it'll be horribly disruptive, but I wouldn't be surprised to see a few...)
2025-04-25 | libctf: by-kind tests | Nick Alcock | 34 | -0/+486
These tiny testcases test opening-and-dumping of single type kinds, and also linking and then opening-and-dumping.
2025-04-25 | libctf: run_lookup_test: force BTF emission (NOT FOR UPSTREAMING) | Nick Alcock | 1 | -2/+2
Pro tem as a hack until GCC supports -gctf for v4, or v3 upgrading is supported, or direct CTF-then-BTF tests are written, just emit BTF for test purposes. This breaks most of the tests: DO NOT UPSTREAM.
2025-04-25 | libctf: run_lookup_test: support per-test options | Nick Alcock | 1 | -2/+12
This lets you say e.g. run_lookup_test [file rootname $ctf_test ] {link: on} to turn on the {link: on} option for all tests, as if specified in every test file.
2025-04-25 | libctf: dump: dump conflicting CUs, when declared | Nick Alcock | 1 | -1/+13
2025-04-25 | libctf: dump: dump struct-based bitfields | Nick Alcock | 1 | -1/+15
2025-04-25 | libctf: dump: dump variables and datasecs | Nick Alcock | 1 | -7/+87
2025-04-25 | libctf: dump: dump the header; dump enum64s; adapt to API changes | Nick Alcock | 1 | -93/+190
A bunch of dumper changes. Most importantly, adapt to the changes in the _f iteration function prototypes by no longer carrying around our own cds_fp dict pointer everywhere but just using the one we are given by the iteration function. But also, dump the v3 and v4/BTF headers separately, using the stored original v3-pre-upgrade header copy if present. The v3 dumper is not tested yet, of course, but is more or less unchanged from the old code, so probably nearly works. The v4 dumper is tested. Add enum64 support (basically just a bit of extra code to print the signedness of enums).
2025-04-25 | libctf: archive: allow opening BTF dicts in archives (not for upstreaming) | Nick Alcock | 1 | -3/+23
BTF dicts are normally suppressed in archives, but it is possible to create them with enough cunning. If such an archive is encountered, the BTF dicts in it have no parent name, which means that ctf_arc_import_parent (used by ctf_dict_open_cached, ctf_archive_next, and all the ctf_arc_lookup functions) fails to figure out what parent to import, and fails. Kludge around it by relying on our secret knowledge that ctf_link_write always emits the parent dict into the archive first. If no name is set, import the parent dict for now. (Before upstreaming, a new archive format with a dedicated parent dict field will turn up, obviating this kludge.)
2025-04-25 | libctf: archive: fix ctf_dict_open_cached error handling | Nick Alcock | 1 | -6/+7
We were misreporting a failure to ctf_dict_open the dict as an out-of-memory error.
2025-04-25 | libctf: link: improve BTF child dict naming | Nick Alcock | 1 | -0/+9
BTF dicts don't have a cuname, which means that when the deduplicator runs over them any child dicts that result from conflicted types found in those CUs end up with no name either. Detect such unnamed dicts and propagate in the name the linker gave them at input time instead. (There is always *some* such name, even if it's something totally useless like "#1"; usually it's much more useful.)
2025-04-25 | libctf: ctf-link: minor comment improvements | Nick Alcock | 2 | -6/+10
2025-04-25 | libctf: dedup: conflicting CU names and merging into the parent | Nick Alcock | 2 | -5/+22
The last two dedup changes are, firstly, to use ctf_add_conflicting() to arrange that conflicting types that are hidden because they are added to the same dict as the types they conflict with (e.g. conflicting types in modules) are properly marked with the CU name that the type comes from. This could of course not be done with the old non-root flag, but now that we have proper prefix types, we can record it, and consumers can find out what CU any type comes from via ctf_type_conflicting (or, for non-kernel CTF generated by GNU ld, via the ctf_cuname of the per-cu dict). Secondly, we add a new kind of CU mapping for cu-mapped (two-stage) links (as a reminder, these carry out a second stage of dedupping in which they squash specific CUs down to a named set of child dicts, fusing named inputs into particular named outputs: the kernel linker uses this to make child dicts that represent modules rather than translation units). You can now map any CU name to "" (the null string). This indicates that types that would land in the CU in question should not be emitted into any sort of per-module dict but should instead just be emitted into the shared dict, possibly being marked conflicting as they do so. The usual popcount mechanism will be used to pick the type which is left unhidden. The usual forwarding stubs you would expect to find for conflicting structs and unions will not be emitted: instead, real structs and unions will take their place. Consumers must take care when chasing parent types that point to tagged structs to make sure that there isn't a correspondingly-named struct in the child they're looking at (but this is generally a problem with type chasing in children anyway, which I have a TODO open to find some sort of solution to: this should be being done automatically, and isn't).
2025-04-25 | libctf: dedup: decl tag support. | Nick Alcock | 2 | -53/+405
Decl tags to types and to functions and function arguments are relatively straightforward, as are decl tags to structures as a whole or to members of untagged structures; but decl tags to specific members of tagged structs and unions have two separate nasty problems, entirely down to the use of tagged structures to break cycles in the type graph. The first is that we have to mark decl tags conflicting if their associated struct is conflicting, but traversal from types to their parents halts at tagged structs and unions, because the type graph is sharded via stubs at those points and conflictedness ceases. But we don't want to do that here: a decl_tag to member 10 of some struct is only valid if that struct *has* ten members, and if the struct is conflicted, some may have only one. The decl tag is only valid for the specific struct-with-ten-members it was originally pointing at, anyway: other structs-with-ten-members may have entirely different members there, which are not tagged or which are tagged with something else. So we track this by keeping track of the only thing that is knowable about struct/union stubs: their decorated name. The citers graph gains mappings from decorated SoU names to decl tags (where the decl tag has a component_idx), and conflictedness marking chases that and marks accordingly, via the new ctf_dedup_mark_conflicting_hash_citers. The second problem is that we have to emit decl tags to struct members of all kinds after the members are emitted, but the members are emitted later than core type deduplication because they might refer to any types in the dict, including types added after the struct was added. So we need to accumulate decl tags to struct members in a new hashtab (cd_emission_struct_decl_tags) and add yet *another* pass that traverses that and emits all the decl tags in it. 
(If it turns out that decl tags to other things can similarly appear before the type they refer to, we'll either have to sort them earlier or emit them at the end as well -- but this seems unlikely.) None of this complexity is properly tested, because we're not yet emitting decl tags (as far as I know). But at least it doesn't break anything else, and it's somewhere to start.
2025-04-25 | libctf: dedup: type tags | Nick Alcock | 1 | -0/+15
Another trivial case: they're just like pointers except that they have a name (and we don't need to care about that, because names are hashed in, if present, anyway).
2025-04-25 | libctf: dedup: datasecs and vars | Nick Alcock | 2 | -19/+376
These are a bit trickier than previous things. Datasecs are unusual: the content they contain for a given variable is conceptually part of that variable, in that a variable can only appear in one datasec: so if two TUs have different datasec values for a variable, you'll want to emit two conflicting variables with different datasec entries. Equally, if they have entries in different datasecs, they're conflicting. But the *index* of a variable in a datasec has nothing to do with the variable: it's just a property of how many other variables are in the datasec. So we turn the type graph upside down for them. We track the variable -> datasec mappings for every variable we are dedupping, and use this to hash variables with datasec entries *twice*: firstly, as purely variable type, name, and promoted-to-non-extern linkage, and secondly with all of that plus the datasec name, offset and size: we indicate that the non-extern hash *replaces* the extern one, and use this later on. The datasec itself is not hashed at all! We skip it at both hashing and emission time (without breaking anything else, because nothing points at datasecs, so nothing will ever recurse down into one). The popcount code (used to find the "most popular" type, the one to put in the shared dict) changes to say that replaced types (extern vars) popcounts are added to the counts of the types that replace them (the corresponding non-extern vars). At emission time, replaced variables (extern variables) are skipped, ensuring that extern vars with non-conflicting non-extern counterparts are skipped in favour of the non-extern ones. ctf_add_section_variable then takes care of emitting both the var and its corresponding datasec for us.
2025-04-25 | libctf: dedup: structs with bitfields, BTF floats | Nick Alcock | 1 | -3/+35
The last two trivial cases. Hash in the bitfieldness of structs and the bit-width of members (their bit-offset is already being hashed in), and emit them accordingly. BTF floats hardly have any state: emitting them is even easier.
2025-04-25 | libctf: dedup: enums, enum64s, functions, func linkage | Nick Alcock | 2 | -14/+192
These are all fairly simple and are handled together because some of the diffs are annoyingly entwined. enum and enum64 are trivial: it's just like enums used to be, except that we hash in the unsignedness value, and emit signed or unsigned enums or enum64s appropriately. (The signedness stuff on the emission side is fairly invisible: it's automatically handled for us by ctf_type_encoding and ctf_add_enum*_encoded, via the CTF_INT_SIGNED encoding.) Functions are also fairly simple: we hash in all the parameter names as well as the args, and emit them accordingly. Linkage is more difficult. We want to deduplicate extern and non-extern declarations together, while leaving static ones separate. We do this by promoting extern linkage to global at hashing time, and maintaining a cd_linkages hashmap which maps from type hash values of func linkages (and vars) to the best linkage known so far, then updating it if a better one ("less extern") comes along (relying on the fact that we are already unifying the hashes of otherwise-identical extern and non-extern types). At emission time, we use this hashtab to figure out what linkage to emit.
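The linkage handling can be sketched in two small steps: promote extern to global when hashing, then keep the "least extern" linkage seen for each hash. The enum and function names below are hypothetical and deliberately do not reuse any real BTF linkage constants; the real code keeps this state in the cd_linkages hashmap.

```c
#include <assert.h>

/* Hypothetical linkage kinds for this sketch.  */
enum sketch_linkage { SKETCH_STATIC, SKETCH_GLOBAL, SKETCH_EXTERN };

/* Linkage as seen by the hash: extern is promoted to global so that
   extern and non-extern declarations hash alike.  Static stays distinct,
   so static declarations are never unified with the others.  */
enum sketch_linkage sketch_hashed_linkage (enum sketch_linkage l)
{
  return l == SKETCH_EXTERN ? SKETCH_GLOBAL : l;
}

/* Given the best linkage known so far for a hash and a newly seen one,
   return the one to remember: global beats extern ("less extern").  */
enum sketch_linkage sketch_best_linkage (enum sketch_linkage known,
                                         enum sketch_linkage candidate)
{
  assert (sketch_hashed_linkage (known) == sketch_hashed_linkage (candidate));
  if (known == SKETCH_EXTERN && candidate == SKETCH_GLOBAL)
    return candidate;
  return known;
}
```

At emission time the remembered value, rather than the linkage of whichever input type happened to be picked, would decide what gets written.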
2025-04-25 | libctf: dedup: comment fixes, debug indentation changes, and a tiny leak | Nick Alcock | 1 | -41/+41
Getting these out of the way to avoid them wrecking the diffs for the next commits.
2025-04-25 | libctf: dedup: fix a broken error path in string dedup | Nick Alcock | 1 | -1/+1
If we run out of memory updating the string counts, set the right errno: ctf_dynhash_insert returns a *negative* error value, and we want a positive one in the ctf_errno.
2025-04-25 | libctf: dedup: chase API changes: use the public API more | Nick Alcock | 1 | -25/+41
To get ready for the deduplicator changes, we chase the API changes to things like ctf_member_next, and add support for prefix types (using the suffix where appropriate, etc). We use the ctf-types API for things like forward lookup, using the private _tp functions to reduce overhead while centralizing knowledge of things like the encoding of enum forwards outside the deduplicator. No functional changes yet.
2025-04-25 | libctf: drop unnecessary macro | Nick Alcock | 1 | -5/+0
Every use of this macro has been deleted.
2025-04-25 | libctf: open-bfd: open BTF dicts | Nick Alcock | 1 | -12/+19
Teaching ctf_open and ctf_fdopen to open BTF dicts if passed is quite simple: we just need to check the magic number and allow BTF dicts into the lower-level ctf_simple_open machinery (which ultimately calls ctf_bufopen).
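A magic-number check of the general kind mentioned here can be sketched as below. This is not the code in libctf's opening path: the `sketch_` names are hypothetical, the check is native-endian only, and while 0xeb9f is the BTF magic from the Linux kernel's btf.h and 0xdff2 is the magic used by CTF up to v3, whether the in-development v4 format keeps the latter is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BTF_MAGIC 0xeb9f	/* BTF magic, as in the kernel's btf.h.  */
#define SKETCH_CTF_MAGIC 0xdff2	/* CTF magic up to v3 (assumed for v4).  */

enum sketch_kind { SKETCH_UNKNOWN, SKETCH_CTF, SKETCH_BTF };

/* Classify a buffer by its leading 16-bit magic number.  Byte-swapped
   (foreign-endian) input is not handled in this sketch.  */
enum sketch_kind sketch_sniff (const void *buf, size_t len)
{
  uint16_t magic;

  if (buf == NULL || len < sizeof (magic))
    return SKETCH_UNKNOWN;

  /* memcpy avoids unaligned reads from arbitrary buffers.  */
  memcpy (&magic, buf, sizeof (magic));

  if (magic == SKETCH_BTF_MAGIC)
    return SKETCH_BTF;
  if (magic == SKETCH_CTF_MAGIC)
    return SKETCH_CTF;
  return SKETCH_UNKNOWN;
}
```

An opener would then admit both recognised kinds to the lower-level machinery and reject everything else.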
2025-04-25 | libctf: link: drop unnecessary back-compatibility code | Nick Alcock | 1 | -53/+0
We no longer need to ensure that inputs have a new-format func info section: no such sections exist in CTFv4 (and the v3 compatibility code will throw away old-format sections).
2025-04-25 | libctf: link: BTF support | Nick Alcock | 3 | -6/+50
This is in two parts, one new API function and one change.
New API:
+int ctf_link_output_is_btf (ctf_dict_t *);
Changed API:
 unsigned char *ctf_link_write (ctf_dict_t *, size_t *size,
-                               size_t threshold);
+                               size_t threshold, int *is_btf);
The idea here is that callers can call ctf_link_output_is_btf on a ctf_link()ed (deduplicated) dict to tell whether a link will yield BTF-compatible output before actually generating that output, so they can e.g. decide whether to avoid trying to compress the dict if they know it would be BTF otherwise (since compressing a dict renders it non-BTF-compatible). ctf_link_write() gains an optional is_btf output parameter that reports whether the dict that was finally generated is actually BTF after all, perhaps because the caller didn't call ctf_link_output_is_btf or wants to be robust against possible future changes that may add other reasons why a written-out dict can't be BTF at the last minute. These are simple wrappers around already-existing machinery earlier in this series.
2025-04-25 | libctf: strings: don't check for non-deduplicable atoms in the parent | Nick Alcock | 1 | -2/+3
Callers of ctf_str_add_no_dedup_ref are indicating that they would like the string they have added a reference to to appear in the current dict and not be deduplicated into the parent. This is true even if the string already exists in the parent, so we should not check for strings in the parent and reuse them in this case.
2025-04-25 | libctf: serialize: finish off the serializer | Nick Alcock | 1 | -83/+76
The only remaining part of serialization that needs fixing up is ctf_preserialize, which despite its name does nearly all the work of serialization: the only bit it doesn't do is write the string tables (since that has to happen across dicts after all the dicts have otherwise been laid out, in order to deduplicate the strtabs). As usual in this series, there's adjustment for various field name changes (maxtypes -> ntypes, the move into ctf_serialize, etc), and extra work to figure out whether we're emitting BTF or not and to handle the distinction between CTF and BTF headers, and not try to emit CTF-only stuff like the symtypetabs into BTF dicts; we can also throw out a bunch of old code that sets compatibility flags, everything to do with forcing variables into the dynamic state in case they changed (we're going to handle that more generally for everything in the types table at a later date, outside serialization), and everything to do with special handling of variables in general. But much of that is only a couple of lines each, and most of the changes are mechanical: this is probably the simplest serialization commit in this series.
2025-04-25 | libctf: open: fix closing of children with imported parents | Nick Alcock | 1 | -2/+8
Closing a parent dict for the last time erases all its types and strings, which makes type and string lookups in any surviving children impossible from then on. Since children hold a reference to their parent, this can only happen in ctf_dict_close of the last child, after the parent has been closed by the caller as well. Since DTD deletion now involves doing type and string lookups in order to clean out the name tables, close the parent only after the child DTDs have been deleted.
2025-04-25 | libctf: open, types: ctf_import for BTF | Nick Alcock | 2 | -16/+40
ctf_import needs a bunch of fixes to work with pure BTF dicts -- and, for that matter, importing newly-created parent dicts that have never been written out, which may have a bunch of nonprovisional types (if types were added to it before any imports were done) or may not (if at least one ctf_import into it was done before any types were added). So we adjust things so that the values that are checked against are the nonprovisional-types values: the header revisions actually changed the name of cth_parent_typemax to cth_parent_ntypes to make this clearer, so catch up with that. In the parent, we have to use ctf_idmax, not ctf_typemax. One thing we must prohibit is that you cannot add a bunch of types to a child and then import a parent into it: the type IDs will all be wrong and the string offsets more so. This was partly prohibited: prohibit it entirely (excepting only that the not-actually-written-out void type we might add to new BTF dicts does not influence this check). Since BTF children don't have a cth_parent_ntypes or a cth_parent_strlen, we cannot check this stuff, but just set them and hope.
2025-04-25 | libctf: serialize: handle CTF-versus-BTF output format checks | Nick Alcock | 1 | -0/+49
The internal function ctf_serialize_output_format centralizes all the checks for BTF-versus-CTF, checking to see if the type section, active suppressions, and BTF-emission mode permit BTF emission, setting ctf_serialize.cs_is_btf if we are actually BTF, and raising ECTF_NOTBTF if we are requiring BTF emission but the type section is such that we can't emit it. (There is a forcing parameter in place, as with most of these serialization functions, to allow for the caller to force CTF emission if it knows the output will be compressed or will be part of multi-member archives or something else external to the type section that BTF does not support.)
2025-04-25 | libctf: serialize: size and emit the type section | Nick Alcock | 1 | -140/+267
As with sizing, this needs to support type suppression and CTF_K_BIG elision, and adapt to the DTD representation changes. Those changes cause a general complexity reduction because we no longer have to memcpy the vlen into place separately for every type kind, but can do it all at once using shared code above the per-kind switch statement. That statement's only job now is generating refs out of type IDs and string offsets, and translating the struct offset from gap- into non-gap representation for non-big structs. We do three distinct things:
 - check whether all the types in a section are BTF-compatible, after suppression of unwanted type kinds (including types with unwanted prefixes), and elision of unneeded struct/union CTF_K_BIGs
 - size the type section, taking suppression and CTF_K_BIG elision into account
 - actually emit it, again taking all the above into account
These all have to come to the same conclusions for every type: if the first one gets things wrong we might try to emit something as BTF when we can't; if the latter two are inconsistent, we might have a buffer overrun. So the type emission code double-checks BTF-compatibility and raises ECTF_NOTBTF if necessary; we also aggressively check for potential overruns before every memcpy() into the buffer and raise an ECTF_INTERNAL assertion failure if need be. Thankfully there are a lot fewer memcpy()s than there used to be: there are only four places we need to check, all close to each other, which is pretty maintainable. We add a bit of debugging when --enable-libctf-hash-debugging is on, printing the translation from provisional to final type ID so that you can use it to map back to the provisional ID again when trying to track down deduplicator problems, since the IDs the deduplicator will report at its emission time are only provisional (the final parent-relative IDs are not assigned until now).
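The "check before every memcpy()" discipline can be sketched with a small bounded-buffer helper. This is an illustration with hypothetical names, not the emission code in ctf-serialize.c; the `internal_error` field merely stands in for raising ECTF_INTERNAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer sized by an earlier sizing pass.  */
struct sketch_buf
{
  unsigned char *base;		/* Start of the buffer.  */
  size_t size;			/* Size computed by the sizing pass.  */
  size_t used;			/* Bytes emitted so far (always <= size).  */
  int internal_error;		/* Set if sizing and emission disagree.  */
};

/* Copy LEN bytes into the buffer, refusing to overrun it.  Returns 0 on
   success, -1 (and flags an internal error) if the copy would not fit.  */
int sketch_emit_bytes (struct sketch_buf *b, const void *src, size_t len)
{
  assert (b != NULL && src != NULL && b->used <= b->size);

  /* Compare against the remaining space: this form cannot overflow.  */
  if (len > b->size - b->used)
    {
      b->internal_error = 1;
      return -1;
    }
  memcpy (b->base + b->used, src, len);
  b->used += len;
  return 0;
}
```

Routing every copy through one such helper keeps the number of places that need auditing small, in the spirit of the four checked memcpy() sites the commit mentions.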
2025-04-25libctf: serialize: type section sizingNick Alcock1-40/+37
This is made much simpler by the fact that the DTD representation now tracks the size of each vlen, so we don't need per-type-kind code to track it ourselves any more. There's extra code to handle type suppression, CTF_K_BIG elision, and prefixes.
2025-04-25libctf: serialize: check the type section for BTF-incompatible typesNick Alcock1-0/+116
We add a new ctf_type_sect_is_btf function (internal to ctf-serialize.c) to check the type section against the write prohibitions list and (after write-suppression) against the set of types allowed in BTF, and determine whether this type section contains any types BTF does not allow. CTF-specific type kinds like CTF_K_FLOAT are obviously prohibited in BTF, as are CTF-specific prefixes, except that CTF_K_BIG is allowed if and only if both its ctt_size and vlen are still zero: in that case it will be elided by type section writeout and will never appear in the BTF at all. Structs are checked to make sure they don't use any nameless padding members and that (if they are bitfields) all their offsets will still fit after conversion from CTF_K_BIG gap-between-struct-members representation (if they are not bitfields, we know they will fit, but for bitfields, they might be too big).
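Two of the checks above reduce to simple predicates, sketched here. The struct layout and all sk_ names are hypothetical. The 24-bit limit is the width of the bit-offset field in a BTF struct member when kind_flag is set (the top 8 bits carry the bitfield size); how libctf itself performs this check is not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a CTF_K_BIG prefix, carrying only the two
   fields the text mentions.  */
typedef struct
{
  uint32_t ctt_size;
  uint32_t vlen;
} sk_big_prefix_t;

/* A CTF_K_BIG prefix that extends neither the size nor the vlen carries
   no information, so writeout can drop it and BTF never sees it.  */
bool
sk_big_is_elidable (const sk_big_prefix_t *big)
{
  return big->ctt_size == 0 && big->vlen == 0;
}

/* Largest bit offset representable in a BTF member with kind_flag set.  */
#define SK_BTF_BITFIELD_OFFSET_MAX 0xffffffu

/* ABS_BIT_OFFSET is the member's absolute offset in bits, i.e. after
   conversion out of the gap representation.  */
bool
sk_bitfield_offset_fits (uint64_t abs_bit_offset)
{
  return abs_bit_offset <= SK_BTF_BITFIELD_OFFSET_MAX;
}
```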
2025-04-25libctf: strings: no external strings in BTFNick Alcock4-40/+37
One of the things BTF doesn't have is the concept of external strings which can be shared with the ELF strtab. Therefore, even if the linker has reported strings which the dict is reusing, when we generate the strtab for a BTF dict we should emit those strings into it (and we should certainly not cause the presence of external strings to prevent BTF emission!) Note that since already-written strtab entries are never erased, writing a dict as BTF and then CTF will cause external strings to be emitted even for the CTF. This sort of repeated writing in different formats seems to be very rare: in any case, the problem can be avoided by simply doing the CTF writeout first (the following BTF writeout will spot the missing external-in-CTF strings and add them). We also throw away the internal-only function ctf_strraw_explicit(), which was used to add strings with a hardwired strtab: it was only ever used to write out the variable section, which is gone in v4.
2025-04-25libctf: serialize: kind suppression and prohibitionNick Alcock3-0/+36
The CTF serialization machinery decides whether to write out a dict as BTF or CTF (or, in LIBCTF_BTM_BTF mode, whether to write out a dict or fail with ECTF_NOTBTF) in part by looking at the type kinds in the dictionary. It is possible that you'd like to extend this check and ban specific type kinds from the dictionary (possibly even if it's CTF); it's also possible that you'd like to *not* fail even if a CTF-only kind is found, but rather replace it with a still-valid stub (CTF_K_UNKNOWN / BTF_KIND_UNKNOWN) and keep going. (The kernel's btfarchive machinery does this to ensure that the compiler and previous link stages have emitted only valid BTF type kinds.) ctf_write_suppress_kind supports both these use cases: +int ctf_write_suppress_kind (ctf_dict_t *fp, int kind, int prohibited); This commit adds only the core population code: the actual suppression is spread across the serializer and will be added in the next commits.
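One way to picture the core population code is as a per-dict table of actions keyed by kind. The following is a self-contained model, not libctf's implementation: every sk_ name, the table size, and the error values are invented for illustration, and it assumes a nonzero PROHIBITED means "fail the write" while zero means "replace with a stub".

```c
/* Illustrative constants: the stub kind and the size of the kind space
   are assumptions of this sketch.  */
enum { SK_K_UNKNOWN = 0, SK_NKINDS = 64 };

typedef enum { SK_KEEP = 0, SK_STUB, SK_PROHIBIT } sk_action_t;

/* Hypothetical stand-in for the relevant part of a dict.  */
typedef struct
{
  sk_action_t action[SK_NKINDS];
} sk_dict_t;

#define SK_EINVAL (-1)
#define SK_EPROHIBITED (-2)

/* Model of the population step: record what to do with KIND at write
   time.  */
int
sk_write_suppress_kind (sk_dict_t *fp, int kind, int prohibited)
{
  if (kind < 0 || kind >= SK_NKINDS)
    return SK_EINVAL;
  fp->action[kind] = prohibited ? SK_PROHIBIT : SK_STUB;
  return 0;
}

/* What the serializer would consult later: the kind to emit for a type
   of kind KIND, or a negative error if writing must fail.  */
int
sk_kind_to_write (const sk_dict_t *fp, int kind)
{
  if (kind < 0 || kind >= SK_NKINDS)
    return SK_EINVAL;

  switch (fp->action[kind])
    {
    case SK_PROHIBIT:
      return SK_EPROHIBITED;
    case SK_STUB:
      return SK_K_UNKNOWN;
    default:
      return kind;
    }
}
```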
2025-04-25libctf: serialize: user control over BTF-versus-CTF writeoutNick Alcock2-18/+46
We need some way for users to declare that they want BTF or CTF in particular to be written out when they ask for it, or that they don't mind which. Adding this to all the ctf_write functions (like the compression threshold already is) would be a bit of a nightmare: there are a great many of them and this doesn't seem like something people would want to change on a per-dict basis (even if we did, we'd need to think about archives and linking, which work on a higher level than single dicts).

So we repurpose an unused, vestigial existing function, ctf_version(), which was originally intended to do some sort of rather unclear API switching at runtime, to allow switching between different CTF file format versions (not yet supported, you have to pass CTF_VERSION) and BTF writeout modes:

/* BTF/CTF writeout version info.

   ctf_btf_mode has three levels:

   - LIBCTF_BTM_ALWAYS writes out full-blown CTFv4 at all times
   - LIBCTF_BTM_POSSIBLE writes out CTFv4 if needed to avoid information loss,
     BTF otherwise. If compressing, the same as LIBCTF_BTM_ALWAYS.
   - LIBCTF_BTM_BTF writes out BTF always, and errors otherwise.

   Note that no attempt is made to downgrade existing CTF dicts to BTF: if you
   read in a CTF dict and turn on LIBCTF_BTM_POSSIBLE, you'll get a CTF dict; if
   you turn on LIBCTF_BTM_BTF, you'll get an unconditional error. Thus, this is
   really useful only when reading in BTF dicts or when creating new dicts. */

typedef enum ctf_btf_mode
{
  LIBCTF_BTM_BTF = 0,
  LIBCTF_BTM_POSSIBLE = 1,
  LIBCTF_BTM_ALWAYS = 2
} ctf_btf_mode_t;

/* Set the CTF library client version to the specified version: this is the
   version of dicts written out by the ctf_write* functions. If version is
   zero, we just return the default library version number. The BTF version
   (for CTFv4 and above) is indicated via btf_hdr_len, also zero for "no
   change".

   You can influence what type kinds are written out to a CTFv4 dict via the
   ctf_write_suppress_kind() function. */

extern int ctf_version (int ctf_version_, size_t btf_hdr_len,
                        ctf_btf_mode_t btf_mode);

(We retain the ctf_version_ stuff to leave space in the API to let the library possibly do file format downgrades in future, since we've already had requests for such things from users.)
2025-04-25libctf, serialize: preparatory stepsNick Alcock7-67/+112
The new serializer is quite a lot more customizable than the old, because it can write out BTF as well as CTF: you can ask to write out BTF or fail, write out CTF if required to avoid information loss, otherwise BTF, or always write out CTF. Callers often need to find out whether a dict could be written out as BTF before deciding how to write it out (because a dict can never be written out as BTF if it is compressed, a caller might well want to ask if there is anything else that prevents BTF writeout -- say, slices, conflicting types, or CTF_K_BIG -- before deciding whether to compress it). GNU ld will do this whenever it is passed only BTF sections on the input. Figuring out whether a dict can be written out as BTF is quite expensive: we have to traverse all the types and check them, including every member of every struct. So we'd rather do that work only once. This means making a lot of state once private to ctf_preserialize public enough that another function can initialize it; and since the whole API is available after calling this function and before serializing, we should probably arrange that if we do things we know will invalidate the results of all this checking, we are forced to do it again. This commit does that, moving all the existing serialization state into a new ctf_serialize_t and adding to it. Several functions grow force_ctf arguments that allow the caller to force CTF emission even if the type section looks BTFish: the writeout code and archive creation use this to force CTF emission if we are compressing, and archive creation uses it to force CTF emission if a CTF multi-member archive is in use, because BTF doesn't support archives at all so there's no point maintaining BTF compatibility in that case. The ctf_write* functions gain support for writing out BTF headers as well as CTF, depending on whether what was ultimately written out was actually BTF or not. 
Even more than most commits in this series, there is no way this is going to compile right now: we're in the middle of a major transition, completed in the next few commits.
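The "check once, recheck only after invalidation" arrangement described above can be sketched as a cached verdict that any mutating call resets. This is a toy model under stated assumptions: the sk_ types and functions are hypothetical, the per-type check is reduced to a stored boolean, and the scans counter exists only to make the caching observable.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SK_MAX_TYPES = 16 };

/* Hypothetical stand-in for the serialization state.  */
typedef struct
{
  bool valid;   /* Is the cached verdict still trustworthy?  */
  bool is_btf;  /* Cached verdict: can this dict be written as BTF?  */
} sk_serialize_t;

typedef struct
{
  bool type_is_btf[SK_MAX_TYPES]; /* Per-type compatibility (toy model).  */
  size_t ntypes;
  sk_serialize_t ser;
  unsigned scans;                 /* Number of full traversals done.  */
} sk_dict_t;

/* Any mutation that could change the verdict invalidates the cache.  */
int
sk_add_type (sk_dict_t *d, bool btf_compatible)
{
  if (d->ntypes >= SK_MAX_TYPES)
    return -1;
  d->type_is_btf[d->ntypes++] = btf_compatible;
  d->ser.valid = false;
  return 0;
}

/* Expensive traversal, done at most once per run of unmodified state.  */
bool
sk_can_write_btf (sk_dict_t *d)
{
  if (!d->ser.valid)
    {
      d->ser.is_btf = true;
      for (size_t i = 0; i < d->ntypes; i++)
        if (!d->type_is_btf[i])
          {
            d->ser.is_btf = false;
            break;
          }
      d->ser.valid = true;
      d->scans++;
    }
  return d->ser.is_btf;
}
```

A caller can then ask whether BTF is possible, decide whether to compress, and serialize, paying for the traversal only once unless it changes the dict in between.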
2025-04-25libctf: lookup, open: chase header field changesNick Alcock2-20/+18
Nothing exciting here, just header fields slightly changing name and a couple of new comments and indentation fixes.
2025-04-25libctf, open: new API for getting the size of CTF/BTF file sectionsNick Alcock2-0/+45
I wrote this for BTF type size querying programs, but it might be of more general use and it's impossible to get this info in any other way, so we might want to keep it. New API: +size_t ctf_sect_size (ctf_dict_t *, ctf_sect_names_t sect);