path: root/libctf/ctf-create.c
Age | Commit message | Author | Files | Lines
2025-04-01 | libctf: more output format control | Nick Alcock | 1 | -0/+6
The new API function ctf_link_output_is_btf lets you determine whether the result of a ctf_link is likely to be written out as CTF or BTF before the write takes place: the new is_btf argument to ctf_link_write lets you find out whether it actually was (things like compression that only ctf_link_write is told about may cause last-minute changes in the decision). This requires us to split preserialization in two, with the portion that determines whether a serialized dict is BTF-compatible or not moving into a new internal function ctf_serialize_output_format, called by ctf_link_output_is_btf. We move the state used to communicate between serialization passes into a new sub-struct at the same time.
2025-03-27 | libctf: fix lookup and emission of BTF floats | Nick Alcock | 1 | -8/+16
2025-03-27 | libctf: create: fix prefix addition | Nick Alcock | 1 | -7/+15
Skip over leading prefixes and add the new one before dtd_data, then boost dtd_data past it. Also assert that there is at least one non-prefix header.
2025-03-27 | libctf: create: another missed realloc-adaptation in struct member addition | Nick Alcock | 1 | -3/+13
Every single time we call ctf_grow_vlen or anything that calls it, we have to look up pointers into the vlen and/or prefix again, since they're in the region that ctf_grow_vlen reallocs. Fix a missed case.
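The rule above can be sketched with a simplified DTD whose header and vlen share one realloc()ed buffer. All `demo_*` names are hypothetical and are not libctf's API. The key step is re-deriving the member pointer after the grow call; the sketch also keeps the in-use size separate from the allocated size and has the caller bump it, much as the dtd_vlen_size commits in this log describe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical vlen entry and DTD: one buffer holds a fixed-size header
   followed by the vlen region.  */
typedef struct demo_member
{
  uint32_t name;
  uint32_t type;
} demo_member_t;

typedef struct demo_dtd
{
  unsigned char *buf;   /* Header followed by vlen.  */
  size_t buf_size;      /* Bytes allocated.  */
  size_t vlen_size;     /* Bytes of vlen actually in use.  */
} demo_dtd_t;

#define DEMO_HDR_SIZE 16

/* Ensure room for VBYTES of vlen.  May move dtd->buf, and deliberately
   does not touch vlen_size: the caller may use less than it asked for.  */
static int
demo_grow_vlen (demo_dtd_t *dtd, size_t vbytes)
{
  size_t needed = DEMO_HDR_SIZE + vbytes;
  unsigned char *nbuf;

  if (needed <= dtd->buf_size)
    return 0;
  if ((nbuf = realloc (dtd->buf, needed)) == NULL)
    return -1;
  dtd->buf = nbuf;
  dtd->buf_size = needed;
  return 0;
}

static demo_member_t *
demo_vlen (demo_dtd_t *dtd)
{
  assert (dtd->buf != NULL);
  return (demo_member_t *) (dtd->buf + DEMO_HDR_SIZE);
}

/* Append one member.  Any pointer into dtd->buf taken before the grow
   call may dangle afterwards, so the vlen pointer is looked up again.  */
static int
demo_add_member (demo_dtd_t *dtd, uint32_t name, uint32_t type)
{
  size_t n = dtd->vlen_size / sizeof (demo_member_t);
  demo_member_t *members;

  if (demo_grow_vlen (dtd, dtd->vlen_size + sizeof (demo_member_t)) < 0)
    return -1;

  members = demo_vlen (dtd);        /* Re-derive: buf may have moved.  */
  members[n].name = name;
  members[n].type = type;
  dtd->vlen_size += sizeof (demo_member_t);   /* Caller bumps the size.  */
  return 0;
}
```

Caching `members` before `demo_grow_vlen` would work until the first realloc() that actually moves the block, which is why this kind of bug tends to surface only intermittently.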
2025-03-27 | libctf: create: fix yet another ctf_type_t-versus-unsigned-char confusion | Nick Alcock | 1 | -1/+1
C's pointer arithmetic scales by the size of the pointed-to type instead of always operating in bytes, and that gets me every single time.
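A minimal illustration of the trap, using a hypothetical `demo_type_t` in place of ctf_type_t: adding the same integer to a header pointer and to a byte pointer advances by different numbers of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size type header standing in for ctf_type_t.  */
typedef struct demo_type
{
  uint32_t name;
  uint32_t info;
  uint32_t size;
} demo_type_t;

/* Start of the vlen after NHDRS headers at BUF.  Adding NHDRS to a
   demo_type_t * advances by NHDRS * sizeof (demo_type_t) bytes.  */
static unsigned char *
demo_vlen_after (demo_type_t *buf, size_t nhdrs)
{
  assert (buf != NULL);
  return (unsigned char *) (buf + nhdrs);
}

/* The classic mistake: cast first, then add, which advances by only
   NHDRS bytes.  */
static unsigned char *
demo_vlen_after_wrong (demo_type_t *buf, size_t nhdrs)
{
  assert (buf != NULL);
  return (unsigned char *) buf + nhdrs;
}
```

The only difference between the two functions is where the cast sits relative to the addition, which is exactly what makes this so easy to get wrong.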
2025-03-27 | libctf: variables need not be in datasecs | Nick Alcock | 1 | -21/+19
Datasecs are only used for non-extern variables (those in ELF sections). Variables added by ctf_add_variable() should not be in any datasec at all. (This means we have to drop a minor optimization where variables not mentioned in ctf_var_datasecs were assumed to be in the most-numerous one, since they might be in none at all; not a great loss.)
2025-03-25 | libctf: create: get the bit offset right for initial struct members and unions | Nick Alcock | 1 | -1/+6
Even initial struct members and union members can be bitfields: the bit width shouldn't be simply ignored for them.
2025-03-25 | libctf: create: fix the ctf_add_section_variable return value | Nick Alcock | 1 | -1/+1
We want to return the ID of the variable, not of its section!
2025-03-25 | libctf: create: get func linkage vbytes right | Nick Alcock | 1 | -1/+1
We failed to fix the vbytes for func linkage in this one place, so they were written out with the wrong vbytes value, wrecking the reading-in of all subsequent types.
2025-03-25 | libctf: create: grow the vlen size right for enum64 addition | Nick Alcock | 1 | -3/+13
Don't grow it by a multiple of ctf_enum_t, but rather ctf_enum64_t.
2025-03-25 | libctf: create: fix enum vlen | Nick Alcock | 1 | -1/+1
The dtd_vlen_size is the vbytes, not the vlen...
2025-03-25 | libctf: create: all decl tags have the same vlen_size | Nick Alcock | 1 | -1/+2
Decl tags with nonzero component_idxes don't have a zero-sized vlen!
2025-03-25 | libctf: ctf_add_section_variable must return the type ID | Nick Alcock | 1 | -1/+1
Not doing this breaks the deduplicator (it sees a zero ID, decides the new variable doesn't need to be inserted into the emissions hash, then later decl tags that refer to that variable cannot find it anywhere even though it must have been emitted.)
2025-03-24 | libctf: create: emit datasec after variable if both are emitted | Nick Alcock | 1 | -21/+23
This is consistent with compiler output, which makes writing tests with results that can be reused for .o and linked output easier. Use the snapshot facility to remove both variable and datasec DTDs on error, if need be.
2025-03-24 | libctf: create: verify the range of variable linkages | Nick Alcock | 1 | -0/+3
2025-03-24 | libctf: fix CTF_K_FUNC_LINKAGE | Nick Alcock | 1 | -5/+1
This has its linkage not in the vlen region, but *stuffed into the actual vlen in the info word* (why this is inconsistent with the way CTF_K_VAR works, I have no idea).
2025-03-24 | libctf: create: boost dtd_vlen_size when the vlen is grown | Nick Alcock | 1 | -0/+5
ctf_grow_vlen ensures the vlen region is big enough for some operation: it does not boost the dtd_vlen_size, because the caller can (and sometimes does) choose to write less than that in there. Make sure the caller bumps the dtd_vlen_size appropriately, since serialization now trusts this value to be correct rather than recomputing all vlens from recorded-in-type-table scratch.
2025-03-24 | libctf: create: fix enum name emission | Nick Alcock | 1 | -1/+1
2025-03-21 | libctf: get enum64 right | Nick Alcock | 1 | -12/+40
It's got two halves of a value, not one single value. Unsigned-versus-signed types are not yet properly tested.
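For reference, BTF's `struct btf_enum64` stores each enumerator as a name offset plus `val_lo32` and `val_hi32` halves. The sketch below assumes CTFv4's 64-bit enumerators share that shape; the `demo_*` names are hypothetical. It splits and rejoins a signed value and also shows the per-entry size that vlen growth for enum64 has to use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to be laid out like BTF's struct btf_enum64.  */
typedef struct demo_enum64
{
  uint32_t name_off;
  uint32_t val_lo32;
  uint32_t val_hi32;
} demo_enum64_t;

static void
demo_enum64_set (demo_enum64_t *en, int64_t value)
{
  uint64_t u = (uint64_t) value;   /* Well-defined, even for negatives.  */

  assert (en != NULL);
  en->val_lo32 = (uint32_t) (u & 0xffffffffu);
  en->val_hi32 = (uint32_t) (u >> 32);
}

static int64_t
demo_enum64_get (const demo_enum64_t *en)
{
  uint64_t u = ((uint64_t) en->val_hi32 << 32) | en->val_lo32;

  /* Converting values above INT64_MAX is implementation-defined in C;
     GCC and Clang wrap modulo 2^64, giving the original negative value.  */
  return (int64_t) u;
}

/* Vlen bytes for N enumerators: a multiple of the 64-bit entry size, not
   of the smaller 32-bit enumerator entry.  */
static size_t
demo_enum64_vbytes (size_t n)
{
  return n * sizeof (demo_enum64_t);
}
```

Whether the joined value should be read back as signed or unsigned depends on the enum's signedness, which this sketch does not model.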
2025-03-21 | libctf: more bugfixes: trivial ctf_open() and ctf_dump() now works. | Nick Alcock | 1 | -2/+0
2025-03-20 | libctf: more compilation error fixes | Nick Alcock | 1 | -2/+2
Compiles now. Still doesn't work yet...
2025-03-20 | libctf: lots and lots of compilation error fixes | Nick Alcock | 1 | -5/+9
More to come.
2025-03-20 | libctf: add proper CTF_K_FUNCTION support in ctf_add_type | Nick Alcock | 1 | -10/+59
No CTF_K_FUNC_LINKAGE yet, but at least we emit the function args and arg names.
2025-03-20 | libctf: compilation error fixes | Nick Alcock | 1 | -97/+110
2025-03-20 | libctf: compilation error fixes | Nick Alcock | 1 | -6/+6
2025-03-20 | libctf: create: allow addition to datasecs in any order | Nick Alcock | 1 | -4/+37
ctf_add_section_variable() only permitted addition to datasecs in ascending order by offset, throwing -ECTF_DESCENDING otherwise. This is annoying to code to, and it turns out the deduplicator can emit variables into datasecs in essentially arbitrary order (and changing this would be very disruptive). So, instead, allow insertion in arbitrary order, maintain a flag on the DTD that indicates if the datasec has become unsorted (trivial to maintain), and sort it before operations that need it (serialization and query-by-offset: the deduplicator doesn't care, and ctf_datasec_var_{iter,next} already make no promises about ordering). This sort of thing is made much simpler by the weird inverse relationship between datasecs and vars: because nothing can point to the members of a datasec, we can reshuffle them without affecting anything else.
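The lazy-sorting scheme described above might look roughly like this. The entry layout loosely follows BTF's `btf_var_secinfo` (type, offset, size); the fixed-size array and all `demo_*` names are simplifying assumptions, not libctf's actual DTD representation. Insertion only sets a flag, and the sort happens on demand before an offset lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical datasec entry, loosely modelled on btf_var_secinfo.  */
typedef struct demo_secinfo
{
  uint32_t type;
  uint32_t offset;
  uint32_t size;
} demo_secinfo_t;

typedef struct demo_datasec
{
  demo_secinfo_t entries[64];
  size_t n;
  int unsorted;          /* Set once an insertion breaks the ordering.  */
} demo_datasec_t;

/* Insert in any order.  Detecting disorder is trivial: an append whose
   offset is below the last one.  */
static int
demo_datasec_add (demo_datasec_t *sec, uint32_t type, uint32_t offset,
                  uint32_t size)
{
  assert (sec != NULL);
  if (sec->n >= sizeof (sec->entries) / sizeof (sec->entries[0]))
    return -1;
  if (sec->n > 0 && offset < sec->entries[sec->n - 1].offset)
    sec->unsorted = 1;
  sec->entries[sec->n].type = type;
  sec->entries[sec->n].offset = offset;
  sec->entries[sec->n].size = size;
  sec->n++;
  return 0;
}

static int
demo_secinfo_cmp (const void *a, const void *b)
{
  const demo_secinfo_t *x = a, *y = b;
  return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sort only when an operation needs ordering (serialization, lookup by
   offset).  Nothing points at datasec members, so reshuffling is safe.  */
static void
demo_datasec_sort (demo_datasec_t *sec)
{
  if (!sec->unsorted)
    return;
  qsort (sec->entries, sec->n, sizeof (sec->entries[0]), demo_secinfo_cmp);
  sec->unsorted = 0;
}

static const demo_secinfo_t *
demo_datasec_find (demo_datasec_t *sec, uint32_t offset)
{
  demo_secinfo_t key = { 0, offset, 0 };

  demo_datasec_sort (sec);
  return bsearch (&key, sec->entries, sec->n, sizeof (sec->entries[0]),
                  demo_secinfo_cmp);
}
```

Iteration that makes no ordering promises can walk `entries` directly without paying for a sort.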
2025-03-20 | libctf: create: ctf_add_section_variable consistency | Nick Alcock | 1 | -3/+4
Give it a non-root flag like every other adding function (other than ctf_add_variable() itself). It's not very useful, but it's not *useless* either, particularly in the deduplicator, and consistency is valuable in its own right. This is doubly true given that variables are now in the C identifier namespace (as they always should have been) so can fail to be inserted more than they used to be. (The datasec is always created root-visible, insofar as that means anything for a non-C type.)
2025-03-20 | libctf: make ctf_add_datasec private | Nick Alcock | 1 | -21/+23
Datasecs now automatically spring into existence when you ctf_add_section_variable() to them.
2025-03-20 | libctf: fix ctf_type_linkage()'s API | Nick Alcock | 1 | -0/+1
It now matches ctf_add_function_linkage() et al., returning the linkage rather than passing back a ctf_linkage_t. Also fix ctf_add_function_linkage() so it actually sets the linkage.
2025-03-20 | libctf: a few more missed bits | Nick Alcock | 1 | -7/+7
Publicize the type and decl tag functions (which were defined but never exported anywhere); add enum64 support to more missed places, like ctf_type_encoding; and fix some more places where API changes were missed.
2025-03-20 | libctf: delete ctf_vars | Nick Alcock | 1 | -8/+1
Variables are in the same namespace as all other C identifiers: they shouldn't get their own special one. (That they used to was purely an accident of implementation.)
2025-03-20 | libctf: create: distinguish between vbytes in use and vbytes added just in case | Nick Alcock | 1 | -22/+23
Before now, ctf_add_generic took a vbytes argument which gave both the size of the vlen *and* the size of any extra slack added to things like structs to keep the first few member additions from causing a flurry of realloc()s. If we split this in two, we can make dtd_vlen_size the *actual size* of the vlen, so it is no longer necessary to work on a type-kind by type-kind basis to figure out how big types are. We couldn't do this before, because dtd_vlen_size was the only thing that recorded the amount of space allocated for the DTD at all, but now we have dtd_buf_size for that, and can make dtd_vlen_size a pure "actual size of vlen" field.
2025-03-20 | libctf: remove is-BTF dead-reckoning | Nick Alcock | 1 | -19/+2
Before now we've been trying to remember whether a CTF dict is representable as BTF via tracking when changes are made that make it non-BTF. This is a lot of work for nothing: the existence of write-time type suppression means that even if you add non-BTF type kinds, you might well decide to suppress them at writeout time, making the resulting dict pure BTF after all. Rip this whole thing out. (Also use some of the new macros we just added.)
2025-03-20 | libctf: header offset changes and associated bugfixes | Nick Alcock | 1 | -1/+3
Making the CTFv4 header offsets relative to the end of the CTF header has proven a recipe for disaster, because every *reference* to any header offsets now needs thought about what it's relative to. Make the whole lot relative to the end of the BTF header. ctf_buf of a CTFv4 buffer in memory now starts with an unused region which the CTFv4-specific header portion sits in, just so that the offset computations come out right. Also fix a bunch of places where field names hadn't been adjusted yet, and arrange to track whether a dict was originally opened as BTF, so we can set cth_parent_ntypes right on such a dict (in memory) when a dict is ctf_imported into it, and not have to worry elsewhere that this value might be unexpectedly zero.
2025-03-20 | libctf: vlen type-correctness, btf.h co-inclusion | Nick Alcock | 1 | -3/+14
Fairly simple things, but a few API changes as well to get things that are actually bounded by size_t in C to be sized by something *like* size_t in CTF (and, internally, to make sure that we store vlens in size_t's too, not uint32_t).
2025-03-20 | libctf: function argument names | Nick Alcock | 1 | -6/+13
Knew there was some BTF change I'd forgotten. Printing of function types with arg names will look a bit weird because right now we're just sticking the arg name after the type. Doing this right involves putting it in the right place in the declaration, which is a good bit more work...
2025-03-20 | libctf: last bit of ctf-types.c | Nick Alcock | 1 | -40/+48
ctf_tag, ctf_tag_next; dropping type/decl tag lookup from ctf_lookup_by_name (we can't get it for free because the name tables are unusually structured, and with multiple IDs mapping to a given tag it's not clear what we could return in any case); supporting void * (almost no changes, just a tweak to ctf_type_compat() to note that void * is assignment-compatible with itself); plus a tiny tweak in the deduplicator's error handling spotted while checking uses of ctf_dynhash_insert. (Will all be squashed together and split up in different directions in future anyway.)
2025-03-20 | libctf: ctf-type / ctf-create: BTF / CTFv4 wip | Nick Alcock | 1 | -436/+942
This huge change transforms ctf-type.c and ctf-create.c to handle BTF, including datasec and tag support. I don't think I missed anything, but I haven't audited for things I missed yet... I'm still working on ctf_tag(), but that's the last piece. This is a much bigger change than expected because of type prefixes. We need type prefixes even before CTFv4 because we want to be able to pass pahole information on conflicted types, and in CTFv4 these are implemented with a CTF_K_CONFLICTING prefix type. As long as we're doing that, let's implement CTF_K_BIG as well... and with that there, suddenly we have to worry about having both at once on one type, and the existing DTD representation with one ctf_type_t starts to look seriously inadequate. So we adjust the DTD so that the ctf_type_t and vlen are contained *within* a buffer which is identical to the on-disk representation, except only that it might be unconditionally CTF_K_BIG while in memory or something like that. We can then trivially search this buffer for type headers using a simple increment (since all the type headers are at the start, followed by the vlen region), which makes all the rest much easier: e.g. most of the repetitive code in ctf-types.c's type handling and in type creation can get moved into common code, and there is finally almost no distinction at all between static and dynamic types. This also makes it easy to make types dynamic on the fly later, which will be crucial if we ever want to add variables to pre-existing dicts (since that means adding to a pre-existing datasec as well). A bunch of APIs, particularly around iter_f functions, have changed: everything that takes a type now takes a dict as well, because the lack of such was incredibly annoying. Uses in libctf (particularly in the dumper and in ctf_add_type) have been adjusted, but not yet simplified as they could be now they have a dict more easily available. Absolutely does not compile, but should show where we're going. 
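As a rough sketch of why keeping prefixes and the main header contiguous helps: with all headers at the start of one buffer, finding the real type header, or a particular prefix such as CTF_K_BIG or CTF_K_CONFLICTING, is a plain pointer increment. The kind numbers and three-field header below are hypothetical; real CTF headers pack the kind into an info word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kind numbers: not CTFv4's real values.  */
enum { DEMO_K_INT = 1, DEMO_K_STRUCT = 2,
       DEMO_K_BIG = 30, DEMO_K_CONFLICTING = 31 };

typedef struct demo_type
{
  uint32_t name;
  uint32_t kind;
  uint32_t size_or_type;
} demo_type_t;

static int
demo_is_prefix (uint32_t kind)
{
  return kind == DEMO_K_BIG || kind == DEMO_K_CONFLICTING;
}

/* BUF holds zero or more prefix headers, then one non-prefix header, then
   the vlen.  Return the non-prefix header, or NULL if all NHDRS are
   prefixes (which should never happen for a well-formed type).  */
static const demo_type_t *
demo_main_header (const demo_type_t *buf, size_t nhdrs)
{
  const demo_type_t *tp;

  assert (buf != NULL);
  for (tp = buf; tp < buf + nhdrs; tp++)    /* Step one header at a time.  */
    if (!demo_is_prefix (tp->kind))
      return tp;
  return NULL;
}

/* Find prefix KIND on this type, stopping at the first non-prefix.  */
static const demo_type_t *
demo_find_prefix (const demo_type_t *buf, size_t nhdrs, uint32_t kind)
{
  const demo_type_t *tp;

  for (tp = buf; tp < buf + nhdrs && demo_is_prefix (tp->kind); tp++)
    if (tp->kind == kind)
      return tp;
  return NULL;
}
```

Because the buffer matches the on-disk layout, the same scan could serve both static and dynamic types, which is the unification the commit is after.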
Thanks to Bruce McCulloch <bruce.mcculloch@oracle.com> for a whole heap of creation and querying functions.
2025-03-20 | libctf: fix ctf_set_array type/index confusion | Nick Alcock | 1 | -2/+5
This could cause spurious ECTF_RDONLY if you called ctf_set_array on a type in a child dict whose parent had more static types than the child.
libctf/
	* ctf-create.c (ctf_set_array): Fix type/index confusion.
2025-03-20 | libctf: wip: header changes, file open changes for CTFv4 | Nick Alcock | 1 | -0/+1
Will not even compile (ctf-open-compat.c not tied in or adjusted, opening requires type lookup to be converted before it works, etc etc). But this is the basis of it. (Longer commit log comment to come.)
2025-03-16 | types: add some more error checking | Nick Alcock | 1 | -1/+6
A few places with inadequate error checking have fallen out of the ctf_id_t work:
- ctf_add_slice doesn't make sure that the type it is slicing actually exists
- ctf_add_member_offset doesn't check that the type of the member exists (though it will often fail if it doesn't, it doesn't explicitly check, so if you're unlucky it can sometimes succeed, giving you a corrupted dict)
- ctf_type_encoding doesn't check whether its sliced type exists: it should verify it so it can return a decent error, rather than a thoroughly misleading one
- ctf_type_compat has the same problem with respect to both of its arguments. It would definitely be nicer if we could call ctf_type_compat and just get a boolean answer, but it's not clear to me whether a type can be said to be compatible *or* incompatible with a nonexistent one, and we should probably alert the users to a likely bug regardless.
C error checking, sigh...
2025-03-16 | Tiny stylistic spacing and comment tweaks | Nick Alcock | 1 | -3/+2
2025-03-16 | libctf: consecutive ctf_id_t assignment | Nick Alcock | 1 | -11/+125
This change modifies type ID assignment in CTF so that it works like BTF: rather than flipping the high bit on for types in child dicts, types ascend directly from IDs in the parent to IDs in the child, without interruption (so type 0x4 in the parent is immediately followed by 0x5 in all children). Doing this while retaining useful semantics for modification of parents is challenging. By definition, child type IDs are not known until the parent is written out, but we don't want to find ourselves constrained to adding types to the parent in one go, followed by all child types: that would make the deduplicator a nightmare and would frankly make the entire ctf_add*() interface next to useless: all existing clients that add types at all add types to both parents and children without regard for ordering, and breaking that would probably necessitate redesigning all of them. So we have to be a little cleverer. We approach this the same way as we approach strings in the recent refs rework: if a parent has children attached (or has ever had them attached since it was created or last read in), any new types created in the parent are assigned provisional IDs starting at the very top of the type space and working down. (Their indexes in the internal libctf arrays remain unchanged, so we don't suddenly need multigigabyte indexes!) At writeout (preserialization) time, we traverse the type table (and all other tables containing type IDs) and assign refs to every type ID in exactly the same way we assign refs to every string offset (just a different set of refs: we don't want to update type IDs with string offset values!). For a parent dict with children, these refs are real entities in memory: pointers to the memory locations where type IDs are stored, tracked in the DTD of each type. As we traverse the type table, we assign real IDs to each type (by simple incrementation), storing those IDs in a new dtd_final_type field in the DTD for each type.
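A hedged sketch of the two ID regimes: while children are attached, new parent types get provisional IDs counting down from the top of the ID space, and preserialization later hands out final IDs by simple incrementation, in index order. `DEMO_ID_MAX` and the other `demo_*` names are hypothetical; libctf records the final ID per DTD in `dtd_final_type`, modelled here as a small array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical top of the type ID space.  */
#define DEMO_ID_MAX 0x7fffffffu

typedef struct demo_dict
{
  uint32_t nstatic;          /* Types already written out: IDs 1..nstatic.  */
  uint32_t ndynamic;         /* Types added since, in index order.  */
  int has_children;
  uint32_t final_ids[16];    /* Stand-in for each DTD's dtd_final_type.  */
} demo_dict_t;

/* ID of the dynamic type at 0-based index IDX.  The index itself stays
   small either way; only the ID jumps to the top of the space.  */
static uint32_t
demo_dynamic_id (const demo_dict_t *fp, uint32_t idx)
{
  if (fp->has_children)
    return DEMO_ID_MAX - idx;          /* Provisional: counts down.  */
  return fp->nstatic + 1 + idx;        /* Final: counts up.  */
}

/* At preserialization, hand out final IDs by simple incrementation.  */
static void
demo_assign_final_ids (demo_dict_t *fp)
{
  uint32_t next = fp->nstatic + 1;

  assert (fp->ndynamic <= sizeof (fp->final_ids) / sizeof (fp->final_ids[0]));
  for (uint32_t i = 0; i < fp->ndynamic; i++)
    fp->final_ids[i] = next++;
}
```

Because provisional IDs start far above any real ID, they cannot collide with the child IDs that immediately follow the parent's serialized types.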
Once the type table and all other tables containing type IDs are fully traversed, we update all the refs and overwrite the IDs currently residing in each with the final IDs for each type. That fixes up IDs in the parent dict itself (including forward references in structs and the like: that's why the ref updates only happen at the end); but what about child dicts' references, both to parent types and to their own? We add armouring to enforce that parent dicts are always serialized before their children (which ctf-link.c already does, because it's a precondition for strtab deduplication), and then arrange that when a ref is added to a type whose ID has been assigned (has a dtd_final_type), we just immediately do an update rather than storing a ref for later updating. Since the parent is already serialized, all parent type IDs have a dtd_final_type by this point, and all parent IDs in the children are properly updated. The child types can now be renumbered, now that we know the number of types in the parent, and their refs updated identically to what was just done with the parent. One wrinkle: before the child refs are updated, while we are working over the child's type section, the type IDs in the child start from 1 (or something like that), which might seem to overlap the parent IDs. But this is not the case: when you serialize the parent, the IDs written out to disk are changed, but the only change to the representation in memory is that we remember a dtd_final_type for each type (and use it to update all the child type refs): its ID in memory is the same as it always was, a nonoverlapping provisional ID higher than any other valid ID. We enforce all of this by asserting that when you add a ref to a type, the memory location that is modified must be in the buffer being serialized: the code will not let you accidentally modify the actual DTDs in memory.
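The ref handling in this paragraph, sketched with hypothetical names: a ref is a pointer into the buffer being serialized, patched once the final ID is known, or patched immediately if the type already has one (as parent types do by the time a child is serialized). The range assertion mirrors the rule that refs may only point into the output buffer; the fixed-size ref array is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type record: the final ID once assigned (0 means not
   yet), plus pending refs into the output buffer.  */
typedef struct demo_dtd
{
  uint32_t final_id;
  uint32_t *refs[8];
  size_t nrefs;
} demo_dtd_t;

/* Record that *LOC holds this type's ID.  LOC must lie inside the buffer
   being serialized, [BUF, BUF + BUFLEN), never in a live DTD.  */
static int
demo_add_ref (demo_dtd_t *dtd, uint32_t *loc, const uint32_t *buf,
              size_t buflen)
{
  uintptr_t p = (uintptr_t) loc, b = (uintptr_t) buf;

  assert (p >= b && p < b + buflen * sizeof (uint32_t));

  if (dtd->final_id != 0)
    {
      *loc = dtd->final_id;        /* Already serialized: patch now.  */
      return 0;
    }
  if (dtd->nrefs >= sizeof (dtd->refs) / sizeof (dtd->refs[0]))
    return -1;
  dtd->refs[dtd->nrefs++] = loc;   /* Patch once the final ID is known.  */
  return 0;
}

/* After every table has been traversed, patch all pending refs.  */
static void
demo_update_refs (demo_dtd_t *dtd, uint32_t final_id)
{
  dtd->final_id = final_id;
  for (size_t i = 0; i < dtd->nrefs; i++)
    *dtd->refs[i] = final_id;
  dtd->nrefs = 0;
}
```

Deferring the parent's own updates to the end is what lets forward references (a struct member naming a type not yet visited) come out right.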
We track the number of types in the parent in a new CTFv4 (not BTF) header field (the dumper is updated): we will also use this to open CTFv3 child dicts without change by simply declaring for them that the parent dict has 2^31 types in it (or 2^15, for v2 and below): the IDs in the children then naturally come out right with no other changes needed. (Right now, opening CTFv3 child dicts requires extra compatibility code that has not been written, but that code will no longer need to worry about type ID differences.) Various things are newly forbidden:
- You cannot ctf_import() a child into a parent if you already ctf_add()ed types to the child, because all its IDs would change (and since you already cannot ctf_add() types to a child that hasn't had its parent imported, this in practice means only that ctf_create() must be followed immediately by a ctf_import() if this is a new child, which all sane clients were doing anyway).
- You cannot import a child into a parent which has the wrong number of (non-provisional) types, again because all its IDs would be wrong. Because parents only add types in the provisional space if children are attached to them, this would break the not unknown case of opening an archive, adding types to the parent, and only then importing children into it, so we add a special case: archive members which are not children in an archive with more than one member always pretend to have at least one child, so type additions in them are always provisional even before you ctf_import anything. In practice, this does exactly what we want, since all archives so far are created by the linker and have one parent and N children of that parent.
Because this introduces huge gaps between index and type ID for provisional types, some extra assertions are added to ensure that the internal ctf_type_to_index() is only ever called on types in the current dict (never a parent dict): before now, this was just taken on trust, and it was often wrong (which at best led to wrong results, as wrong array indexes were used, and at worst to a buffer overflow). When hash debugging is on (suggesting that the user doesn't mind expensive checks), every ctf_type_to_index() triggers a ctf_index_to_type() to make sure that the operations are proper inverses. Lots and lots of tests are added to verify that assignment works and that updating of every type kind works fine -- existing tests suffice for type IDs in the variable and symtypetab sections. The ld-ctf tests get a bunch of largely display-based updates: various tests refer to 0x8... type IDs, which no longer exist, and because the IDs are shorter all the spacing and alignment has changed.
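With consecutive IDs, the index/ID mapping for a dict's own types reduces to an offset by the parent's type count. This sketch ignores provisional IDs and uses hypothetical names; it includes the round-trip self-check described above, and shows how declaring 2^31 parent types reproduces the old high-bit child IDs for CTFv3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dict: just enough to map 1-based array indexes to
   consecutive type IDs.  Provisional IDs are not modelled.  */
typedef struct demo_dict
{
  uint32_t parent_ntypes;   /* 0 for a dict with no parent.  */
  uint32_t ntypes;          /* Types in this dict.  */
} demo_dict_t;

static uint32_t
demo_index_to_type (const demo_dict_t *fp, uint32_t idx)
{
  assert (idx >= 1 && idx <= fp->ntypes);
  return fp->parent_ntypes + idx;
}

static uint32_t
demo_type_to_index (const demo_dict_t *fp, uint32_t type)
{
  uint32_t idx;

  /* Only valid for types in this dict, never in its parent.  */
  assert (type > fp->parent_ntypes);
  idx = type - fp->parent_ntypes;
  assert (idx <= fp->ntypes);

  /* Expensive inverse check, as when hash debugging is on.  */
  assert (demo_index_to_type (fp, idx) == type);
  return idx;
}
```

Calling `demo_type_to_index` on a parent type trips the first assertion instead of silently producing an out-of-range array index.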
2025-02-28 | libctf: fix slices of slices and of enums | Nick Alcock | 1 | -1/+5
Slices had a bunch of horrible usability problems. In particular, while towers of cv-quals are resolved away by functions that need to do it, towers of cv-quals with slices in the middle are not resolved away by functions like ctf_enum_value that can see through slices: resolving volatile -> slice -> const -> enum will leave it with a 'const', which will error pointlessly, annoying callers, who reasonably expect slices to be more invisible than this. (The user-callable ctf_type_resolve still does not resolve away slices, because this is the only way users can see that the slices are there at all.) This is induced by a fix for another wart: ctf_add_enumerator does not resolve anything away at all, so you can't even add enumerators to const or volatile enums -- and more problematically, you can't add enumerators to enums with an explicit encoding without resolving away the types by hand, since ctf_add_enum_encoded works by returning a slice! ctf_add_enumerator now resolves away all of those, so any cvr-or-typedef-or-slice-qual terminating in an enum can be added to, exactly as callers likely expect. (New tests added.)
libctf/
	* ctf-create.c (ctf_add_enumerator): Resolve away cvr-qualness.
	* ctf-types.c (ctf_type_resolve_unsliced): Don't terminate at the first slice.
	* testsuite/libctf-writable/slice-of-slice.*: New test.
2025-02-28 | libctf: string: refs rework | Nick Alcock | 1 | -43/+4
This commit moves provisional (not-yet-serialized) string refs towards the scheme to be used for CTF IDs in the future. In particular:
- provisional string offsets now count downwards from just under the external string offset space (all bits on but the high bit). This makes it possible to detect an overflowing strtab, and also makes it trivial to determine whether any string offset (ref) updates were missed: where before we might get a slightly corrupted or incorrect string, we now get a huge high strtab offset corresponding to no string, and an error is emitted at read time.
- refs are emitted at serialization time during the pass through the types. They are strictly associated with the newly-written-out buffer: the existing opened CTF dict is not changed, though it does still get the new strtab so that new refs to the same string can just refer directly to it. The provisional strtab hash table that contains these strings is not deleted after serialization (because we might serialize again): instead, we keep track in the parent of the lowest-yet-used ("latest") provisional strtab offset, and any strtab offset above that, but not external (high-bit-on) is considered provisional. This is sort-of-enforced by moving most of the ref-addition function declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is not included by ctf-create.c or ctf-open.c.
- because we don't add refs when adding types, we don't need to handle the case where we add things to expanding vlens (enums, struct members) and have to realloc() them. So the entire painful movable refs system can just be deleted, along with the ability to remove refs piecemeal at all (purging all of them is still possible). Strings added during type addition are added via ctf_str_add(), which adds no refs: the strings are picked up at serialization time and refs to their final, serialized resting place added.
The DTDs never have any refs in them, and their provisional strtab offsets are never updated by the ref system. This caused several bugs to fall out of the earlier work and get fixed. In particular, attempts to look up a string in a child dict now search the parent's provisional strtab too: we add some extra special casing for the null string so we don't need to worry about deduplication moving it somewhere other than offset zero. Finally, the optimization that removes an unreferenced synthetic external strtab (the record of the strings the linker has told us about, kept around internally for lookup during late serialization) is faulty: references to a strtab entry will only produce CTF-level refs if their value might change, and an external string's offset won't change, so it produces no refs: worse yet, even if we did get a ref (say, if the string was originally believed to be internal and only later were we told that the linker knew about it too), when we serialize a strtab, all its refs are dropped (since they've been updated and can no longer change); so if we serialized it a second time, its synthetic external strtab would be considered empty and dropped, even though the same external strings as before still exist, referencing it. We must keep the synthetic external strtab around as long as external strings exist that reference it, i.e. for the life of the dict. One benefit of all this: now we're emitting provisional string offsets at a really high value, it's out of the way of the consecutive, deduplicated string offsets in child dicts. So we can drop the constraint that you cannot add strings to a dict with children, which allows us to add types freely to parent dicts again. What you can't do is write that dict out again: when we serialize, we currently update the dict being serialized with the updated strtabs: when you write a dict out, its provisional strings become real strings, and suddenly the offsets would overlap once more. 
But opening a dict and its children, adding to it, and then writing it out again is rare indeed, and we have a workaround: anyone wanting to do this can just use ctf_link instead.
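The provisional string offset scheme described above, sketched with hypothetical names. Handing out one offset per string (decrementing by one) is an assumption made for simplicity: the commit only says that provisional offsets count down from just below the external (high-bit-on) space, and that offsets between the latest provisional one and that space are provisional.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STR_EXTERNAL 0x80000000u   /* High bit on: external string.  */

typedef struct demo_strtab
{
  uint32_t real_len;      /* Offsets [0, real_len) are real strings; the
                             null string at 0 means real_len >= 1.  */
  uint32_t prov_latest;   /* Lowest provisional offset handed out so far;
                             starts at DEMO_STR_EXTERNAL.  */
} demo_strtab_t;

/* Next provisional offset, counting down from 0x7fffffff.  Returns 0 (the
   null string, never a valid provisional offset) if the provisional range
   would run into the real strtab: overflow is detectable.  */
static uint32_t
demo_str_add_provisional (demo_strtab_t *st)
{
  assert (st->real_len >= 1);
  if (st->prov_latest - 1 < st->real_len)
    return 0;
  return --st->prov_latest;
}

static int
demo_str_is_provisional (const demo_strtab_t *st, uint32_t off)
{
  return off >= st->prov_latest && off < DEMO_STR_EXTERNAL;
}

static int
demo_str_is_external (uint32_t off)
{
  return (off & DEMO_STR_EXTERNAL) != 0;
}
```

Any provisional offset that escapes into a written-out dict sits far beyond the real strtab, so a missed ref update shows up as an invalid offset instead of a plausible but wrong string.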
2025-02-28 | libctf: create: fix vlen / vbytes confusion | Nick Alcock | 1 | -19/+19
The initial_vlen parameter to ctf_add_generic is misnamed: it's not the initial vlen (the initial number of members of a struct, etc), but rather the initial size of the vlen region. We have a term for that, vbytes: use it. Amazingly this doesn't seem to have caused any bugs to creep in.
2025-02-28 | libctf: de-macroize LCTF_TYPE_TO_INDEX / LCTF_INDEX_TO_TYPE | Nick Alcock | 1 | -15/+13
Making these functions is unnecessary right now, but the reason for doing so will become much clearer shortly. While we're at it, we can drop the third child argument to LCTF_INDEX_TO_TYPE: it's only used for nontrivial purposes that aren't literally the same as getting the result from the fp in one place, in ctf_lookup_by_name_internal, and that place is easily fixed by just looking in the right dictionary in the first place.
2025-02-28 | libctf: make ctf_dynamic_type() the inverse of ctf_static_type() | Nick Alcock | 1 | -2/+2
They're meant to be inverses, which makes it unfortunate that they check different bounds. No visible effect yet, since ctf_typemax and ctf_stypes currently cover the entire type ID space, but will have an effect shortly.
2025-02-28 | libctf: drop LCTF_TYPE_ISPARENT/LCTF_TYPE_ISCHILD | Nick Alcock | 1 | -25/+26
Parent/child determination is about to become rather more complex, making a macro impractical. Use the ctf_type_isparent/ischild function calls everywhere and remove the macro. Make them more const-correct too, to make them more widely usable. While we're about it, change several places that hand-implemented ctf_get_dict() to call it instead, and armour several functions against the null returns that were always possible in this case (but previously unprotected-against).
2025-02-28 | libctf: generalize the ref system | Nick Alcock | 1 | -2/+2
Despite the removal of the separate movable ref list, the ref system as a whole is more than complex enough to be worth generalizing now that we are adding different kinds of ref. Refs now are lists of uint32_t * which can be updated through the pointer for all entries in the list and moved to new sites for all pointers in a given range: they are no longer references to string offsets in particular and can be references to other uint32_t-sized things instead (note that ctf_id_t is a typedef to a uint32_t). ctf-string.c has been adjusted accordingly (the adjustments are tiny, more or less just turning a bunch of references to atom into &atom->csa_refs).
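A sketch of the generalized ref list: each ref is just a `uint32_t *`, so the same machinery can patch a string offset or a type ID. It supports the two operations the commit names, updating every site through its pointer and moving all sites that fall in a given range. Names are hypothetical, and the fixed-capacity array stands in for whatever list structure libctf actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ref list: every entry points at a uint32_t-sized slot.  */
typedef struct demo_refs
{
  uint32_t *sites[16];
  size_t n;
} demo_refs_t;

static int
demo_ref_add (demo_refs_t *refs, uint32_t *site)
{
  assert (site != NULL);
  if (refs->n >= sizeof (refs->sites) / sizeof (refs->sites[0]))
    return -1;
  refs->sites[refs->n++] = site;
  return 0;
}

/* Write VALUE through every pointer in the list.  */
static void
demo_ref_update (demo_refs_t *refs, uint32_t value)
{
  for (size_t i = 0; i < refs->n; i++)
    *refs->sites[i] = value;
}

/* Slots in [OLD, OLD + LEN) have been copied to NEW_BASE: repoint the
   refs that fell in that range and leave all others alone.  */
static void
demo_ref_move (demo_refs_t *refs, const uint32_t *old, uint32_t *new_base,
               size_t len)
{
  uintptr_t lo = (uintptr_t) old;
  uintptr_t hi = lo + len * sizeof (uint32_t);

  for (size_t i = 0; i < refs->n; i++)
    {
      uintptr_t p = (uintptr_t) refs->sites[i];
      if (p >= lo && p < hi)
        refs->sites[i] = new_base + (p - lo) / sizeof (uint32_t);
    }
}
```

Comparing addresses as `uintptr_t` avoids relational comparison of pointers into unrelated objects, which C leaves undefined.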