The new API function ctf_link_output_is_btf lets you determine
whether the result of a ctf_link is likely to be written out as
CTF or BTF before the write takes place: the new is_btf argument
to ctf_link_write lets you find out whether it actually was (things
like compression that only ctf_link_write is told about may cause
last-minute changes in the decision).
This requires us to split preserialization in two, with the portion that
determines whether a serialized dict is BTF-compatible or not moving into a
new internal function ctf_serialize_output_format, called by
ctf_link_output_is_btf. We move the state used to communicate between
serialization passes into a new sub-struct at the same time.
|
|
|
|
Skip over leading prefixes and add a new one before the dtd_data: boost the
dtd_data. Assert that there must be at least one non-prefix header too.
|
|
Every single time we call ctf_grow_vlen or anything that calls it, we have
to look up pointers into the vlen and/or prefix again, since they're in the
region that ctf_grow_vlen reallocs. Fix a missed case.
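A minimal sketch of the hazard, not the libctf code: the struct and every toy_* name below are hypothetical stand-ins. The point it illustrates is that any pointer into the buffer has to be derived again after a call that may realloc it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a DTD: type headers and vlen share one buffer.  */
struct toy_dtd
{
  unsigned char *buf;   /* Whole allocation; may move on growth.  */
  size_t hdr_size;      /* Bytes of type headers at the start of BUF.  */
  size_t buf_size;      /* Bytes currently allocated.  */
};

/* Make sure the vlen region can hold VBYTES bytes.  May move DTD->buf.
   Returns 0 on success, -1 on allocation failure.  */
int
toy_grow_vlen (struct toy_dtd *dtd, size_t vbytes)
{
  size_t need = dtd->hdr_size + vbytes;
  unsigned char *p;

  assert (dtd->hdr_size <= dtd->buf_size);
  if (need <= dtd->buf_size)
    return 0;
  if ((p = realloc (dtd->buf, need)) == NULL)
    return -1;
  memset (p + dtd->buf_size, 0, need - dtd->buf_size);
  dtd->buf = p;
  dtd->buf_size = need;
  return 0;
}

/* Always derive the vlen pointer from the current buffer.  */
unsigned char *
toy_vlen (struct toy_dtd *dtd)
{
  return dtd->buf + dtd->hdr_size;
}

/* Store VAL at byte offset OFF in the vlen, growing first.  */
int
toy_append_u32 (struct toy_dtd *dtd, size_t off, uint32_t val)
{
  if (toy_grow_vlen (dtd, off + sizeof (val)) < 0)
    return -1;
  /* Look the pointer up only now: one fetched earlier may be stale.  */
  memcpy (toy_vlen (dtd) + off, &val, sizeof (val));
  return 0;
}
```

A caller that cached toy_vlen()'s result before toy_append_u32() would be holding a pointer into freed memory whenever the realloc moved the buffer.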
|
|
C's pointer arithmetic not always operating in terms of bytes gets me every
single time.
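For the record, a tiny illustration of the trap (helper names are made up for this sketch): adding n to a typed pointer moves it by n elements, so byte offsets need a cast to a byte-sized type first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advance P by NBYTES bytes, whatever P points at.  */
void *
advance_bytes (void *p, size_t nbytes)
{
  return (unsigned char *) p + nbytes;
}

/* Return how many bytes "p + n" moves a uint32_t pointer.  */
size_t
u32_step_in_bytes (size_t n)
{
  uint32_t arr[16] = { 0 };
  uint32_t *p = arr;

  assert (n <= 16);
  return (size_t) ((unsigned char *) (p + n) - (unsigned char *) p);
}
```

So u32_step_in_bytes (3) is 3 * sizeof (uint32_t), not 3.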
|
|
Datasecs are only used for non-extern variables (those in ELF sections).
Variables added by ctf_add_variable() should not be in any datasec at all.
(This means we have to drop a minor optimization where variables not
mentioned in ctf_var_datasecs were assumed to be in the most-numerous one,
since they might be in none at all; not a great loss.)
|
|
Even initial struct members and union members can be bitfields: the bit
width shouldn't be simply ignored for them.
|
|
We want to return the ID of the variable, not of its section!
|
|
We failed to fix the vbytes for func linkage in this one place,
so they were written out with the wrong vbytes value, wrecking
readin of all future types.
|
|
Don't grow it by a multiple of ctf_enum_t, but rather ctf_enum64_t.
|
|
The dtd_vlen_size is the vbytes, not the vlen...
|
|
Decl tags with nonzero component_idxes don't have a zero-sized vlen!
|
|
Not doing this breaks the deduplicator (it sees a zero ID, decides the new
variable doesn't need to be inserted into the emissions hash, then later
decl tags that refer to that variable cannot find it anywhere even though
it must have been emitted.)
|
|
This is consistent with compiler output, which makes writing tests
with results that can be reused for .o and linked output easier.
Use the snapshot facility to remove both variable and datasec DTDs
on error, if need be.
|
|
|
|
This has its linkage not in the vlen region, but *stuffed into the
actual vlen in the info word* (why this is inconsistent with the
way CTF_K_VAR works, I have no idea).
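A sketch of what "stuffed into the vlen" means, assuming the info word uses the BTF layout (vlen in bits 0-15, kind in bits 24-28) and BTF's numeric value for the function kind. The TOY_* macros are defined here for illustration and are not libctf's.

```c
#include <stdint.h>

/* Assumed BTF-style info word layout.  */
#define TOY_INFO_KIND(info) (((info) >> 24) & 0x1f)
#define TOY_INFO_VLEN(info) ((info) & 0xffff)
#define TOY_KIND_FUNC 12u       /* Same value as BTF_KIND_FUNC.  */

/* Build an info word for a function: the linkage sits where a
   member count would normally go.  */
uint32_t
toy_func_info (uint32_t linkage)
{
  return (TOY_KIND_FUNC << 24) | (linkage & 0xffff);
}

/* Recover the linkage: read the vlen field, not the vlen region.  */
uint32_t
toy_func_linkage (uint32_t info)
{
  return TOY_INFO_VLEN (info);
}
```

The consequence for size computations is that a function's "vlen" must not be multiplied by anything to get a vlen region size: there is no such region for the linkage.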
|
|
ctf_grow_vlen ensures the vlen region is big enough for some operation:
it does not boost the dtd_vlen_size, because the caller can (and
sometimes does) choose to write less than that in there.
Make sure the caller bumps the dtd_vlen_size appropriately, since
serialization now trusts this value to be correct rather than recomputing
all vlens from recorded-in-type-table scratch.
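The division of labour can be sketched as a capacity/used split. This is an illustration with hypothetical names, not the real DTD: reserving space never changes the recorded size, and the writer bumps it by exactly what it wrote.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vlen buffer: CAPACITY is what is allocated, USED is what
   has actually been written (the role dtd_vlen_size plays above).  */
struct toy_vlen_buf
{
  unsigned char *data;
  size_t capacity;
  size_t used;
};

/* Ensure room for WANT bytes.  Deliberately leaves USED alone.  */
int
toy_reserve (struct toy_vlen_buf *b, size_t want)
{
  unsigned char *p;

  if (want <= b->capacity)
    return 0;
  if ((p = realloc (b->data, want)) == NULL)
    return -1;
  b->data = p;
  b->capacity = want;
  return 0;
}

/* Write N bytes and record them: the caller-side bump.  */
int
toy_append (struct toy_vlen_buf *b, const void *src, size_t n)
{
  if (toy_reserve (b, b->used + n) < 0)
    return -1;
  memcpy (b->data + b->used, src, n);
  b->used += n;
  return 0;
}
```

A serializer that trusts `used` writes out only what was stored, however much slack was reserved.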
|
|
|
|
It's got two halves of a value, not one single value.
Unsigned-versus-signed types not yet properly tested.
|
|
|
|
Compiles now. Still doesn't work yet...
|
|
More to come.
|
|
No CTF_K_FUNC_LINKAGE yet, but at least we emit the function args
and arg names.
|
|
|
|
|
|
ctf_add_section_variable() only permitted addition to datasecs in ascending
order by offset, throwing -ECTF_DESCENDING otherwise. This is annoying to
code to, and it turns out the deduplicator can emit variables into datasecs
in essentially arbitrary order (and changing this would be very disruptive).
So, instead, allow insertion in arbitrary order, maintain a flag on the DTD
that indicates if the datasec has become unsorted (trivial to maintain), and
sort it before operations that need it (serialization and query-by-offset:
the deduplicator doesn't care, and ctf_datasec_var_{iter,next} already make
no promises about ordering).
This sort of thing is made much simpler by the weird inverse relationship
between datasecs and vars: because nothing can point to the members of a
datasec, we can reshuffle them without affecting anything else.
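The flag-and-sort-lazily approach looks roughly like this. It is a sketch under assumptions: the struct layout and all toy_* names are invented for the example and do not mirror the DTD.

```c
#include <stdint.h>
#include <stdlib.h>

struct toy_secvar
{
  uint32_t type;
  uint32_t offset;
};

struct toy_datasec
{
  struct toy_secvar *vars;
  size_t nvars;
  size_t cap;
  int unsorted;         /* Set when an insertion breaks ascending order.  */
};

static int
cmp_offset (const void *a, const void *b)
{
  const struct toy_secvar *x = a, *y = b;
  return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Append in any order; only note whether ordering was broken.  */
int
toy_datasec_add (struct toy_datasec *ds, uint32_t type, uint32_t offset)
{
  if (ds->nvars == ds->cap)
    {
      size_t ncap = ds->cap ? ds->cap * 2 : 4;
      struct toy_secvar *p = realloc (ds->vars, ncap * sizeof (*p));
      if (p == NULL)
        return -1;
      ds->vars = p;
      ds->cap = ncap;
    }
  if (ds->nvars > 0 && offset < ds->vars[ds->nvars - 1].offset)
    ds->unsorted = 1;
  ds->vars[ds->nvars].type = type;
  ds->vars[ds->nvars].offset = offset;
  ds->nvars++;
  return 0;
}

/* Sort only if needed; called before serialization and offset queries.  */
void
toy_datasec_sort (struct toy_datasec *ds)
{
  if (!ds->unsorted)
    return;
  qsort (ds->vars, ds->nvars, sizeof (*ds->vars), cmp_offset);
  ds->unsorted = 0;
}

/* Query by offset: needs sorted order, so sort first.  */
const struct toy_secvar *
toy_datasec_lookup (struct toy_datasec *ds, uint32_t offset)
{
  struct toy_secvar key = { 0, offset };

  if (ds->nvars == 0)
    return NULL;
  toy_datasec_sort (ds);
  return bsearch (&key, ds->vars, ds->nvars, sizeof (*ds->vars), cmp_offset);
}
```

Iteration does not call toy_datasec_sort() at all, matching the "no promises about ordering" contract.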
|
|
Give it a non-root flag like every other adding function (other than
ctf_add_variable() itself). It's not very useful, but it's not *useless*
either, particularly in the deduplicator, and consistency is valuable in its
own right. This is doubly true given that variables are now in the C
identifier namespace (as they always should have been) so can fail to be
inserted more than they used to be. (The datasec is always created
root-visible, insofar as that means anything for a non-C type.)
|
|
Datasecs now automatically spring into existence when you
ctf_add_section_variable() to them.
|
|
It now matches ctf_add_function_linkage() et al -- returning the
linkage, not passing back a ctf_linkage_t.
Also fix ctf_add_function_linkage() so it actually sets the linkage.
|
|
Publicize the type and decl tag functions (which were defined but never
exported anywhere); add enum64 support to more missed places, like
ctf_type_encoding: fix some more places where API changes were missed.
|
|
Variables are in the same namespace as all other C identifiers: they
shouldn't get their own special one. (That they used to was purely an
accident of implementation.)
|
|
Before now, ctf_add_generic took a vbytes argument which both gave the size
of the vlen *and* the size of any extra slack added to things like structs
to keep the first few member additions from causing a flurry of realloc()s.
If we split this in two, we can make dtd_vlen_size into the *actual size*
of the vlen, making it no longer necessary to work on a type-kind by type-
kind basis to figure out how big types are. We couldn't do this before,
because dtd_vlen_size was the only thing that recorded the amount of space
allocated for the DTD at all, but now we have dtd_buf_size for that, and
can make dtd_vlen_size a pure "actual size of vlen" field.
|
|
Before now we've been trying to remember whether a CTF dict is representable
as BTF via tracking when changes are made that make it non-BTF. This is a
lot of work for nothing: the existence of write-time type suppression means
that even if you add non-BTF type kinds, you might well decide to suppress
them at writeout time, making the resulting dict pure BTF after all.
Rip this whole thing out.
(Also use some of the new macros we just added.)
|
|
Making the CTFv4 header offsets relative to the end of the CTF
header has proven a recipe for disaster, because every *reference*
to any header offsets now needs thought about what it's relative to.
Make the whole lot relative to the end of the BTF header. ctf_buf
of a CTFv4 buffer in memory now starts with an unused region
which the CTFv4-specific header portion sits in, just so that the
offset computations come out right.
Also fix a bunch of places where field names hadn't been adjusted
yet, and arrange to track whether a dict was originally opened as
BTF, so we can set cth_parent_ntypes right on such a dict (in
memory) when a dict is ctf_imported into it, and not have to
worry elsewhere that this value might be unexpectedly zero.
|
|
Fairly simple things, but a few API changes as well to get things that are
actually bounded by size_t in C to be sized by something *like* size_t in
CTF (and, internally, to make sure that we store vlens in size_t's too,
not uint32_t).
|
|
Knew there was some BTF change I'd forgotten.
Printing of function types with arg names will look a bit weird because
right now we're just sticking the arg name after the type. Doing this right
involves putting it in the right place in the declaration, which is a good
bit more work...
|
|
ctf_tag, ctf_tag_next; dropping type/decl tag lookup from ctf_lookup_by_name
(we can't get it for free because the name tables are unusually structured,
and with multiple IDs mapping to a given tag it's not clear what we could
return in any case); supporting void * (almost no changes, just a tweak to
ctf_type_compat() to note that void * is assignment-compatible with itself);
plus a tiny tweak in the deduplicator's error handling spotted while
checking uses of ctf_dynhash_insert.
(Will all be squashed together and split up in different directions in
future anyway.)
|
|
This huge change transforms ctf-type.c and ctf-create.c to handle BTF,
including datasec and tag support. I don't think I missed anything, but
I haven't audited for things I missed yet... I'm still working on
ctf_tag(), but that's the last piece.
This is a much bigger change than expected because of type prefixes. We
need type prefixes even before CTFv4 because we want to be able to pass
pahole information on conflicted types, and in CTFv4 these are implemented
with a CTF_K_CONFLICTING prefix type. As long as we're doing that, let's
implement CTF_K_BIG as well... and with that there, suddenly we have to
worry about having both at once on one type, and the existing DTD
representation with one ctf_type_t starts to look seriously inadequate.
So we adjust the DTD so that the ctf_type_t and vlen are contained *within*
a buffer which is identical to the on-disk representation, except only that
it might be unconditionally CTF_K_BIG while in memory or something like
that. We can then trivially search this buffer for type headers using
a simple increment (since all the type headers are at the start, followed by
the vlen region), which makes all the rest much easier: e.g. most of the
repetitive code in ctf-types.c's type handling and in type creation can
get moved into common code, and there is finally almost no distinction at
all between static and dynamic types. This also makes it easy to make
types dynamic on the fly later, which will be crucial if we ever want
to add variables to pre-existing dicts (since that means adding to a
pre-existing datasec as well).
A bunch of APIs, particularly around iter_f functions, have changed:
everything that takes a type now takes a dict as well, because the lack of
such was incredibly annoying. Uses in libctf (particularly in the dumper and
in ctf_add_type) have been adjusted, but not yet simplified as they could be
now they have a dict more easily available.
Absolutely does not compile, but should show where we're going.
Thanks to Bruce McCulloch <bruce.mcculloch@oracle.com> for a whole
heap of creation and querying functions.
|
|
This could cause spurious ECTF_RDONLY if you called ctf_set_array on
a type in a child dict whose parent had more static types than the
child.
libctf/
* ctf-create.c (ctf_set_array): Fix type/index confusion.
|
|
Will not even compile (ctf-open-compat.c not tied in or adjusted,
opening requires type lookup to be converted before it works,
etc etc).
But this is the basis of it.
(Longer commit log comment to come.)
|
|
A few places with inadequate error checking have fallen out of the
ctf_id_t work:
- ctf_add_slice doesn't make sure that the type it is slicing
actually exists
- ctf_add_member_offset doesn't check that the type of the member
exists (though it will often fail if it doesn't, it doesn't
explicitly check, so if you're unlucky it can sometimes succeed,
giving you a corrupted dict)
- ctf_type_encoding doesn't check whether its sliced type exists:
it should verify it so it can return a decent error, rather than
a thoroughly misleading one
- ctf_type_compat has the same problem with respect to both of its
arguments. It would definitely be nicer if we could call
ctf_type_compat and just get a boolean answer, but it's not
clear to me whether a type can be said to be compatible *or*
incompatible with a nonexistent one, and we should probably alert
the users to a likely bug regardless. C error checking, sigh...
|
|
|
|
This change modifies type ID assignment in CTF so that it works like BTF:
rather than flipping the high bit on for types in child dicts, types ascend
directly from IDs in the parent to IDs in the child, without interruption
(so type 0x4 in the parent is immediately followed by 0x5 in all children).
Doing this while retaining useful semantics for modification of parents is
challenging. By definition, child type IDs are not known until the parent
is written out, but we don't want to find ourselves constrained to adding
types to the parent in one go, followed by all child types: that would make
the deduplicator a nightmare and would frankly make the entire ctf_add*()
interface next to useless: all existing clients that add types at all
add types to both parents and children without regard for ordering, and
breaking that would probably necessitate redesigning all of them.
So we have to be a little cleverer.
We approach this the same way as we approach strings in the recent refs
rework: if a parent has children attached (or has ever had them attached
since it was created or last read in), any new types created in the parent
are assigned provisional IDs starting at the very top of the type space and
working down. (Their indexes in the internal libctf arrays remain
unchanged, so we don't suddenly need multigigabyte indexes!). At writeout
(preserialization) time, we traverse the type table (and all other tables
containing type IDs) and assign refs to every type ID in exactly the same
way we assign refs to every string offset (just a different set of refs --
we don't want to update type IDs with string offset values!).
For a parent dict with children, these refs are real entities in memory:
pointers to the memory locations where type IDs are stored, tracked in the
DTD of each type. As we traverse the type table, we assign real IDs to each
type (by simple incrementation), storing those IDs in a new dtd_final_type
field in the DTD for each type. Once the type table and all other tables
containing type IDs are fully traversed, we update all the refs and
overwrite the IDs currently residing in each with the final IDs for each
type.
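The two-pass scheme described above can be sketched in miniature. Everything here is an assumption made for illustration: the fixed-size arrays, the toy_* names, the choice of 0xffffffff as the top of the ID space, and final IDs starting at 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_TYPES 8
#define TOY_MAX_REFS 4
#define TOY_PROV_TOP 0xffffffffu        /* Assumed top of the ID space.  */

struct toy_type
{
  uint32_t id;                          /* Provisional ID; never rewritten.  */
  uint32_t final_id;                    /* Assigned at serialization.  */
  uint32_t *refs[TOY_MAX_REFS];         /* Sites in the output holding our ID.  */
  size_t nrefs;
};

struct toy_dict
{
  struct toy_type types[TOY_MAX_TYPES];
  size_t ntypes;
};

void
toy_dict_init (struct toy_dict *d)
{
  memset (d, 0, sizeof (*d));
}

/* New types get provisional IDs counting down from the top, while
   their array index keeps counting up from zero.  */
uint32_t
toy_add_type (struct toy_dict *d)
{
  struct toy_type *t;

  assert (d->ntypes < TOY_MAX_TYPES);
  t = &d->types[d->ntypes];
  t->id = TOY_PROV_TOP - (uint32_t) d->ntypes;
  d->ntypes++;
  return t->id;
}

/* Record that SITE (in the buffer being serialized) holds type ID.  */
void
toy_add_ref (struct toy_dict *d, uint32_t id, uint32_t *site)
{
  size_t idx = TOY_PROV_TOP - id;
  struct toy_type *t;

  assert (idx < d->ntypes);
  t = &d->types[idx];
  assert (t->nrefs < TOY_MAX_REFS);
  t->refs[t->nrefs++] = site;
}

/* Pass 1: assign final IDs by simple incrementation.
   Pass 2: overwrite every ref site, so forward references work.  */
void
toy_finalize (struct toy_dict *d)
{
  size_t i, j;

  for (i = 0; i < d->ntypes; i++)
    d->types[i].final_id = (uint32_t) (i + 1);
  for (i = 0; i < d->ntypes; i++)
    for (j = 0; j < d->types[i].nrefs; j++)
      *d->types[i].refs[j] = d->types[i].final_id;
}
```

Deferring the overwrite to the second pass is what lets the first type refer to the second before the second has a final ID.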
That fixes up IDs in the parent dict itself (including forward references in
structs and the like: that's why the ref updates only happen at the end);
but what about child dicts' references, both to parent types and to their
own? We add armouring to enforce that parent dicts are always serialized
before their children (which ctf-link.c already does, because it's a
precondition for strtab deduplication), and then arrange that when a ref is
added to a type whose ID has been assigned (has a dtd_final_type), we just
immediately do an update rather than storing a ref for later updating.
Since the parent is already serialized, all parent type IDs have a
dtd_final_type by this point, and all parent IDs in the children are
properly updated. The child types can now be renumbered now that we know the
number of types in the parent, and their refs updated identically to what
was just done with the parent.
One wrinkle: before the child refs are updated, while we are working over
the child's type section, the type IDs in the child start from 1 (or
something like that), which might seem to overlap the parent IDs. But this
is not the case: when you serialize the parent, the IDs written out to disk
are changed, but the only change to the representation in memory is that we
remember a dtd_final_type for each type (and use it to update all the child
type refs): its ID in memory is the same as it always was, a nonoverlapping
provisional ID higher than any other valid ID. We enforce all of this by
asserting that when you add a ref to a type, the memory location that is
modified must be in the buffer being serialized: the code will not let you
accidentally modify the actual DTDs in memory.
We track the number of types in the parent in a new CTFv4 (not BTF) header
field (the dumper is updated): we will also use this to open CTFv3 child
dicts without change by simply declaring for them that the parent dict has
2^31 types in it (or 2^15, for v2 and below): the IDs in the children then
naturally come out right with no other changes needed. (Right now, opening
CTFv3 child dicts requires extra compatibility code that has not been
written, but that code will no longer need to worry about type ID
differences.)
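To see why declaring a huge parent type count makes old child IDs come out right, here is a sketch assuming IDs start at 1, child indexes are 1-based, and a type belongs to the child exactly when its ID exceeds the parent's type count. The function names are hypothetical and are not the libctf internals.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parent type count declared for a CTFv3 child: 2^31.  */
#define TOY_V3_PARENT_NTYPES 0x80000000u

/* Nonzero if ID lies above the parent's ID range.  */
int
toy_type_ischild (uint32_t id, uint32_t parent_ntypes)
{
  return id > parent_ntypes;
}

/* 1-based index of a child type within the child's own type table.  */
uint32_t
toy_child_index (uint32_t id, uint32_t parent_ntypes)
{
  assert (toy_type_ischild (id, parent_ntypes));
  return id - parent_ntypes;
}
```

With a 4-type parent, ID 0x5 is the child's first type; with the assumed v3 count, the old high-bit ID 0x80000001 is also the child's first type, by the same subtraction.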
Various things are newly forbidden:
- you cannot ctf_import() a child into a parent if you already ctf_add()ed
types to the child, because all its IDs would change (and since you
already cannot ctf_add() types to a child that hasn't had its parent
imported, this in practice means only that ctf_create() must be followed
immediately by a ctf_import() if this is a new child, which all sane
clients were doing anyway).
- You cannot import a child into a parent which has the wrong number of
(non-provisional) types, again because all its IDs would be wrong:
because parents only add types in the provisional space if children are
attached to it, this would break the not unknown case of opening an
archive, adding types to the parent, and only then importing children
into it, so we add a special case: archive members which are not children
in an archive with more than one member always pretend to have at least
one child, so type additions in them are always provisional even before
you ctf_import anything. In practice, this does exactly what we want,
since all archives so far are created by the linker and have one parent
and N children of that parent.
Because this introduces huge gaps between index and type ID for provisional
types, some extra assertions are added to ensure that the internal
ctf_type_to_index() is only ever called on types in the current dict (never
a parent dict): before now, this was just taken on trust, and it was often
wrong (which at best led to wrong results, as wrong array indexes were used,
and at worst to a buffer overflow). When hash debugging is on (suggesting
that the user doesn't mind expensive checks), every ctf_type_to_index()
triggers a ctf_index_to_type() to make sure that the operations are proper
inverses.
Lots and lots of tests are added to verify that assignment works and that
updating of every type kind works fine -- existing tests suffice for
type IDs in the variable and symtypetab sections.
The ld-ctf tests get a bunch of largely display-based updates: various
tests refer to 0x8... type IDs, which no longer exist, and because the
IDs are shorter all the spacing and alignment has changed.
|
|
Slices had a bunch of horrible usability problems. In particular, while
towers of cv-quals are resolved away by functions that need to do it, towers
of cv-quals with slices in the middle are not resolved away by functions
like ctf_enum_value that can see through slices: resolving volatile -> slice
-> const -> enum will leave it with a 'const', which will error pointlessly,
annoying callers, who reasonably expect slices to be more invisible than
this. (The user-callable ctf_type_resolve still does not resolve away
slices, because this is the only way users can see that the slices are there
at all.)
This is induced by a fix for another wart: ctf_add_enumerator does not
resolve anything away at all, so you can't even add enumerators to const or
volatile enums -- and more problematically, you can't add enumerators to
enums with an explicit encoding without resolving away the types by hand,
since ctf_add_enum_encoded works by returning a slice! ctf_add_enumerator
now resolves away all of those, so any cvr-or-typedef-or-slice-qual
terminating in an enum can be added to, exactly as callers likely expect.
(New tests added.)
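The difference between the two resolution behaviours can be shown on a toy type graph. The kinds and function names below are invented for the sketch; only the stop-at-slice versus see-through-slice distinction is taken from the description above.

```c
#include <stddef.h>

enum toy_kind
{
  TOY_ENUM, TOY_CONST, TOY_VOLATILE, TOY_RESTRICT, TOY_TYPEDEF, TOY_SLICE
};

struct toy_type
{
  enum toy_kind kind;
  const struct toy_type *ref;   /* Referenced type; NULL for TOY_ENUM.  */
};

static int
toy_is_qual (enum toy_kind k)
{
  return k == TOY_CONST || k == TOY_VOLATILE || k == TOY_RESTRICT
    || k == TOY_TYPEDEF;
}

/* User-visible resolution: strips cvr-quals and typedefs, stops at a
   slice so that callers can still see the slice is there.  */
const struct toy_type *
toy_resolve (const struct toy_type *t)
{
  while (toy_is_qual (t->kind))
    t = t->ref;
  return t;
}

/* Internal resolution: keeps going through every slice and every
   qualifier until something else is reached.  */
const struct toy_type *
toy_resolve_unsliced (const struct toy_type *t)
{
  while (toy_is_qual (t->kind) || t->kind == TOY_SLICE)
    t = t->ref;
  return t;
}
```

For volatile -> slice -> const -> enum, toy_resolve() yields the slice and toy_resolve_unsliced() yields the enum, rather than stopping at the const.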
libctf/
* ctf-create.c (ctf_add_enumerator): Resolve away cvr-qualness.
* ctf-types.c (ctf_type_resolve_unsliced): Don't terminate at
the first slice.
* testsuite/libctf-writable/slice-of-slice.*: New test.
|
|
This commit moves provisional (not-yet-serialized) string refs towards the
scheme to be used for CTF IDs in the future. In particular
- provisional string offsets now count downwards from just under the
external string offset space (all bits on but the high bit). This makes
it possible to detect an overflowing strtab, and also makes it trivial to
determine whether any string offset (ref) updates were missed -- where
before we might get a slightly corrupted or incorrect string, we now get
a huge high strtab offset corresponding to no string, and an error is
emitted at read time.
- refs are emitted at serialization time during the pass through the types.
They are strictly associated with the newly-written-out buffer: the
existing opened CTF dict is not changed, though it does still get the new
strtab so that new refs to the same string can just refer directly to it.
The provisional strtab hash table that contains these strings is not
deleted after serialization (because we might serialize again): instead,
we keep track in the parent of the lowest-yet-used ("latest") provisional
strtab offset, and any strtab offset above that, but not external
(high-bit-on) is considered provisional.
This is sort-of-enforced by moving most of the ref-addition function
declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is
not included by ctf-create.c or ctf-open.c.
- because we don't add refs when adding types, we don't need to handle the
case where we add things to expanding vlens (enums, struct members) and
have to realloc() them. So the entire painful movable refs system can
just be deleted, along with the ability to remove refs piecemeal at all
(purging all of them is still possible). Strings added during type
addition are added via ctf_str_add(), which adds no refs: the strings are
picked up at serialization time and refs to their final, serialized
resting place added. The DTDs never have any refs in them, and their
provisional strtab offsets are never updated by the ref system.
This caused several bugs to fall out of the earlier work and get fixed.
In particular, attempts to look up a string in a child dict now search
the parent's provisional strtab too: we add some extra special casing
for the null string so we don't need to worry about deduplication
moving it somewhere other than offset zero.
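The offset partitioning from the first bullet above can be sketched as follows. This is a guess at the shape, not the real allocator: the names are hypothetical, and the assumption that each string consumes as many offsets as it has bytes is made purely for the example.

```c
#include <stdint.h>

#define TOY_EXTERNAL_BIT 0x80000000u    /* High bit on: external strtab.  */

enum toy_offset_class
{
  TOY_REAL, TOY_PROVISIONAL, TOY_EXTERNAL, TOY_INVALID
};

struct toy_strtab
{
  uint32_t latest;      /* Lowest provisional offset handed out so far;
                           starts at TOY_EXTERNAL_BIT (none handed out).  */
  uint32_t real_len;    /* Length of the real, serialized strtab.  */
};

/* Hand out LEN offsets, counting downwards.  Returns 0 on overflow,
   i.e. when the provisional space would run into the real strtab.  */
uint32_t
toy_prov_alloc (struct toy_strtab *s, uint32_t len)
{
  if (len == 0 || len > s->latest - s->real_len)
    return 0;
  s->latest -= len;
  return s->latest;
}

/* Classify an offset.  A provisional offset that escaped updating is
   far beyond any real strtab, so it is easy to spot.  */
enum toy_offset_class
toy_classify (const struct toy_strtab *s, uint32_t off)
{
  if (off & TOY_EXTERNAL_BIT)
    return TOY_EXTERNAL;
  if (off >= s->latest)
    return TOY_PROVISIONAL;
  if (off < s->real_len)
    return TOY_REAL;
  return TOY_INVALID;
}
```

The gap between real_len and latest is what makes both overflow and missed updates detectable.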
Finally, the optimization that removes an unreferenced synthetic external
strtab (the record of the strings the linker has told us about, kept around
internally for lookup during late serialization) is faulty: references to a
strtab entry will only produce CTF-level refs if their value might change,
and an external string's offset won't change, so it produces no refs: worse
yet, even if we did get a ref (say, if the string was originally believed
to be internal and only later were we told that the linker knew about it
too), when we serialize a strtab, all its refs are dropped (since they've
been updated and can no longer change); so if we serialized it a second
time, its synthetic external strtab would be considered empty and dropped,
even though the same external strings as before still exist, referencing
it. We must keep the synthetic external strtab around as long as external
strings exist that reference it, i.e. for the life of the dict.
One benefit of all this: now we're emitting provisional string offsets at
a really high value, it's out of the way of the consecutive, deduplicated
string offsets in child dicts. So we can drop the constraint that you
cannot add strings to a dict with children, which allows us to add types
freely to parent dicts again. What you can't do is write that dict out
again: when we serialize, we currently update the dict being serialized
with the updated strtabs: when you write a dict out, its provisional
strings become real strings, and suddenly the offsets would overlap once
more. But opening a dict and its children, adding to it, and then
writing it out again is rare indeed, and we have a workaround: anyone
wanting to do this can just use ctf_link instead.
|
|
The initial_vlen parameter to ctf_add_generic is misnamed: it's not the
initial vlen (the initial number of members of a struct, etc), but rather
the initial size of the vlen region. We have a term for that, vbytes: use
it.
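To make the distinction concrete, a sketch assuming a hypothetical 12-byte member record (the real on-disk member layout is not reproduced here): vlen counts elements, vbytes counts the bytes they occupy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical member record, for illustration only.  */
struct toy_member
{
  uint32_t name;
  uint32_t type;
  uint32_t offset;
};

/* vlen: number of members.  vbytes: size of the region holding them.  */
size_t
toy_struct_vbytes (size_t vlen)
{
  return vlen * sizeof (struct toy_member);
}
```

Passing a vlen where vbytes is expected under-allocates by a factor of sizeof (struct toy_member).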
Amazingly this doesn't seem to have caused any bugs to creep in.
|
|
Making these functions is unnecessary right now, but will become much
clearer shortly.
While we're at it, we can drop the third child argument to
LCTF_INDEX_TO_TYPE: it's only used for nontrivial purposes that aren't
literally the same as getting the result from the fp in one place,
in ctf_lookup_by_name_internal, and that place is easily fixed by just
looking in the right dictionary in the first place.
|
|
They're meant to be inverses, which makes it unfortunate that
they check different bounds. No visible effect yet, since
ctf_typemax and ctf_stypes currently cover the entire type ID
space, but will have an effect shortly.
|
|
Parent/child determination is about to become rather more complex, making a
macro impractical. Use the ctf_type_isparent/ischild function calls
everywhere and remove the macro. Make them more const-correct too, to
make them more widely usable.
While we're about it, change several places that hand-implemented
ctf_get_dict() to call it instead, and armour several functions against
the null returns that were always possible in this case (but previously
unprotected-against).
|
|
Despite the removal of the separate movable ref list, the ref system as
a whole is more than complex enough to be worth generalizing now that
we are adding different kinds of ref.
Refs now are lists of uint32_t * which can be updated through the
pointer for all entries in the list and moved to new sites for all
pointers in a given range: they are no longer references to string
offsets in particular and can be references to other uint32_t-sized
things instead (note that ctf_id_t is a typedef to a uint32_t).
ctf-string.c has been adjusted accordingly (the adjustments are tiny,
more or less just turning a bunch of references to atom into
&atom->csa_refs).
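A generic ref list of this kind can be sketched as below. The toy_* names and the growable-array representation are assumptions for the example (the real thing need not be an array); the two operations are the ones described: update every site through its pointer, and rebase every site that falls in a given address range.

```c
#include <stdint.h>
#include <stdlib.h>

/* A list of sites, each a location holding some uint32_t-sized value
   (a string offset, a type ID, ...).  */
struct toy_refs
{
  uint32_t **sites;
  size_t n;
  size_t cap;
};

int
toy_ref_add (struct toy_refs *r, uint32_t *site)
{
  if (r->n == r->cap)
    {
      size_t ncap = r->cap ? r->cap * 2 : 4;
      uint32_t **p = realloc (r->sites, ncap * sizeof (*p));
      if (p == NULL)
        return -1;
      r->sites = p;
      r->cap = ncap;
    }
  r->sites[r->n++] = site;
  return 0;
}

/* Write VAL through every recorded site.  */
void
toy_ref_update (struct toy_refs *r, uint32_t val)
{
  size_t i;

  for (i = 0; i < r->n; i++)
    *r->sites[i] = val;
}

/* Rebase every site inside [OLD_BASE, OLD_BASE + LEN) onto NEW_BASE,
   keeping its offset.  Sites outside the range are left alone.  */
void
toy_ref_move (struct toy_refs *r, void *old_base, size_t len, void *new_base)
{
  uintptr_t lo = (uintptr_t) old_base;
  uintptr_t hi = lo + len;
  size_t i;

  for (i = 0; i < r->n; i++)
    {
      uintptr_t p = (uintptr_t) r->sites[i];
      if (p >= lo && p < hi)
        r->sites[i] = (uint32_t *) ((unsigned char *) new_base + (p - lo));
    }
}
```

Nothing in the list knows what the uint32_t means, which is what lets the same code serve string offsets and type IDs.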
|