libctf: consecutive ctf_id_t assignment

This change modifies type ID assignment in CTF so that it works like BTF: rather than flipping the high bit on for types in child dicts, types ascend directly from IDs in the parent to IDs in the child, without interruption (so type 0x4 in the parent is immediately followed by 0x5 in all children). Doing this while retaining useful semantics for modification of parents is challenging. By definition, child type IDs are not known until the parent is written out, but we don't want to find ourselves constrained to adding types to the parent in one go, followed by all child types: that would make the deduplicator a nightmare and would frankly make the entire ctf_add*() interface next to useless. All existing clients that add types at all add types to both parents and children without regard for ordering, and breaking that would probably necessitate redesigning all of them. So we have to be a little cleverer. We approach this the same way as we approach strings in the recent refs rework: if a parent has children attached (or has ever had them attached since it was created or last read in), any new types created in the parent are assigned provisional IDs starting at the very top of the type space and working down. (Their indexes in the internal libctf arrays remain unchanged, so we don't suddenly need multigigabyte indexes!) At writeout (preserialization) time, we traverse the type table (and all other tables containing type IDs) and assign refs to every type ID in exactly the same way we assign refs to every string offset (just a different set of refs, since we don't want to update type IDs with string offset values!). For a parent dict with children, these refs are real entities in memory: pointers to the memory locations where type IDs are stored, tracked in the DTD of each type. As we traverse the type table, we assign real IDs to each type (by simple incrementation), storing those IDs in a new dtd_final_type field in the DTD for each type.
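The allocation rule above (indexes always grow by one, while IDs either continue upwards or, once children have ever been attached, count down from the top of the type space) can be modelled in a few lines. This is a hypothetical toy, not libctf code: `toy_dict`, `TOY_TOP_ID` and the helpers are invented for illustration, and it assumes that once a parent has had children it never goes back to allocating nonprovisional IDs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of a parent dict; none of these names are libctf's.  */
#define TOY_TOP_ID 0x7fffffffU  /* assumed first provisional ID handed out */

typedef struct toy_dict
{
  uint32_t typemax;        /* highest internal index in use */
  uint32_t idmax;          /* highest nonprovisional ID */
  uint32_t nprovtypes;     /* number of provisional types allocated */
  int ever_had_children;   /* sticky: set once any child is imported */
} toy_dict;

/* Indexes 1..idmax map straight to IDs; later indexes count down from the top.  */
static uint32_t
toy_index_to_type (const toy_dict *d, uint32_t idx)
{
  if (idx <= d->idmax)
    return idx;
  return TOY_TOP_ID - (idx - d->idmax - 1);
}

/* Exact inverse of toy_index_to_type.  */
static uint32_t
toy_type_to_index (const toy_dict *d, uint32_t id)
{
  if (id <= d->idmax)
    return id;
  return d->idmax + 1 + (TOY_TOP_ID - id);
}

/* Allocate a type: the index always grows by one; the ID is either the
   next low ID or the next provisional ID counting down from the top.  */
static uint32_t
toy_assign_id (toy_dict *d)
{
  uint32_t idx = ++d->typemax;

  if (d->ever_had_children)
    d->nprovtypes++;
  else
    d->idmax++;

  return toy_index_to_type (d, idx);
}
```

With two types added before any child is attached and two after, the IDs come out as 1, 2, 0x7fffffff, 0x7ffffffe, while the indexes stay at 1 to 4, so the internal arrays never need to span the provisional region.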
Once the type table and all other tables containing type IDs are fully traversed, we update all the refs and overwrite the IDs currently residing in each with the final IDs for each type. That fixes up IDs in the parent dict itself (including forward references in structs and the like: that's why the ref updates only happen at the end); but what about child dicts' references, both to parent types and to their own? We add armouring to enforce that parent dicts are always serialized before their children (which ctf-link.c already does, because it's a precondition for strtab deduplication), and then arrange that when a ref is added to a type whose ID has already been assigned (has a dtd_final_type), we just update it immediately rather than storing a ref for later updating. Since the parent is already serialized, all parent types have a dtd_final_type by this point, and all parent IDs in the children are properly updated. Now that we know the number of types in the parent, the child types can be renumbered, and their refs updated exactly as was just done for the parent. One wrinkle: before the child refs are updated, while we are working over the child's type section, the type IDs in the child start from 1 (or thereabouts), which might seem to overlap the parent IDs. But this is not the case: when you serialize the parent, the IDs written out to disk are changed, but the only change to the in-memory representation is that we remember a dtd_final_type for each type (and use it to update all the child type refs): its ID in memory is the same as it always was, a nonoverlapping provisional ID higher than any other valid ID. We enforce all of this by asserting that when you add a ref to a type, the memory location that is modified must be in the buffer being serialized: the code will not let you accidentally modify the actual DTDs in memory.
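The ref-patching scheme just described can be sketched as follows. This is a hypothetical simplification, not the libctf implementation: `toy_type`, `toy_buf`, `toy_add_ref` and `toy_finalize` are invented names, each type carries a fixed-size array of refs, and a nonzero `final_id` stands in for a set dtd_final_type. It shows the three rules from the text: refs must point into the output buffer, refs to types that already have a final ID are patched immediately, and all other refs are patched only after every type has been numbered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: a ref is the address of a type-ID slot inside the
   buffer being serialized.  */
#define TOY_MAX_REFS 16

typedef struct toy_type
{
  uint32_t id;                    /* in-memory (maybe provisional) ID; never changes */
  uint32_t final_id;              /* 0 until serialization assigns one */
  uint32_t *refs[TOY_MAX_REFS];   /* slots in the output buffer naming this type */
  size_t nrefs;
} toy_type;

typedef struct toy_buf
{
  uint32_t *start;
  size_t len;                     /* length in uint32_t slots */
} toy_buf;

/* Record that *SLOT refers to T.  If T already has a final ID (say, a parent
   type while a child is being serialized), patch the slot at once.  */
static int
toy_add_ref (const toy_buf *buf, toy_type *t, uint32_t *slot)
{
  uintptr_t s = (uintptr_t) slot, b = (uintptr_t) buf->start;

  /* Refs may only point into the output buffer, never at live in-memory data.  */
  assert (s >= b && s < b + buf->len * sizeof (uint32_t));

  if (t->final_id != 0)
    {
      *slot = t->final_id;
      return 0;
    }
  if (t->nrefs == TOY_MAX_REFS)
    return -1;
  t->refs[t->nrefs++] = slot;
  return 0;
}

/* After the whole table is laid out: number the types in table order
   starting at FIRST_ID (1 for a parent, parent's type count + 1 for a
   child), then rewrite every recorded slot.  Forward refs work because
   patching happens only here, at the end.  */
static void
toy_finalize (toy_type *types, size_t ntypes, uint32_t first_id)
{
  size_t i, j;

  for (i = 0; i < ntypes; i++)
    types[i].final_id = first_id + (uint32_t) i;

  for (i = 0; i < ntypes; i++)
    {
      for (j = 0; j < types[i].nrefs; j++)
        *types[i].refs[j] = types[i].final_id;
      types[i].nrefs = 0;
    }
}
```

Note that `toy_finalize` never touches the `id` field: as in the text, only the serialized buffer changes, so a child type with in-memory ID 1 can be renumbered to 3 in its output without colliding with anything in memory.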
We track the number of types in the parent in a new CTFv4 (not BTF) header field (the dumper is updated). We will also use this to open CTFv3 child dicts without change, by simply declaring for them that the parent dict has 2^31 types in it (or 2^15, for v2 and below): the IDs in the children then naturally come out right with no other changes needed. (Right now, opening CTFv3 child dicts requires extra compatibility code that has not been written, but that code will no longer need to worry about type ID differences.)

Various things are newly forbidden:

- You cannot ctf_import() a child into a parent if you have already ctf_add()ed types to the child, because all its IDs would change. (Since you already cannot ctf_add() types to a child that hasn't had its parent imported, in practice this means only that ctf_create() must be followed immediately by ctf_import() for a new child, which all sane clients were doing anyway.)

- You cannot import a child into a parent which has the wrong number of (non-provisional) types, again because all its IDs would be wrong. Because parents only add types in the provisional space if children are attached to them, this would break the not-unknown case of opening an archive, adding types to the parent, and only then importing children into it. So we add a special case: archive members which are not children, in an archive with more than one member, always pretend to have at least one child, so type additions in them are always provisional even before you ctf_import() anything. In practice this does exactly what we want, since all archives so far are created by the linker and have one parent and N children of that parent.
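The ID arithmetic behind both the new header field and the CTFv3 compatibility trick is simple enough to write down. The helpers below are hypothetical (not libctf API): child IDs are the parent's type count plus the child's index, and pretending an old-format parent had 2^31 (or 2^15) types reproduces the old high-bit child IDs exactly. The import check encodes the two prohibitions above; treating a completely empty, freshly created child as importable into any parent is an assumption of this sketch, inferred from the statement that ctf_create() followed by ctf_import() remains valid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the ID arithmetic; not libctf API.  */

/* v4 children: IDs continue straight on from the parent's type count.  */
static uint32_t
toy_child_index_to_id (uint32_t parent_typemax, uint32_t child_idx)
{
  return parent_typemax + child_idx;
}

/* Pre-v4 children only: pretend the parent had 2^31 (v3) or 2^15 (v2 and
   below) types, which yields the old "high bit set" child IDs.  */
static uint32_t
toy_compat_parent_typemax (int version)
{
  return version == 3 ? UINT32_C (0x80000000) : UINT32_C (0x8000);
}

/* Refuse imports that would shift the child's IDs.  */
static int
toy_import_ok (uint32_t hdr_parent_typemax, uint32_t child_nstatic,
               uint32_t child_ndynamic, uint32_t parent_nonprov_types)
{
  if (child_ndynamic > 0)
    return 0;   /* types already added: their IDs would all change */
  if (child_nstatic == 0)
    return 1;   /* assumption: a fresh, empty child has nothing numbered yet */
  return hdr_parent_typemax == parent_nonprov_types;
}
```

For example, with a v4 parent holding four types, the child's first type gets ID 5; a v3 child's first type maps to 0x80000001 and a v2 child's to 0x8001, just as the old high-bit scheme numbered them.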
Because this introduces huge gaps between index and type ID for provisional types, some extra assertions are added to ensure that the internal ctf_type_to_index() is only ever called on types in the current dict (never a parent dict): before now, this was just taken on trust, and it was often wrong (which at best led to wrong results, as wrong array indexes were used, and at worst to a buffer overflow). When hash debugging is on (suggesting that the user doesn't mind expensive checks), every ctf_type_to_index() triggers a ctf_index_to_type() to make sure that the operations are proper inverses. Lots and lots of tests are added to verify that assignment works and that updating of every type kind works fine -- existing tests suffice for type IDs in the variable and symtypetab sections. The ld-ctf tests get a bunch of largely display-based updates: various tests refer to 0x8... type IDs, which no longer exist, and because the IDs are shorter all the spacing and alignment has changed.
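The new ctf_type_to_index() armouring can be pictured with a toy child dict that owns a contiguous run of IDs. This is a hypothetical sketch: `toy_child`, `toy_child_index_to_type`, `toy_child_type_to_index` and the `toy_hash_debug` switch (standing in for libctf's hash-debugging mode) are invented names. It shows the two checks described: an assertion that the ID belongs to this dict rather than its parent, and, when the expensive-checks switch is on, a round trip through the inverse mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a child dict owns IDs [first_id, first_id + ntypes).  */
typedef struct toy_child
{
  uint32_t first_id;   /* parent's type count + 1 */
  uint32_t ntypes;
} toy_child;

static int toy_hash_debug = 1;  /* stand-in for an "expensive checks" switch */

static uint32_t
toy_child_index_to_type (const toy_child *d, uint32_t idx)
{
  return d->first_id + idx - 1;
}

static uint32_t
toy_child_type_to_index (const toy_child *d, uint32_t id)
{
  uint32_t idx;

  /* Types from another dict (e.g. the parent) must never get here.  */
  assert (id >= d->first_id && id - d->first_id < d->ntypes);

  idx = id - d->first_id + 1;

  /* Under debugging, confirm the two mappings are exact inverses.  */
  if (toy_hash_debug)
    assert (toy_child_index_to_type (d, idx) == id);
  return idx;
}
```

Passing a parent ID such as 4 to `toy_child_type_to_index` for a child whose IDs start at 5 trips the first assertion instead of silently producing an out-of-range array index.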
author: Nick Alcock <nick.alcock@oracle.com> 2025-02-16 19:55:11 +0000
committer: Nick Alcock <nick.alcock@oracle.com> 2025-03-16 15:25:27 +0000
commit: b5d3790c6684a85552e0f57372a2b3b7fae0ed96 (patch)
tree: 4fff67e1ed9cac370730b4ab9bcdac03bc4e0213 /libctf/ctf-create.c
parent: 274cc1f13d67712bdcb749c105a5b3db3c0a8cc0 (diff)
download: binutils-b5d3790c6684a85552e0f57372a2b3b7fae0ed96.zip
binutils-b5d3790c6684a85552e0f57372a2b3b7fae0ed96.tar.gz
binutils-b5d3790c6684a85552e0f57372a2b3b7fae0ed96.tar.bz2
1 files changed, 125 insertions, 11 deletions
diff --git a/libctf/ctf-create.c b/libctf/ctf-create.c
index 1782655..2212c2f 100644
--- a/libctf/ctf-create.c
+++ b/libctf/ctf-create.c
@@ -387,6 +387,99 @@ ctf_rollback (ctf_dict_t *fp, ctf_snapshot_id_t id)
   return 0;
 }
 
+/* Assign an ID to a newly-created type.
+
+   The type ID assignment scheme in libctf divides types into three
+   classes.
+
+   - static types are types read in from an already-existing dict.  They are
+     stored only in the ctf_buf and have type indexes ranging from 1 up to
+     fp->ctf_typemax (usually the same as fp->ctf_stypes, but may be different
+     for newly-created children just imported to parents with already-present
+     dynamic types).  Their IDs are derived from their index in the ctf_buf and
+     are not explicitly assigned, though serialization tracks them in order to
+     update type IDs that reference them.
+
+     Type IDs in a child dict start from fp->ctf_header->ctf_parent_typemax
+     (fp->ctf_stypes in the parent).  There is no gap as in CTFv3 and below:
+     the IDs run continuously.
+
+   - dynamic types are added by ctf_add_*() (ultimately, ctf_add_generic) and
+     have DTDs: their type IDs are stored in dtd->dtd_type, and the DTD hashtab
+     is indexed by type ID.
+
+     The simplest form of these types, nonprovisionally-numbered dynamic types,
+     have type IDs stretching from fp->ctf_stypes up to fp->ctf_idmax, and
+     corresponding indexes.  Such types only exist for child dicts and for
+     parent dicts which had types added before any children were imported.
+
+   - As soon as a child is imported, the parent starts allocating provisionally-
+     numbered dynamic types from the top of the type space down, updating
+     ctf_provtypemax and ctf_nprovtypes as it goes, and bumping ctf_typemax:
+     ctf_idmax is no longer bumped.  The child continues to allocate in lower
+     type space starting from the parent's ctf_idmax + 1.  Obviously all
+     references to provisional types can't stick around: so at serialization
+     time we note down the position of every reference to a provisional type ID
+     and all child type IDs, then lay out the type table by going over the
+     nonprovisional types and then the provisional ones and dropping them in
+     place in their serialized buffers, work out what the final type IDs will
+     be, and update all the refs accordingly, changing every type ID that refers
+     to the old type to refer to the new one instead.  (See ctf_serialize.)
+
+     The indexes of provisional types run identically to the indexes of
+     non-provisional types, i.e. straight upwards without breaks or
+     discontinuities, even though this probably overlaps type IDs in the child.
+     Indexes and type IDs are not the same!
+
+   At serialization time, we track references to type IDs in the same dict via
+   the refs system while the type table et al are being built (during
+   preserialization), and update them with the real type IDs at final
+   serialization time; the final type IDs are recorded in the dtd_final_type,
+   and we assert if a future serialization would assign a different ID (which
+   should be impossible).  When child dicts are serialized, references to parent
+   types are updated with the dtd_final_type of that type whenever one is set.
+   It is considered an error to try to serialize a child while its parent has
+   provisional types that have not yet had IDs assigned.
+
+   (The refs system is not employed to track references from child dicts to
+   parents, since forward references are not possible between dicts: the parent
+   dict must have been completely serialized when serializing a child.  We can't
+   be halfway through, which is the case the refs system is there to handle:
+   refs from structure members to types not yet known, etc.)
+
+   Only parents have provisional type IDs!  Child IDs are always simply assigned
+   straight in the child.  This means that the provisional ID space is not
+   sparse, and we don't need to worry about child and parent IDs being
+   interspersed in it.  (Not yet, anyway: if we get multilevel parents this will
+   become a concern).
+
+   Note that you can add types to a parent at any time, even after children have
+   been serialized.  This works fine, except that you cannot use the
+   newly-written dict as a parent for the same children, since they were written
+   out assuming a smaller number of types in the parent.  */
+
+static ctf_id_t
+ctf_assign_id (ctf_dict_t *fp)
+{
+  uint32_t idx;
+
+  /* All type additions increase the max index.  */
+
+  idx = ++fp->ctf_typemax;
+
+  /* Is this a parent with an attached child?  Provisional type.  */
+
+  if (!(fp->ctf_flags & LCTF_CHILD) && (fp->ctf_max_children > 0))
+    {
+      fp->ctf_provtypemax--;
+      fp->ctf_nprovtypes++;
+    }
+  else
+    fp->ctf_idmax++;
+
+  return ctf_index_to_type (fp, idx);
+}
+
 /* Note: vlen is the amount of space *allocated* for the vlen.  It may well not
    be the amount of space used (yet): the space used is declared in per-kind
    fashion in the dtd_data's info word.  */
@@ -396,19 +489,28 @@ ctf_add_generic (ctf_dict_t *fp, uint32_t flag, const char *name, int kind,
 {
   ctf_dtdef_t *dtd;
   ctf_id_t type;
+  ctf_dict_t *pfp = fp;
+
+  if (fp->ctf_parent)
+    pfp = fp->ctf_parent;
 
   if (flag != CTF_ADD_NONROOT && flag != CTF_ADD_ROOT)
     return (ctf_set_typed_errno (fp, EINVAL));
 
-  if (ctf_index_to_type (fp, fp->ctf_typemax) >= CTF_MAX_TYPE)
+  if (fp->ctf_typemax + 1 >= pfp->ctf_provtypemax)
     return (ctf_set_typed_errno (fp, ECTF_FULL));
 
-  if (ctf_index_to_type (fp, fp->ctf_typemax) == (CTF_MAX_PTYPE - 1))
-    return (ctf_set_typed_errno (fp, ECTF_FULL));
+  /* Prohibit addition of types in the middle of serialization.  */
+
+  if (fp->ctf_flags & LCTF_NO_TYPE)
+    return (ctf_set_errno (fp, ECTF_NOTSERIALIZED));
 
   if (fp->ctf_flags & LCTF_NO_STR)
     return (ctf_set_errno (fp, ECTF_NOPARENT));
 
+  if (fp->ctf_flags & LCTF_CHILD && fp->ctf_parent == NULL)
+    return (ctf_set_errno (fp, ECTF_NOPARENT));
+
   /* Prohibit addition of a root-visible type that is already present
      in the non-dynamic portion. */
 
@@ -424,7 +526,7 @@ ctf_add_generic (ctf_dict_t *fp, uint32_t flag, const char *name, int kind,
 
   /* Make sure ptrtab always grows to be big enough for all types.  */
   if (ctf_grow_ptrtab (fp) < 0)
-      return CTF_ERR;				/* errno is set for us. */
+    return CTF_ERR;				/* errno is set for us. */
 
   if ((dtd = calloc (1, sizeof (ctf_dtdef_t))) == NULL)
     return (ctf_set_typed_errno (fp, EAGAIN));
@@ -438,8 +540,7 @@ ctf_add_generic (ctf_dict_t *fp, uint32_t flag, const char *name, int kind,
   else
     dtd->dtd_vlen = NULL;
 
-  type = ++fp->ctf_typemax;
-  type = ctf_index_to_type (fp, type);
+  type = ctf_assign_id (fp);
 
   dtd->dtd_data.ctt_name = ctf_str_add (fp, name);
   dtd->dtd_type = type;
@@ -525,13 +626,14 @@ ctf_add_reftype (ctf_dict_t *fp, uint32_t flag, ctf_id_t ref, uint32_t kind)
 {
   ctf_dtdef_t *dtd;
   ctf_id_t type;
-  ctf_dict_t *tmp = fp;
+  ctf_dict_t *typedict = fp;
+  ctf_dict_t *refdict = fp;
   int child = fp->ctf_flags & LCTF_CHILD;
 
   if (ref == CTF_ERR || ref > CTF_MAX_TYPE)
     return (ctf_set_typed_errno (fp, EINVAL));
 
-  if (ref != 0 && ctf_lookup_by_id (&tmp, ref) == NULL)
+  if (ref != 0 && ctf_lookup_by_id (&refdict, ref) == NULL)
     return CTF_ERR;		/* errno is set for us.  */
 
   if ((type = ctf_add_generic (fp, flag, NULL, kind, 0, &dtd)) == CTF_ERR)
@@ -549,8 +651,9 @@ ctf_add_reftype (ctf_dict_t *fp, uint32_t flag, ctf_id_t ref, uint32_t kind)
      addition of this type.  The pptrtab is lazily-updated as needed, so is not
      touched here.  */
 
-  uint32_t type_idx = ctf_type_to_index (fp, type);
-  uint32_t ref_idx = ctf_type_to_index (fp, ref);
+  typedict = ctf_get_dict (fp, type);
+  uint32_t type_idx = ctf_type_to_index (typedict, type);
+  uint32_t ref_idx = ctf_type_to_index (refdict, ref);
 
   if (ctf_type_ischild (fp, ref) == child
       && ref_idx < fp->ctf_typemax)
@@ -1137,6 +1240,9 @@ ctf_add_member_offset (ctf_dict_t *fp, ctf_id_t souid, const char *name,
   if (fp->ctf_flags & LCTF_NO_STR)
     return (ctf_set_errno (fp, ECTF_NOPARENT));
 
+  if (fp->ctf_flags & LCTF_NO_TYPE)
+    return (ctf_set_errno (fp, ECTF_NOTSERIALIZED));
+
   if ((fp->ctf_flags & LCTF_CHILD) && ctf_type_isparent (fp, souid))
     {
       /* Adding a child type to a parent, even via the child, is prohibited.
@@ -1367,6 +1473,9 @@ ctf_add_variable (ctf_dict_t *fp, const char *name, ctf_id_t ref)
   if (fp->ctf_flags & LCTF_NO_STR)
     return (ctf_set_errno (fp, ECTF_NOPARENT));
 
+  if (fp->ctf_flags & LCTF_NO_TYPE)
+    return (ctf_set_errno (fp, ECTF_NOTSERIALIZED));
+
   if (ctf_lookup_variable_here (fp, name) != CTF_ERR)
     return (ctf_set_errno (fp, ECTF_DUPLICATE));
 
@@ -1390,6 +1499,9 @@ ctf_add_funcobjt_sym_forced (ctf_dict_t *fp, int is_function, const char *name,
   if (fp->ctf_flags & LCTF_NO_STR)
     return (ctf_set_errno (fp, ECTF_NOPARENT));
 
+  if (fp->ctf_flags & LCTF_NO_TYPE)
+    return (ctf_set_errno (fp, ECTF_NOTSERIALIZED));
+
   if (ctf_lookup_by_id (&tmp, id) == NULL)
     return -1;				/* errno is set for us.  */
 
@@ -1541,7 +1653,9 @@ membcmp (const char *name, ctf_id_t type _libctf_unused_, unsigned long offset,
 
    Our OOM handling here is just to not do anything, because this is called deep
    enough in the call stack that doing anything useful is painfully difficult:
-   the worst consequence if we do OOM is a bit of type duplication anyway.  */
+   the worst consequence if we do OOM is a bit of type duplication anyway.
+   The non-imported checks are just paranoia and should never be able to
+   happen, but if they do we don't want a coredump.  */
 
 static void
 ctf_add_type_mapping (ctf_dict_t *src_fp, ctf_id_t src_type,