aboutsummaryrefslogtreecommitdiff
path: root/libctf/ctf-link.c
AgeCommit message (Collapse)AuthorFilesLines
13 dayslibctf: link: suppress symtypetab emission when emitting BTFNick Alcock1-0/+15
BTF has no symtypetabs, so we should skip symtypetab emission entirely when we know in advance that we are emitting it: there is no point wasting time collecting linker symbol info or strtab content; and doing so can result in lookups of external symbols in the external strtab or emission of offsets into it, and BTF doesn't have an external strtab. Simply suppressing the collection of any external strtab info eliminates the possibility of accidentally emitting offsets into that nonexistent strtab.
13 dayslibctf: link: fix we-wrote-BTF reportingNick Alcock1-1/+1
This was always returning 0, even if it wrote BTF successfully.
2025-04-01libctf: more output format controlNick Alcock1-2/+39
The new API function ctf_link_output_is_btf lets you determine whether the result of a ctf_link is likely to be written out as CTF or BTF before the write takes place: the new is_btf argument to ctf_link_write lets you find out whether it actually was (things like compression that only ctf_link_write is told about may cause last-minute changes in the decision). This requires us to split preserialization in two, with the portion that determines whether a serialized dict is BTF-compatible or not moving into a new internal function ctf_serialize_output_format, called by ctf_link_output_is_btf. We move the state used to communicate between serialization passes into a new sub-struct at the same time.
2025-03-24fixup! libctf: dedup: support cu-mapped emission into the parent dict (no ↵Nick Alcock1-1/+5
children)
2025-03-20libctf: lots and lots of compilation error fixesNick Alcock1-4/+4
More to come.
2025-03-20libctf: drop most variable deletionNick Alcock1-245/+8
Needs reimplementation at late dedup (writeout) time: we can't do it when serializing any more, since we have to do it by *not emitting* variables that duplicate entries in the symtypetab, and we can't do that until we know which symbols (if any) the linker has reported. Will do later. Not needed for kernel links anyway, since they have empty symtypetabs (since the ctfarchive/btfarchive "linker" reports no symbols).
2025-03-20libctf: ctf-type / ctf-create: BTF / CTFv4 wipNick Alcock1-3/+4
This huge change transforms ctf-type.c and ctf-create.c to handle BTF, including datasec and tag support. I don't think I missed anything, but I haven't audited for things I missed yet... I'm still working on ctf_tag(), but that's the last piece. This is a much bigger change than expected because of type prefixes. We need type prefixes even before CTFv4 because we want to be able to pass pahole information on conflicted types, and in CTFv4 these are implemented with a CTF_K_CONFLICTING prefix type. As long as we're doing that, let's implement CTF_K_BIG as well... and with that there, suddenly we have to worry about having both at once on one type, and the existing DTD representation with one ctf_type_t starts to look seriously inadequate. So we adjust the DTD so that the ctf_type_t and vlen are contained *within* a buffer which is identical to the on-disk representation, except only that it might be unconditionally CTF_K_BIG while in memory or something like that. We can then trivially search this buffer for type headers using a simple increment (since all the type headers are at the start, followed by the vlen region), which makes all the rest much easier: e.g. most of the repetitive code in ctf-types.c's type handling and in type creation can get moved into common code, and there is finally almost no distinction at all between static and dynamic types. This also makes it easy to make types dynamic on the fly later, which will be crucial if we ever want to add variables to pre-existing dicts (since that means adding to a pre-existing datasec as well). A bunch of APIs, particularly around iter_f functions, have changed: everything that takes a type now takes a dict as well, because the lack of such was incredibly annoying. Uses in libctf (particularly in the dumper and in ctf_add_type) have been adjusted, but not yet simplified as they could be now they have a dict more easily available. Absolutely does not compile, but should show where we're going. Thanks to Bruce McCulloch <bruce.mcculloch@oracle.com> for a whole heap of creation and querying functions.
2025-02-28libctf: move string deduplication into ctf-archiveNick Alcock1-42/+6
This means that any archive containing dicts can get its strings dedupped together, rather than only those that are ctf_linked. (For now, we are still constrained to ctf_linked archives, since fixing that requires further changes to ctf_dedup_strings: but this gives us the first half of what is necessary.) libctf/ * ctf-link.c (ctf_link_write): Move string dedup into... * ctf-archive.c (ctf_arc_preserialize): ... this new function. (ctf_arc_write_fd): Call it.
2025-02-28libctf: string: refs reworkNick Alcock1-3/+5
This commit moves provisional (not-yet-serialized) string refs towards the scheme to be used for CTF IDs in the future. In particular - provisional string offsets now count downwards from just under the external string offset space (all bits on but the high bit). This makes it possible to detect an overflowing strtab, and also makes it trivial to determine whether any string offset (ref) updates were missed -- where before we might get a slightly corrupted or incorrect string, we now get a huge high strtab offset corresponding to no string, and an error is emitted at read time. - refs are emitted at serialization time during the pass through the types. They are strictly associated with the newly-written-out buffer: the existing opened CTF dict is not changed, though it does still get the new strtab so that new refs to the same string can just refer directly to it. The provisional strtab hash table that contains these strings is not deleted after serialization (because we might serialize again): instead, we keep track in the parent of the lowest-yet-used ("latest") provisional strtab offset, and any strtab offset above that, but not external (high-bit-on) is considered provisional. This is sort-of-enforced by moving most of the ref-addition function declarations (including ctf_str_add_ref) to a new ctf-ref.h, which is not included by ctf-create.c or ctf-open.c. - because we don't add refs when adding types, we don't need to handle the case where we add things to expanding vlens (enums, struct members) and have to realloc() them. So the entire painful movable refs system can just be deleted, along with the ability to remove refs piecemeal at all (purging all of them is still possible). Strings added during type addition are added via ctf_str_add(), which adds no refs: the strings are picked up at serialization time and refs to their final, serialized resting place added. The DTDs never have any refs in them, and their provisional strtab offsets are never updated by the ref system. This caused several bugs to fall out of the earlier work and get fixed. In particular, attempts to look up a string in a child dict now search the parent's provisional strtab too: we add some extra special casing for the null string so we don't need to worry about deduplication moving it somewhere other than offset zero. Finally, the optimization that removes an unreferenced synthetic external strtab (the record of the strings the linker has told us about, kept around internally for lookup during late serialization) is faulty: references to a strtab entry will only produce CTF-level refs if their value might change, and an external string's offset won't change, so it produces no refs: worse yet, even if we did get a ref (say, if the string was originally believed to be internal and only later were we told that the linker knew about it too), when we serialize a strtab, all its refs are dropped (since they've been updated and can no longer change); so if we serialized it a second time, its synthetic external strtab would be considered empty and dropped, even though the same external strings as before still exist, referencing it. We must keep the synthetic external strtab around as long as external strings exist that reference it, i.e. for the life of the dict. One benefit of all this: now we're emitting provisional string offsets at a really high value, it's out of the way of the consecutive, deduplicated string offsets in child dicts. So we can drop the constraint that you cannot add strings to a dict with children, which allows us to add types freely to parent dicts again. What you can't do is write that dict out again: when we serialize, we currently update the dict being serialized with the updated strtabs: when you write a dict out, its provisional strings become real strings, and suddenly the offsets would overlap once more. But opening a dict and its children, adding to it, and then writing it out again is rare indeed, and we have a workaround: anyone wanting to do this can just use ctf_link instead.
2025-02-28libctf: move ctf_elf*_to_link_sym to ctf-link.cNick Alcock1-0/+86
Everything in ctf-util.c is in some way associated with data structures in general in some way, except for the ctf_*_to_link_sym functions, which are straight translators between the Elf*_Sym type and the ctf_link_sym_t type used by ctf-link.c. Move them into ctf-link.c where they belong.
2025-02-28libctf: actually deduplicate the strtabNick Alcock1-7/+43
This commit finally implements strtab deduplication, putting together all the pieces assembled in the earlier commits. The magic is entirely localized to ctf_link_write, which preserializes all the dicts (parent first), and calls ctf_dedup_strings on the parent. (The error paths get tweaked a bit too.) Calling ctf_dedup_strings has implications elsewhere: the lifetime rules for the inputs versus outputs change a bit now that the child output dicts contain references to the parent dict's atoms table. We also pre-purge movable refs from all the deduplicated strings before freeing any of this because movable refs contain backreferences into the dict they came from, which means the parent contains references to all the children! Purging the refs first makes those references go away so we can free the children without creating any wild pointers, even temporarily. There's a new testcase that identifies a regression whereby offset 0 (the null string) and index 0 (in children now often the parent dict name, ".ctf") got mixed up, leading to anonymous structs and unions getting the not entirely C-valid name ".ctf" instead. May other testcases get adjusted to no longer depend on the precise layout of the strtab. TODO: add new tests to verify that strings are actually being deduplicated. libctf/ * ctf-link.c (ctf_link_write): Deduplicate strings. * ctf-open.c (ctf_dict_close): Free refs, then the link outputs, then the out cu_mapping, then the inputs, in that order. * ctf-string.c (ctf_str_purge_refs): Not static any more. * ctf-impl.h: Declare it. ld/ * testsuite/ld-ctf/conflicting-cycle-2.A-1.d: Don't depend on strtab contents. * testsuite/ld-ctf/conflicting-cycle-2.A-2.d: Likewise. * testsuite/ld-ctf/conflicting-cycle-2.parent.d: Likewise. * testsuite/ld-ctf/conflicting-cycle-3.C-1.d: Likewise. * testsuite/ld-ctf/conflicting-cycle-3.C-2.d: Likewise. * testsuite/ld-ctf/anonymous-conflicts*: New test.
2025-02-28libctf, archive, link: fix parent importingNick Alcock1-38/+72
We are about to move to a regime where there are very few things you can do with most dicts before you ctf_import them. So emit a warning if ctf_archive_next()'s convenience ctf_import of parents fails. Rip out the buggy code in ctf_link_deduplicating_open_inputs which opened the parent by hand (with a hardwired name), and instead rely on ctf_archive_next to do it for us (which also means we don't end up opening it twice, once in ctf_archive_next, once in ctf_link_deduplicating_open_inputs). While we're there, arrange to close the inputs we already opened if opening of some inputs fails, rather than leaking them. (There are still some leaks here, so add a comment to remind us to clean them up later.) libctf/ * ctf-archive.c (ctf_arc_import_parent): Emit a warning if importing fails. * ctf-link.c (ctf_link_deduplicating_open_inputs): Rely on the ctf_archive_next to open parent dicts.
2025-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
2024-07-31libctf, include: add ctf_dict_set_flag: less enum dup checking by defaultNick Alcock1-8/+23
The recent change to detect duplicate enum values and return ECTF_DUPLICATE when found turns out to perturb a great many callers. In particular, the pahole-created kernel BTF has the same problem we historically did, and gleefully emits duplicated enum constants in profusion. Handling the resulting duplicate errors from BTF -> CTF converters reasonably is unreasonably difficult (it amounts to forcing them to skip some types or reimplement the deduplicator). So let's step back a bit. What we care about mostly is that the deduplicator treat enums with conflicting enumeration constants as conflicting types: programs that want to look up enumeration constant -> value mappings using the new APIs to do so might well want the same checks to apply to any ctf_add_* operations they carry out (and since they're *using* the new APIs, added at the same time as this restriction was imposed, there is likely to be no negative consequence of this). So we want some way to allow processes that know about duplicate detection to opt into it, while allowing everyone else to stay clear of it: but we want ctf_link to get this behaviour even if its caller has opted out. So add a new concept to the API: dict-wide CTF flags, set via ctf_dict_set_flag, obtained via ctf_dict_get_flag. They are not bitflags but simple arbitrary integers and an on/off value, stored in an unspecified manner (the one current flag, we translate into an LCTF_* flag value in the internal ctf_dict ctf_flags word). If you pass in an invalid flag or value you get a new ECTF_BADFLAG error, so the caller can easily tell whether flags added in future are valid with a particular libctf or not. We check this flag in ctf_add_enumerator, and set it around the link (including on child per-CU dicts). The newish enumerator-iteration test is souped up to check the semantics of the flag as well. The fact that the flag can be set and unset at any time has curious consequences. You can unset the flag, insert a pile of duplicates, then set it and expect the new duplicates to be detected, not only by ctf_add_enumerator but also by ctf_lookup_enumerator. This means we now have to maintain the ctf_names and conflicting_enums enum-duplication tracking as new enums are added, not purely as the dict is opened. Move that code out of init_static_types_internal and into a new ctf_track_enumerator function that addition can also call. (None of this affects the file format or serialization machinery, which has to be able to handle duplicate enumeration constants no matter what.) include/ * ctf-api.h (CTF_ERRORS) [ECTF_BADFLAG]: New. (ECTF_NERR): Update. (CTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_dict_set_flag): New function. (ctf_dict_get_flag): Likewise. libctf/ * ctf-impl.h (LCTF_STRICT_NO_DUP_ENUMERATORS): New flag. (ctf_track_enumerator): Declare. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_create_per_cu): Likewise. (ctf_link_deduplicating_per_cu): Likewise. (ctf_link): Likewise. (ctf_link_write): Likewise. * ctf-subr.c (ctf_dict_set_flag): New function. (ctf_dict_get_flag): New function. * ctf-open.c (init_static_types_internal): Move enum tracking to... * ctf-create.c (ctf_track_enumerator): ... this new function. (ctf_add_enumerator): Call it. * libctf.ver: Add the new functions. * testsuite/libctf-lookup/enumerator-iteration.c: Test them.
2024-07-31libctf: link: remember to turn off the LCTF_LINKING flag after ctf_link_writeNick Alcock1-0/+4
We set this flag at the top of ctf_link_write (to tell ctf_serialize, way down under the archive file writing functions, to do the various link- time serialization things like symbol filtering and the like), but we never remember to clear it except on error. This is probably bad if you want to serialize the dict yourself directly in the future after linking it (which is... definitely a *possible* use of the API, if rather strange). libctf/ * ctf-link.c (ctf_link_write): Clear LCTF_LINKING before exit.
2024-07-31libctf: link: fix error handlingNick Alcock1-1/+1
We were calling the wrong error function if opening failed, causing leaks. libctf/ * ctf-link.c (ctf_link_deduplicating_per_cu): Fix error handling.
2024-07-31libctf, dedup: drop unnecessary arg from ctf_dedup()Nick Alcock1-2/+2
The PARENTS arg is carefully passed down through all the layers of hash functions and then never used for anything. (In the distant past it was used for cycle detection, but the algorithm eventually committed doesn't need to do cycle detection...) The PARENTS arg is still used by ctf_dedup_emit(), but even there we can loosen the requirements and state that you can just leave entries corresponding to dicts with no parents at zero (which will be useful in an upcoming commit). libctf/ * ctf-dedup.c (ctf_dedup_hash_type): Drop PARENTS arg. (ctf_dedup_rhash_type): Likewise. (ctf_dedup): Likewise. (ctf_dedup_emit_struct_members): Mention what you can do to PARENTS entries for parent dicts. * ctf-impl.h (ctf_dedup): Adjust accordingly. * ctf-link.c (ctf_link_deduplicating_per_cu): Likewise. (ctf_link_deduplicating): Likewise.
2024-04-19libctf: don't pass errno into ctf_err_warn so oftenNick Alcock1-11/+11
The libctf-internal warning function ctf_err_warn() can be passed a libctf errno as a parameter, and will add its textual errmsg form to the passed-in error message. But if there is an error on the fp already, and this is specifically an error and not a warning, ctf_err_warn() will print the error out regardless: there's no need to pass in anything but 0. There are still a lot of places where we do ctf_err_warn (fp, 0, EFOO, ...); return ctf_set_errno (fp, 0, EFOO); I've left all of those alone, because fixing it makes the code a bit longer: but fixing the cases where no return is involved and the error has just been set on the fp itself costs nothing and reduces redundancy a bit. libctf/ * ctf-dedup.c (ctf_dedup_walk_output_mapping): Drop the errno arg. (ctf_dedup_emit): Likewise. (ctf_dedup_type_mapping): Likewise. * ctf-link.c (ctf_create_per_cu): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating_one_symtypetab): Likewise. (ctf_link_deduplicating_per_cu): Likewise. * ctf-lookup.c (ctf_lookup_symbol_idx): Likewise. * ctf-subr.c (ctf_assert_fail_internal): Likewise.
2024-04-19libctf: delete LCTF_DIRTYNick Alcock1-2/+0
This flag was meant as an optimization to avoid reserializing dicts unnecessarily. It was critically necessary back when serialization was done by ctf_update() and you had to call that every time you wanted any new modifications to the type table to be usable by other types, but that has been unnecessary for years now, and serialization is only done once when writing out, which one would naturally assume would always serialize the dict. Worse, it never really worked: it only tracked newly-added types, not things like added symbols which might equally well require reserialization, and it gets in the way of an upcoming change. Delete entirely. libctf/ * ctf-create.c (ctf_create): Drop LCTF_DIRTY. (ctf_discard): Likewise. (ctf_rollback): Likewise. (ctf_add_generic): Likewise. (ctf_set_array): Likewise. (ctf_add_enumerator): Likewise. (ctf_add_member_offset): Likewise. (ctf_add_variable_forced): Likewise. * ctf-link.c (ctf_link_intern_extern_string): Likewise. (ctf_link_add_strtab): Likewise. * ctf-serialize.c (ctf_serialize): Likewise. * ctf-impl.h (LCTF_DIRTY): Likewise. (LCTF_LINKING): Renumber.
2024-04-19libctf: support addition of types to dicts read via ctf_open()Nick Alcock1-1/+13
libctf has long declared deserialized dictionaries (out of files or ELF sections or memory buffers or whatever) to be read-only: back in the furthest prehistory this was not the case, in that you could add a few sorts of type to such dicts, but attempting to do so often caused horrible memory corruption, so I banned the lot. But it turns out real consumers want it (notably DTrace, which synthesises pointers to types that don't have them and adds them to the ctf_open()ed dicts if it needs them). Let's bring it back again, but without the memory corruption and without the massive code duplication required in days of yore to distinguish between static and dynamic types: the representation of both types has been identical for a few years, with the only difference being that types as a whole are stored in a big buffer for types read in via ctf_open and per-type hashtables for newly-added types. So we discard the internally-visible concept of "readonly dictionaries" in favour of declaring the *range of types* that were already present when the dict was read in to be read-only: you can't modify them (say, by adding members to them if they're structs, or calling ctf_set_array on them), but you can add more types and point to them. (The API remains the same, with calls sometimes returning ECTF_RDONLY, but now they do so less often.) This is a fairly invasive change, mostly because code written since the ban was introduced didn't take the possibility of a static/dynamic split into account. Some of these irregularities were hard to define as anything but bugs. Notably: - The symbol handling was assuming that symbols only needed to be looked for in dynamic hashtabs or static linker-laid-out indexed/ nonindexed layouts, but now we want to check both in case people added more symbols to a dict they opened. - The code that handles type additions wasn't checking to see if types with the same name existed *at all* (so you could do ctf_add_typedef (fp, "foo", bar) repeatedly without error). This seems reasonable for types you just added, but we probably *do* want to ban addition of types with names that override names we already used in the ctf_open()ed portion, since that would probably corrupt existing type relationships. (Doing things this way also avoids causing new errors for any existing code that was doing this sort of thing.) - ctf_lookup_variable entirely failed to work for variables just added by ctf_add_variable: you had to write the dict out and read it back in again before they appeared. - The symbol handling remembered what symbols you looked up but didn't remember their types, so you could look up an object symbol and then find it popping up when you asked for function symbols, which seems less than ideal. Since we had to rejig things enough to be able to distinguish function and object symbols internally anyway (in order to give suitable errors if you try to add a symbol with a name that already existed in the ctf_open()ed dict), this bug suddenly became more visible and was easily fixed. We do not (yet) support writing out dicts that have been previously read in via ctf_open() or other deserializer (you can look things up in them, but not write them out a second time). This never worked, so there is no incompatibility; if it is needed at a later date, the serializer is a little bit closer to having it work now (the only table we don't deal with is the types table, and that's because the upcoming CTFv4 changes are likely to make major changes to the way that table is represented internally, so adding more code that depends on its current form seems like a bad idea). There is a new testcase that tests much of this, in particular that modification of existing types is still banned and that you can add new ones and chase them without error. libctf/ * ctf-impl.h (struct ctf_dict.ctf_symhash): Split into... (ctf_dict.ctf_symhash_func): ... this and... (ctf_dict.ctf_symhash_objt): ... this. (ctf_dict.ctf_stypes): New, counts static types. (LCTF_INDEX_TO_TYPEPTR): Use it instead of CTF_RDWR. (LCTF_RDWR): Deleted. (LCTF_DIRTY): Renumbered. (LCTF_LINKING): Likewise. (ctf_lookup_variable_here): New. (ctf_lookup_by_sym_or_name): Likewise. (ctf_symbol_next_static): Likewise. (ctf_add_variable_forced): Likewise. (ctf_add_funcobjt_sym_forced): Likewise. (ctf_simple_open_internal): Adjust. (ctf_bufopen_internal): Likewise. * ctf-create.c (ctf_grow_ptrtab): Adjust a lot to start with. (ctf_create): Migrate a bunch of initializations into bufopen. Force recreation of name tables. Do not forcibly override the model, let ctf_bufopen do it. (ctf_static_type): New. (ctf_update): Drop LCTF_RDWR check. (ctf_dynamic_type): Likewise. (ctf_add_function): Likewise. (ctf_add_type_internal): Likewise. (ctf_rollback): Check ctf_stypes, not LCTF_RDWR. (ctf_set_array): Likewise. (ctf_add_struct_sized): Likewise. (ctf_add_union_sized): Likewise. (ctf_add_enum): Likewise. (ctf_add_enumerator): Likewise (only on the target dict). (ctf_add_member_offset): Likewise. (ctf_add_generic): Drop LCTF_RDWR check. Ban addition of types with colliding names. (ctf_add_forward): Note safety under the new rules. (ctf_add_variable): Split all but the existence check into... (ctf_add_variable_forced): ... this new function. (ctf_add_funcobjt_sym): Likewise... (ctf_add_funcobjt_sym_forced): ... for this new function. * ctf-link.c (ctf_link_add_linker_symbol): Ban calling on dicts with any stypes. (ctf_link_add_strtab): Likewise. (ctf_link_shuffle_syms): Likewise. (ctf_link_intern_extern_string): Note pre-existing prohibition. * ctf-lookup.c (ctf_lookup_by_id): Drop LCTF_RDWR check. (ctf_lookup_variable): Split out looking in a dict but not its parent into... (ctf_lookup_variable_here): ... this new function. (ctf_lookup_symbol_idx): Track whether looking up a function or object: cache them separately. (ctf_symbol_next): Split out looking in non-dynamic symtypetab entries to... (ctf_symbol_next_static): ... this new function. Don't get confused by the simultaneous presence of static and dynamic symtypetab entries. (ctf_try_lookup_indexed): Don't waste time looking up symbols by index before there can be any idea how symbols are numbered. (ctf_lookup_by_sym_or_name): Distinguish between function and data object lookups. Drop LCTF_RDWR. (ctf_lookup_by_symbol): Adjust. (ctf_lookup_by_symbol_name): Likewise. * ctf-open.c (init_types): Rename to... (init_static_types): ... this. Drop LCTF_RDWR. Populate ctf_stypes. (ctf_simple_open): Drop writable arg. (ctf_simple_open_internal): Likewise. (ctf_bufopen): Likewise. (ctf_bufopen_internal): Populate fields only used for writable dicts. Drop LCTF_RDWR. (ctf_dict_close): Cater for symhash cache split. * ctf-serialize.c (ctf_serialize): Use ctf_stypes, not LCTF_RDWR. * ctf-types.c (ctf_variable_next): Drop LCTF_RDWR. * testsuite/libctf-lookup/add-to-opened*: New test.
2024-04-17libctf warningsAlan Modra1-1/+1
Seen with every compiler I have if using -fno-inline: home/alan/src/binutils-gdb/libctf/ctf-create.c: In function ‘ctf_add_encoded’: /home/alan/src/binutils-gdb/libctf/ctf-create.c:555:3: warning: ‘encoding’ may be used uninitialized [-Wmaybe-uninitialized] 555 | memcpy (dtd->dtd_vlen, &encoding, sizeof (encoding)); Seen with gcc-4.9 and probably others at lower optimisation levels: home/alan/src/binutils-gdb/libctf/ctf-serialize.c: In function 'symtypetab_density': /home/alan/src/binutils-gdb/libctf/ctf-serialize.c:211:18: warning: 'sym' may be used uninitialized in this function [-Wmaybe-uninitialized] if (*max < sym->st_symidx) Seen with gcc-4.5 and probably others at lower optimisation levels: /home/alan/src/binutils-gdb/libctf/ctf-types.c:1649:21: warning: 'tp' may be used uninitialized in this function /home/alan/src/binutils-gdb/libctf/ctf-link.c:765:16: warning: 'parent_i' may be used uninitialized in this function Also with gcc-4.5: In file included from /home/alan/src/binutils-gdb/libctf/ctf-endian.h:25:0, from /home/alan/src/binutils-gdb/libctf/ctf-archive.c:24: /home/alan/src/binutils-gdb/libctf/swap.h:70:0: warning: "_Static_assert" redefined /usr/include/sys/cdefs.h:568:0: note: this is the location of the previous definition * swap.h (_Static_assert): Don't define if already defined. * ctf-serialize.c (symtypetab_density): Merge two CTF_SYMTYPETAB_FORCE_INDEXED blocks. * ctf-create.c (ctf_add_encoded): Avoid "encoding" may be used uninitialized warning. * ctf-link.c (ctf_link_deduplicating_open_inputs): Avoid "parent_i" may be used uninitialized warning. * ctf-types.c (ctf_type_rvisit): Avoid "tp" may be used uninitialized warning.
2024-01-04Update year range in copyright notice of binutils filesAlan Modra1-1/+1
Adds two new external authors to etc/update-copyright.py to cover bfd/ax_tls.m4, and adds gprofng to dirs handled automatically, then updates copyright messages as follows: 1) Update cgen/utils.scm emitted copyrights. 2) Run "etc/update-copyright.py --this-year" with an extra external author I haven't committed, 'Kalray SA.', to cover gas testsuite files (which should have their copyright message removed). 3) Build with --enable-maintainer-mode --enable-cgen-maint=yes. 4) Check out */po/*.pot which we don't update frequently.
2023-11-20libctf: adding CU mappings should be idempotentNick Alcock1-2/+16
When CTF finds conflicting types, it usually shoves each definition into a CTF dictionary named after the compilation unit. The intent of the obscure "cu-mapped link" feature is to allow you to implement custom linkers that shove the definitions into other, more coarse-grained units (say, one per kernel module, even if a module consists of more than one compilation unit): conflicting types within one of these larger components are hidden from name lookup so you can only look up (an arbitrary one of) them by name, but can still be found by chasing type graph links and are still fully deduplicated. You do this by calling ctf_link_add_cu_mapping (fp, "CU name", "bigger lump name"), repeatedly, with different "CU name"s: the ctf_link() following that will put all conflicting types found in "CU name"s sharing a "bigger lump name" into a child dict in an archive member named "bigger lump name". So it's clear enough what happens if you call it repeatedly with the same "bigger lump name" more than once, because that's the whole point of it: but what if you call it with the same "CU name" repeatedly? ctf_link_add_cu_mapping (fp, "CU name", "bigger lump name"); ctf_link_add_cu_mapping (fp, "CU name", "other name"); This is meant to be the same as just doing the second of these, as if the first was never called. Alas, this isn't what happens, and what you get is instead a bit of an inconsistent mess: more or less, the first takes precedence, which is the exact opposite of what we wanted. Fix this to work the right way round. (I plan to add support for CU-mapped links to GNU ld, mainly so that we can properly *test* this machinery.) libctf/ChangeLog: * ctf-link.c (ctf_create_per_cu): Note the behaviour of repeatedly adding FROMs. (ctf_link_add_cu_mapping): Implement that behavour.
2023-10-17libctf: Sanitize error types for PR 30836Torbjörn SVENSSON1-10/+5
Made sure there is no implicit conversion between signed and unsigned return value for functions setting the ctf_errno value. An example of the problem is that in ctf_member_next, the "offset" value is either 0L or (ctf_id_t)-1L, but it should have been 0L or -1L. The issue was discovered while building a 64 bit ld binary to be executed on the Windows platform. Example object file that demonstrates the issue is attached in the PR. libctf/ Affected functions adjusted. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com> Co-Authored-By: Yvan ROUX <yvan.roux@foss.st.com>
2023-04-08libctf, link: fix CU-mapped links with CTF_LINK_EMPTY_CU_MAPPINGSNick Alcock1-10/+11
This is a bug in the intersection of two obscure options that cannot even be invoked from ld with a feature added to stop ld of the same input file repeatedly from crashing the linker. The latter fix involved tracking input files (internally to libctf) not just with their input CU name but with a version of their input CU name that was augmented with a numeric prefix if their linker input file name was changed, to prevent distinct CTF dicts with the same cuname from overwriting each other. (We can't use just the linker input file name because one linker input can contain many CU dicts, particularly under ld -r). If these inputs then produced conflicting types, those types were emitted into similarly-named output dicts, so we needed similar machinery to detect clashing output dicts and add a numeric prefix to them as well. This works fine, except that if you used the cu-mapping feature to force double-linking of CTF (so that your CTF can be grouped into output dicts larger than a single translation unit) and then also used CTF_LINK_EMPTY_CU_MAPPINGS to force every possible output dict in the mapping to be created (even if empty), we did the creation of empty dicts first, and then all the actual content got considered to be a clash. So you ended up with a pile of useless empty dicts and then all the content was in full dicts with the same names suffixed with a #0. This seems likely to confuse consumers that use this facility. Fixed by generating all the EMPTY_CU_MAPPINGS empty dicts after linking is complete, not before it runs. No impact on ld, which does not do cu-mapped links or pass CTF_LINK_EMPTY_CU_MAPPINGS to ctf_link(). libctf/ * ctf-link.c (ctf_create_per_cu): Don't create new dicts iff one already exists and we are making one for no input in particular. (ctf_link): Emit empty CTF dicts corresponding to no input in particular only after linkiing is complete.
2023-01-12libctf: ctf-link outdated input check faultyNick Alcock1-6/+29
This check has a pair of faults which, combined, can lead to memory corruption. Firstly, it assumes that the values of the ctf_link_inputs hash are ctf_dict_t's: they are not, they are ctf_link_input_t's, a much shorter structure. So the flags check which is the core of this is faulty (but happens, by chance, to give the right output on most architectures, since usually we happen to get a 0 here, so the test that checks this usually passes). Worse, the warning that is emitted when the test fails is added to the wrong dict -- it's added to the input dict, whose warning list is never consumed, rendering the whole check useless. But the dict it adds to is still the wrong type, so we end up overwriting something deep in memory (or, much more likely, dereferencing a garbage pointer and crashing). Fixing both reveals another problem: the link input is an *archive* consisting of multiple members, so we have to consider whether to check all of them for the outdated-func-info thing we are checking here. However, no compiler exists that emits a mixture of members with this flag on and members with it off, and the linker always reserializes (and upgrades) such things when it sees them: so all members in a given archive must have the same value of the flag, so we only need to check one member per input archive. libctf/ PR libctf/29983 * ctf-link.c (ctf_link_warn_outdated_inputs): Get the types of members of ctf_link_inputs right, fixing a possible spurious tesst failure / wild pointer deref / overwrite. Emit the warning message into the right dict.
2023-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
The newer update-copyright.py fixes file encoding too, removing cr/lf on binutils/bfdtest2.c and ld/testsuite/ld-cygwin/exe-export.exp, and embedded cr in binutils/testsuite/binutils-all/ar.exp string match.
2022-12-08libctf: avoid potential double freeAlan Modra1-1/+4
* ctf-link.c (ctf_link_add_cu_mapping): Set t NULL after free.
2022-08-01libctf: Avoid use of uninitialised variablesAlan Modra1-6/+10
* ctf-link.c (ctf_link_add_ctf_internal): Don't free uninitialised pointers.
2022-06-21libctf: fix linking together multiple objects derived from the same sourceNick Alcock1-33/+128
Right now, if you compile the same .c input repeatedly with CTF enabled and different compilation flags, then arrange to link all of these together, then things misbehave in various ways. libctf may conflate either inputs (if the .o files have the same name, say if they are stored in different .a archives), or per-CU outputs when conflicting types are found: the latter can lead to entirely spurious errors when it tries to produce multiple per-CU outputs with the same name (discarding all but the last, but then looking for types in the earlier ones which have just been thrown away). Fixing this is multi-pronged. Both inputs and outputs need to be differentiated in the hashtables libctf keeps them in: inputs with the same cuname and filename need to be considered distinct as long as they have different associated CTF dicts, and per-CU outputs need to be considered distinct as long as they have different associated input dicts. Right now there is nothing tying the two together other than the CU name: fix this by introducing a new field in the ctf_dict_t named ctf_link_in_out, which (for input dicts) points to the associated per-CU output dict (if any), and for output dicts points to the associated input dict. At creation time the name used is completely arbitrary: it's only important that it be distinct if CTF dicts are distinct. So, when a clash is found, adjust the CU name by sticking the number of elements in the input on the end. At output time, the CU name will appear in the linked object, so it matters a little more that it look slightly less ugly: in conflicting cases, append an incrementing integer, starting at 0. This naming scheme is not very helpful, but it's hard to see what else we can do. The input .o name may be the same. The input .a name is not even visible to ctf_link, and even *that* might be the same, because .a's can contain many members with the same name, all of which participate in the link. All we really know is that the two have distinct dictionaries with distinct types in them, and at least this way they are all represented, any any symbols, variables etc referring to those types are accurately stored. (As a side-effect this also fixes a use-after-free and double-free when errors are found during variable or symbol emission.) Use the opportunity to prevent a couple of sources of problems, to wit changing the active CU mappings when a link has already been done (no effect on ld, which doesn't use CU mappings at all), and causing multiple consecutive ctf_link's to have the same net effect as just doing the last one (no effect on ld, which only ever does one ctf_link) rather than having the links be a sort of half-incremental not-really-intended mess. libctf/ChangeLog: PR libctf/29242 * ctf-impl.h (struct ctf_dict) [ctf_link_in_out]: New. * ctf-dedup.c (ctf_dedup_emit_type): Set it. * ctf-link.c (ctf_link_add_ctf_internal): Set the input CU name uniquely when clashes are found. (ctf_link_add): Document what repeated additions do. (ctf_new_per_cu_name): New, come up with a consistent name for a new per-CU dict. (ctf_link_deduplicating): Use it. (ctf_create_per_cu): Use it, and ctf_link_in_out, and set ctf_link_in_out properly. Don't overwrite per-CU dicts with per-CU dicts relating to different inputs. (ctf_link_add_cu_mapping): Prevent per-CU mappings being set up if we already have per-CU outputs. (ctf_link_one_variable): Adjust ctf_link_per_cu call. (ctf_link_deduplicating_one_symtypetab): Likewise. (ctf_link_empty_outputs): New, delete all the ctf_link_outputs and blank out ctf_link_in_out on the corresponding inputs. (ctf_link): Clarify the effect of multiple ctf_link calls. Empty ctf_link_outputs if it already exists rather than having the old output leak into the new link. Fix a variable name. * testsuite/config/default.exp (AR): Add. (OBJDUMP): Likewise. * testsuite/libctf-regression/libctf-repeat-cu.exp: New test. * testsuite/libctf-regression/libctf-repeat-cu*: Main program, library, and expected results for the test.
2022-03-23include, libctf, ld: extend variable section to contain functions tooNick Alcock1-1/+36
The CTF variable section is an optional (usually-not-present) section in the CTF dict which contains name -> type mappings corresponding to data symbols that are present in the linker input but not in the output symbol table: the idea is that programs that use their own symbol- resolution mechanisms can use this section to look up the types of symbols they have found using their own mechanism. Because these removed symbols (mostly static variables, functions, etc) all have names that are unlikely to appear in the ELF symtab and because very few programs have their own symbol-resolution mechanisms, a special linker flag (--ctf-variables) is needed to emit this section. Historically, we emitted only removed data symbols into the variable section. This seemed to make sense at the time, but in hindsight it really doesn't: functions are symbols too, and a C program can look them up just like any other type. So extend the variable section so that it contains all static function symbols too (if it is emitted at all), with types of kind CTF_K_FUNCTION. This is a little fiddly. We relied on compiler assistance for data symbols: the compiler simply emits all data symbols twice, once into the symtypetab as an indexed symbol and once into the variable section. Rather than wait for a suitably adjusted compiler that does the same for function symbols, we can pluck unreported function symbols out of the symtab and add them to the variable section ourselves. While we're at it, we do the same with data symbols: this is redundant right now because the compiler does it, but it costs very little time and lets the compiler drop this kludge and save a little space in .o files. include/ * ctf.h: Mention the new things we can see in the variable section. ld/ * testsuite/ld-ctf/data-func-conflicted-vars.d: New test. libctf/ * ctf-link.c (ctf_link_deduplicating_variables): Duplicate symbols into the variable section too. * ctf-serialize.c (symtypetab_delete_nonstatic_vars): Rename to... (symtypetab_delete_nonstatics): ... this. Check the funchash when pruning redundant variables. (ctf_symtypetab_sect_sizes): Adjust accordingly. * NEWS: Describe this change.
2022-01-02Update year range in copyright notice of binutils filesAlan Modra1-1/+1
The result of running etc/update-copyright.py --this-year, fixing all the files whose mode is changed by the script, plus a build with --enable-maintainer-mode --enable-cgen-maint=yes, then checking out */po/*.pot which we don't update frequently. The copy of cgen was with commit d1dd5fcc38ead reverted as that commit breaks building of bfp opcodes files.
2021-03-18libctf: fix some tabdamage and move some code aroundNick Alcock1-46/+46
ctf-link.c is unnecessarily confusing because ctf_link_lazy_open is positioned near functions that have nothing to do with opening files. Move it around, and fix some tabdamage that's crept in lately. libctf/ChangeLog 2021-03-18 Nick Alcock <nick.alcock@oracle.com> * ctf-link.c (ctf_link_lazy_open): Move up in the file, to near ctf_link_add_ctf. * ctf-lookup.c (ctf_lookup_symbol_idx): Repair tabdamage. (ctf_lookup_by_sym_or_name): Likewise. * testsuite/libctf-lookup/struct-iteration.c: Likewise. * testsuite/libctf-regression/type-add-unnamed-struct.c: Likewise.
2021-03-02bfd, ld, libctf: skip zero-refcount strings in CTF string reportingNick Alcock1-1/+2
This is a tricky one. BFD, on the linker's behalf, reports symbols to libctf via the ctf_new_symbol and ctf_new_dynsym callbacks, which ultimately call ctf_link_add_linker_symbol. But while this happens after strtab offsets are finalized, it happens before the .dynstr is actually laid out, so we can't iterate over it at this stage and it is not clear what the reported symbols are actually called. So a second callback, examine_strtab, is called after the .dynstr is finalized, which calls ctf_link_add_strtab and ultimately leads to ldelf_ctf_strtab_iter_cb being called back repeatedly until the offsets of every string in the .dynstr is passed to libctf. libctf can then use this to get symbol names out of the input (which usually stores symbol types in the form of a name -> type mapping at this stage) and extract the types of those symbols, feeding them back into their final form as a 1:1 association with the real symtab's STT_OBJ and STT_FUNC symbols (with a few skipped, see ctf_symtab_skippable). This representation is compact, but has one problem: if libctf somehow gets confused about the st_type of a symbol, it'll stick an entry into the function symtypetab when it should put it into the object symtypetab, or vice versa, and *every symbol from that one on* will have the wrong CTF type because it's actually looking up the type for a different symbol. And we have just such a bug. ctf_link_add_strtab was not taking the refcounts of strings into consideration, so even strings that had been eliminated from the strtab by virtue of being in objects eliminated via --as-needed etc were being reported. This is harmful because it can lead to multiple strings with the same apparent offset, and if the last duplicate to be reported relates to an eliminated symbol, we look up the wrong symbol from the input and gets its type wrong: if it's unlucky and the eliminated symbol is also of the wrong st_type, we will end up with a corrupted symtypetab. Thankfully the wrong-st_type case is already diagnosed by a this-can-never-happen paranoid warning: CTF warning: Symbol 61a added to CTF as a function but is of type 1 or the converse * CTF warning: Symbol a3 added to CTF as a data object but is of type 2 so at least we can tell when the corruption has spread to more than one symbol's type. Skipping zero-refcounted strings is easy: teach _bfd_elf_strtab_str to skip them, and ldelf_ctf_strtab_iter_cb to loop over skipped strings until it falls off the end or finds one that isn't skipped. bfd/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * elf-strtab.c (_bfd_elf_strtab_str): Skip strings with zero refcount. ld/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ldelfgen.c (ldelf_ctf_strtab_iter_cb): Skip zero-refcount strings. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-create.c (symtypetab_density): Report the symbol name as well as index in the name != object error; note the likely consequences. * ctf-link.c (ctf_link_shuffle_syms): Report the symbol index as well as name.
2021-03-02libctf: free ctf_dynsyms properlyNick Alcock1-1/+1
In the "no symbols" case (commonplace for executables), we were freeing the ctf_dynsyms using free(), instead of ctf_dynhash_destroy(), leaking a little memory. (This is harmless in the common case of ld usage, but libctf might be used by persistent processes too.) libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-link.c (ctf_link_shuffle_syms): Free ctf_dynsyms properly.
2021-03-02libctf: minor error-handling fixesNick Alcock1-2/+11
A transient bug in the preceding change (fixed before commit) exposed a new failure, of ld/testsuite/ld-ctf/diag-parname.d. This attempts to ensure that if we link a dict with child type IDs but no attached parent, we get a suitable ECTF_NOPARENT error. This was happening before this commit, but only by chance, because ctf_variable_iter and ctf_variable_next check to see if the dict they're passed is a child dict without an associated parent. We forgot error-checking on the ctf_variable_next call, and as a result this was concealed -- and looking for the problem exposed a new bug. If any of the lookups beneath ctf_dedup_hash_type fail, the CTF link does *not* fail, but acts quite bizarrely, skipping the type but emitting an error to the CTF error/warning log -- so the linker will report an error, emit a partial CTF dict missing some types, and exit with exitcode 0 as if nothing went wrong. Since ctf_dedup_hash_type is never expected to fail in normal operation, this is surely wrong: failures at emission time do not emit partial CTF dicts, so failures at hashing time should not either. So propagate the error back up. Also fix a couple of smaller bugs where we fail to properly free things and/or propagate error codes on various rare link-time errors and out-of-memory conditions. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-dedup.c (ctf_dedup): Pass on errors from ctf_dedup_hash_type. Call ctf_dedup_fini properly on other errors. (ctf_dedup_emit_type): Set the errno on dynhash insertion failure. * ctf-link.c (ctf_link_deduplicating_per_cu): Close outputs beyond output 0 when asserting because >1 output is found. (ctf_link_deduplicating): Likewise, when asserting because the shared output is not the same as the passed-in fp.
2021-03-02libctf: add a deduplicator-specific type mapping tableNick Alcock1-219/+102
When CTF linking is done, the linker has to track the association between types in the inputs and types in the outputs. The deduplicator does this via the cd_output_emission_hashes, which maps from hashes of types (valid in both the input and output) to the IDs of types in the specific dict in which the cd_emission_hashes is held. However, the nondeduplicating linker and ctf_add_type used a different mechanism, a dedicated hashtab stored in the ctf_link_type_mapping, populated via ctf_add_type_mapping and queried via the ctf_type_mapping function. To allow the same functions to be used for variable and symbol population in both the deduplicating and nondeduplicating linker, the deduplicator carefully transferred all its input->output mappings into this hashtab before returning. This is *expensive*. The number of entries in this hashtab scales as the number of input types, and unlike the hashing machinery the type mapping machinery (the only other thing which scales that way) has not been much optimized. Now the nondeduplicating linker is gone, we can throw this out, move the existing type mapping machinery to ctf-create.c and dedicate it to ctf_add_type alone, and add a new function ctf_dedup_type_mapping which uses the deduplicator's built-in knowledge of type mappings directly, without requiring an expensive repopulation phase. This speeds up a test link of nouveau.ko (a good worst-case candidate with a lot of types in each of a lot of input files) from 9.11s to 7.15s in my testing, a speedup of over 20%. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (ctf_dict_t) <ctf_link_type_mapping>: No longer used by the nondeduplicating linker. (ctf_add_type_mapping): Removed, now static. (ctf_type_mapping): Likewise. (ctf_dedup_type_mapping): New. (ctf_dedup_t) <cd_input_nums>: New. * ctf-dedup.c (ctf_dedup_init): Populate it. (ctf_dedup_fini): Free it again. Emphasise that this has to be the last thing called. (ctf_dedup): Populate it. (ctf_dedup_populate_type_mapping): Removed. (ctf_dedup_populate_type_mappings): Likewise. (ctf_dedup_emit): No longer call it. No longer call ctf_dedup_fini either. (ctf_dedup_type_mapping): New. * ctf-link.c (ctf_unnamed_cuname): New. (ctf_create_per_cu): Arguments must be non-null now. (ctf_in_member_cb_arg): Removed. (ctf_link): No longer populate it. No longer discard the mapping table. (ctf_link_deduplicating_one_symtypetab): Use ctf_dedup_type_mapping, not ctf_type_mapping. Use ctf_unnamed_cuname. (ctf_link_one_variable): Likewise. Pass in args individually: no longer a ctf_variable_iter callback. (empty_link_type_mapping): Removed. (ctf_link_deduplicating_variables): Use ctf_variable_next, not ctf_variable_iter. No longer pack arguments to ctf_link_one_variable into a struct. (ctf_link_deduplicating_per_cu): Call ctf_dedup_fini once all link phases are done. (ctf_link_deduplicating): Likewise. (ctf_link_intern_extern_string): Improve comment. (ctf_add_type_mapping): Migrate... (ctf_type_mapping): ... these functions... * ctf-create.c (ctf_add_type_mapping): ... here... (ctf_type_mapping): ... and make static, for the sole use of ctf_add_type.
2021-03-02libctf: remove reference to "unconflicted link mode".Nick Alcock1-3/+3
There is no such thing, and the comment makes no sense, and doesn't match what the code is doing. We always want to put variables in the same dicts as the types they relate to if at all possible. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-link.c (ctf_link_one_variable): Remove reference to "unconflicted link mode".
2021-03-02libctf, include: remove the nondeduplicating CTF linkerNick Alcock1-230/+20
The nondeduplicating CTF linker was kept around when the deduplicating one was added so that people had something to fall back to in case the deduplicating linker turned out to be buggy. It's now much more stable than the nondeduplicating linker, in addition to much faster, using much less memory and producing much better output. In addition, while libctf has a linker flag to invoke the nondeduplicating linker, ld does not expose it: the only way to turn it on within ld is an intentionally- undocumented environment variable. So we can remove it without any ABI or user-visibility concerns (the only thing we leave around is the CTF_LINK_NONDEDUP flag, which can easily be interpreted as "deduplicate less", though right now it does nothing). This lets us remove a lot of complexity associated with tracking filenames and CU names separately (something the deduplcating linker never bothered with, since the cunames are always reliable and ld never hands us useful filenames anyway) The biggest lacuna left behind is the ctf_type_mapping machinery, which slows down deduplicating links quite a lot. We can't just ditch it because ctf_add_type uses it: removing the slowdown from the deduplicating linker is a job for another commit. include/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (CTF_LINK_SHARE_DUPLICATED): Note that this might merely change how much deduplication is done. libctf/ChangeLog 2021-03-02 Nick Alcock <nick.alcock@oracle.com> * ctf-link.c (ctf_create_per_cu): Drop FILENAME now that it is always identical to CUNAME. (ctf_link_deduplicating_one_symtypetab): Adjust. (ctf_link_one_type): Remove. (ctf_link_one_input_archive_member): Likewise. (ctf_link_close_one_input_archive): Likewise. (ctf_link_one_input_archive): Likewise. (ctf_link): No longer call it. Drop CTF_LINK_NONDEDUP path. Improve header comment a bit (dicts, not files). Adjust ctf_create_per_cu call. (ctf_link_deduplicating_variables): Simplify. (ctf_link_in_member_cb_arg_t) <cu_name>: Remove. <in_input_cu_file>: Likewise. <in_fp_parent>: Likewise. <done_parent>: Likewise. (ctf_link_one_variable): Turn uses of in_file_name to in_cuname.
2021-02-04libctf, ld: fix symtypetab and var section population under ld -rNick Alcock1-1/+33
The variable section in a CTF dict is meant to contain the types of variables that do not appear in the symbol table (mostly file-scope static declarations). We implement this by having the compiler emit all potential data symbols into both sections, then delete those symbols from the variable section that correspond to data symbols the linker has reported. Unfortunately, the check for this in ctf_serialize is wrong: rather than checking the set of linker-reported symbols, we check the set of names in the data object symtypetab section: if the linker has reported no symbols at all (usually if ld -r has been run, or if a non-linker program that does not use symbol tables is calling ctf_link) this will include every single symbol, emptying the variable section completely. Worse, when ld -r is in use, we want to force writeout of every symtypetab entry on the inputs, in an indexed section, whether or not the linker has reported them, since this isn't a final link yet and the symbol table is not finalized (and may grow more symbols than the linker has yet reported). But the check for this is flawed too: we were relying on ctf_link_shuffle_syms not having been called if no symbols exist, but that function is *always* called by ld even when ld -r is in use: ctf_link_add_linker_symbol is the one that's not called when there are no symbols. We clearly need to rethink this. Using the emptiness of the set of reported symbols as a test for ld -r is just ugly: the linker already knows if ld -r is underway and can just tell us. So add a new linker flag CTF_LINK_NO_FILTER_REPORTED_SYMS that is set to stop the linker filtering the symbols in the symtypetab sections using the set that the linker has reported: use the presence or absence of this flag to determine whether to emit unindexed symtabs: we only remove entries from the variable section when filtering symbols, and we only remove them if they are in the reported symbol set, fixing the case where no symbols are reported by the linker at all. (The negative sense of the new CTF_LINK flag is intentional: the common case, both for ld and for simple tools that want to do a ctf_link with no ELF symbol table in sight, is probably to filter out symbols that no linker has reported: i.e., for the simple tools, all of them.) There's another wrinkle, though. It is quite possible for a non-linker to add symbols to a dict via ctf_add_*_sym and then write it out via the ctf_write APIs: perhaps it's preparing a dict for a later linker invocation. Right now this would not lead to anything terribly meaningful happening: ctf_serialize just assumes it was called via ctf_link if symbols are present. So add an (internal-to-libctf) flag that indicates that a writeout is happening via ctf_link_write, and set it there (propagating it to child dicts as needed). ctf_serialize can then spot when it is not being called by a linker, and arrange to always write out an indexed, sorted symtypetab for fastest possible future symbol lookup by name in that case. (The writeouts done by ld -r are unsorted, because the only thing likely to use those symtabs is the linker, which doesn't benefit from symtypetab sorting.) Tests added for all three linking cases (ld -r, ld -shared, ld), with a bit of testsuite framework enhancement to stop it unconditionally linking the CTF to be checked by the lookup program with -shared, so tests can now examine CTF linked with -r or indeed with no flags at all, though the output filename is still foo.so even in this case. Another test added for the non-linker case that endeavours to determine whether the symtypetab is sorted by examining the order of entries returned from ctf_symbol_next: nobody outside libctf should rely on this ordering, but this test is not outside libctf :) include/ChangeLog 2021-01-26 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (CTF_LINK_NO_FILTER_REPORTED_SYMS): New. ld/ChangeLog 2021-01-26 Nick Alcock <nick.alcock@oracle.com> * ldlang.c (lang_merge_ctf): Set CTF_LINK_NO_FILTER_REPORTED_SYMS when appropriate. libctf/ChangeLog 2021-01-27 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.c (_libctf_nonnull_): Add parameters. (LCTF_LINKING): New flag. (ctf_dict_t) <ctf_link_flags>: Mention it. * ctf-link.c (ctf_link): Keep LCTF_LINKING set across call. (ctf_write): Likewise, including in child dictionaries. (ctf_link_shuffle_syms): Make sure ctf_dynsyms is NULL if there are no reported symbols. * ctf-create.c (symtypetab_delete_nonstatic_vars): Make sure the variable has been reported as a symbol by the linker. (symtypetab_skippable): Mention relationship between SYMFP and the flags. (symtypetab_density): Adjust nonnullity. Exit early if no symbols were reported and force-indexing is off (i.e., we are doing a final link). (ctf_serialize): Handle the !LCTF_LINKING case by writing out an indexed, sorted symtypetab (and allow SYMFP to be NULL in this case). Turn sorting off if this is a non-final link. Only delete nonstatic vars if we are filtering symbols and the linker has reported some. * testsuite/libctf-regression/nonstatic-var-section-ld-r*: New test of variable and symtypetab section population when ld -r is used. * testsuite/libctf-regression/nonstatic-var-section-ld-executable.lk: Likewise, when ld of an executable is used. * testsuite/libctf-regression/nonstatic-var-section-ld.lk: Likewise, when ld -shared alone is used. * testsuite/libctf-regression/nonstatic-var-section-ld*.c: Lookup programs for the above. * testsuite/libctf-writable/symtypetab-nonlinker-writeout.*: New test, testing survival of symbols across ctf_write paths. * testsuite/lib/ctf-lib.exp (run_lookup_test): New option, nonshared, suppressing linking of the SOURCE with -shared.
2021-01-05libctf: warn about information loss because of unreleased format changesNick Alcock1-0/+30
In the last cycle there have been various changes that have replaced parts of the CTF format with other parts without format compatibility. This was not a compat break, because the old format was never accepted by any version of libctf (the not-in-official-release CTF compiler patch was emitting an invalid func info section), but nonetheless it can confuse users using that patch if they link together object files and find the func info sections in the inputs silently disappearing. Scan the linker inputs for this problem and emit a warning if any are found. libctf/ChangeLog 2021-01-05 Nick Alcock <nick.alcock@oracle.com> * ctf-link.c (ctf_link_warn_outdated_inputs): New. (ctf_link_write): Call it.
2021-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
2020-11-20libctf: do not crash when CTF symbol or variable linking failsNick Alcock1-6/+10
When linking fails, we delete all the generated outputs, but we fail to remove them from the ctf_link_outputs hash we stuck them in before doing symbol and variable section linking (which we had to do because that's where ctf_create_per_cu, used by both, looks for them). This leaves stale pointers to freed memory behind, and crashes soon follow. Fix obvious. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-link.c (ctf_link_deduplicating): Clean up the ctf_link_outputs hash on error.
2020-11-20libctf: symbol type linking supportNick Alcock1-15/+331
This adds facilities to write out the function info and data object sections, which efficiently map from entries in the symbol table to types. The write-side code is entirely new: the read-side code was merely significantly changed and support for indexed tables added (pointed to by the no-longer-unused cth_objtidxoff and cth_funcidxoff header fields). With this in place, you can use ctf_lookup_by_symbol to look up the types of symbols of function and object type (and, as before, you can use ctf_lookup_variable to look up types of file-scope variables not present in the symbol table, as long as you know their name: but variables that are also data objects are now found in the data object section instead.) (Compatible) file format change: The CTF spec has always said that the function info section looks much like the CTF_K_FUNCTIONs in the type section: an info word (including an argument count) followed by a return type and N argument types. This format is suboptimal: it means function symbols cannot be deduplicated and it causes a lot of ugly code duplication in libctf. But conveniently the compiler has never emitted this! Because it has always emitted a rather different format that libctf has never accepted, we can be sure that there are no instances of this function info section in the wild, and can freely change its format without compatibility concerns or a file format version bump. (And since it has never been emitted in any code that generated any older file format version, either, we need keep no code to read the format as specified at all!) So the function info section is now specified as an array of uint32_t, exactly like the object data section: each entry is a type ID in the type section which must be of kind CTF_K_FUNCTION, the prototype of this function. This allows function types to be deduplicated and also correctly encodes the fact that all functions declared in C really are types available to the program: so they should be stored in the type section like all other types. (In format v4, we will be able to represent the types of static functions as well, but that really does require a file format change.) We introduce a new header flag, CTF_F_NEWFUNCINFO, which is set if the new function info format is in use. A sufficiently new compiler will always set this flag. New libctf will always set this flag: old libctf will refuse to open any CTF dicts that have this flag set. If the flag is not set on a dict being read in, new libctf will disregard the function info section. Format v4 will remove this flag (or, rather, the flag has no meaning there and the bit position may be recycled for some other purpose). New API: Symbol addition: ctf_add_func_sym: Add a symbol with a given name and type. The type must be of kind CTF_K_FUNCTION (a function pointer). Internally this adds a name -> type mapping to the ctf_funchash in the ctf_dict. ctf_add_objt_sym: Add a symbol with a given name and type. The type kind can be anything, including function pointers. This adds to ctf_objthash. These both treat symbols as name -> type mappings: the linker associates symbol names with symbol indexes via the ctf_link_shuffle_syms callback, which sets up the ctf_dynsyms/ctf_dynsymidx/ctf_dynsymmax fields in the ctf_dict. Repeated relinks can add more symbols. Variables that are also exposed as symbols are removed from the variable section at serialization time. CTF symbol type sections which have enough pads, defined by CTF_INDEX_PAD_THRESHOLD (whether because they are in dicts with symbols where most types are unknown, or in archive where most types are defined in some child or parent dict, not in this specific dict) are sorted by name rather than symidx and accompanied by an index which associates each symbol type entry with a name: the existing ctf_lookup_by_symbol will map symbol indexes to symbol names and look the names up in the index automatically. (This is currently ELF-symbol-table-dependent, but there is almost nothing specific to ELF in here and we can add support for other symbol table formats easily). The compiler also uses index sections to communicate the contents of object file symbol tables without relying on any specific ordering of symbols: it doesn't need to sort them, and libctf will detect an unsorted index section via the absence of the new CTF_F_IDXSORTED header flag, and sort it if needed. Iteration: ctf_symbol_next: Iterator which returns the types and names of symbols one by one, either for function or data symbols. This does not require any sorting: the ctf_link machinery uses it to pull in all the compiler-provided symbols cheaply, but it is not restricted to that use. (Compatible) changes in API: ctf_lookup_by_symbol: can now be called for object and function symbols: never returns ECTF_NOTDATA (which is now not thrown by anything, but is kept for compatibility and because it is a plausible error that we might start throwing again at some later date). Internally we also have changes to the ctf-string functionality so that "external" strings (those where we track a string -> offset mapping, but only write out an offset) can be consulted via the usual means (ctf_strptr) before the strtab is written out. This is important because ctf_link_add_linker_symbol can now be handed symbols named via strtab offsets, and ctf_link_shuffle_syms must figure out their actual names by looking in the external symtab we have just been fed by the ctf_link_add_strtab callback, long before that strtab is written out. include/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_symbol_next): New. (ctf_add_objt_sym): Likewise. (ctf_add_func_sym): Likewise. * ctf.h: Document new function info section format. (CTF_F_NEWFUNCINFO): New. (CTF_F_IDXSORTED): New. (CTF_F_MAX): Adjust accordingly. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h (CTF_INDEX_PAD_THRESHOLD): New. (_libctf_nonnull_): Likewise. (ctf_in_flight_dynsym_t): New. (ctf_dict_t) <ctf_funcidx_names>: Likewise. <ctf_objtidx_names>: Likewise. <ctf_nfuncidx>: Likewise. <ctf_nobjtidx>: Likewise. <ctf_funcidx_sxlate>: Likewise. <ctf_objtidx_sxlate>: Likewise. <ctf_objthash>: Likewise. <ctf_funchash>: Likewise. <ctf_dynsyms>: Likewise. <ctf_dynsymidx>: Likewise. <ctf_dynsymmax>: Likewise. <ctf_in_flight_dynsym>: Likewise. (struct ctf_next) <u.ctn_next>: Likewise. (ctf_symtab_skippable): New prototype. (ctf_add_funcobjt_sym): Likewise. (ctf_dynhash_sort_by_name): Likewise. (ctf_sym_to_elf64): Rename to... (ctf_elf32_to_link_sym): ... this, and... (ctf_elf64_to_link_sym): ... this. * ctf-open.c (init_symtab): Check for lack of CTF_F_NEWFUNCINFO flag, and presence of index sections. Refactor out ctf_symtab_skippable and ctf_elf*_to_link_sym, and use them. Use ctf_link_sym_t, not Elf64_Sym. Skip initializing objt or func sxlate sections if corresponding index section is present. Adjust for new func info section format. (ctf_bufopen_internal): Add ctf_err_warn to corrupt-file error handling. Report incorrect-length index sections. Always do an init_symtab, even if there is no symtab section (there may be index sections still). (flip_objts): Adjust comment: func and objt sections are actually identical in structure now, no need to caveat. (ctf_dict_close): Free newly-added data structures. * ctf-create.c (ctf_create): Initialize them. (ctf_symtab_skippable): New, refactored out of init_symtab, with st_nameidx_set check added. (ctf_add_funcobjt_sym): New, add a function or object symbol to the ctf_objthash or ctf_funchash, by name. (ctf_add_objt_sym): Call it. (ctf_add_func_sym): Likewise. (symtypetab_delete_nonstatic_vars): New, delete vars also present as data objects. (CTF_SYMTYPETAB_EMIT_FUNCTION): New flag to symtypetab emitters: this is a function emission, not a data object emission. (CTF_SYMTYPETAB_EMIT_PAD): New flag to symtypetab emitters: emit pads for symbols with no type (only set for unindexed sections). (CTF_SYMTYPETAB_FORCE_INDEXED): New flag to symtypetab emitters: always emit indexed. (symtypetab_density): New, figure out section sizes. (emit_symtypetab): New, emit a symtypetab. (emit_symtypetab_index): New, emit a symtypetab index. (ctf_serialize): Call them, emitting suitably sorted symtypetab sections and indexes. Set suitable header flags. Copy over new fields. * ctf-hash.c (ctf_dynhash_sort_by_name): New, used to impose an order on symtypetab index sections. * ctf-link.c (ctf_add_type_mapping): Delete erroneous comment relating to code that was never committed. (ctf_link_one_variable): Improve variable name. (check_sym): New, symtypetab analogue of check_variable. (ctf_link_deduplicating_one_symtypetab): New. (ctf_link_deduplicating_syms): Likewise. (ctf_link_deduplicating): Call them. (ctf_link_deduplicating_per_cu): Note that we don't call them in this case (yet). (ctf_link_add_strtab): Set the error on the fp correctly. (ctf_link_add_linker_symbol): New (no longer a do-nothing stub), add a linker symbol to the in-flight list. (ctf_link_shuffle_syms): New (no longer a do-nothing stub), turn the in-flight list into a mapping we can use, now its names are resolvable in the external strtab. * ctf-string.c (ctf_str_rollback_atom): Don't roll back atoms with external strtab offsets. (ctf_str_rollback): Adjust comment. (ctf_str_write_strtab): Migrate ctf_syn_ext_strtab population from writeout time... (ctf_str_add_external): ... to string addition time. * ctf-lookup.c (ctf_lookup_var_key_t): Rename to... (ctf_lookup_idx_key_t): ... this, now we use it for syms too. <clik_names>: New member, a name table. (ctf_lookup_var): Adjust accordingly. (ctf_lookup_variable): Likewise. (ctf_lookup_by_id): Shuffle further up in the file. (ctf_symidx_sort_arg_cb): New, callback for... (sort_symidx_by_name): ... this new function to sort a symidx found to be unsorted (likely originating from the compiler). (ctf_symidx_sort): New, sort a symidx. (ctf_lookup_symbol_name): Support dynamic symbols with indexes provided by the linker. Use ctf_link_sym_t, not Elf64_Sym. Check the parent if a child lookup fails. (ctf_lookup_by_symbol): Likewise. Work for function symbols too. (ctf_symbol_next): New, iterate over symbols with types (without sorting). (ctf_lookup_idx_name): New, bsearch for symbol names in indexes. (ctf_try_lookup_indexed): New, attempt an indexed lookup. (ctf_func_info): Reimplement in terms of ctf_lookup_by_symbol. (ctf_func_args): Likewise. (ctf_get_dict): Move... * ctf-types.c (ctf_get_dict): ... here. * ctf-util.c (ctf_sym_to_elf64): Re-express as... (ctf_elf64_to_link_sym): ... this. Add new st_symidx field, and st_nameidx_set (always 0, so st_nameidx can be ignored). Look in the ELF strtab for names. (ctf_elf32_to_link_sym): Likewise, for Elf32_Sym. (ctf_next_destroy): Destroy ctf_next_t.u.ctn_next if need be. * libctf.ver: Add ctf_symbol_next, ctf_add_objt_sym and ctf_add_func_sym.
2020-11-20bfd, include, ld, binutils, libctf: CTF should use the dynstr/symNick Alcock1-3/+6
This is embarrassing. The whole point of CTF is that it remains intact even after a binary is stripped, providing a compact mapping from symbols to types for everything in the externally-visible interface of an ELF object: it has connections to the symbol table for that purpose, and to the string table to avoid duplicating symbol names. So it's a shame that the hooks I implemented last year served to hook it up to the .symtab and .strtab, which obviously disappear on strip, leaving any accompanying the CTF dict containing references to strings (and, soon, symbols) which don't exist any more because their containing strtab has been vaporized. The original Solaris design used .dynsym and .dynstr (well, actually, .ldynsym, which has more symbols) which do not disappear. So should we. Thankfully the work we did before serves as guide rails, and adjusting things to use the .dynstr and .dynsym was fast and easy. The only annoyance is that the dynsym is assembled inside elflink.c in a fairly piecemeal fashion, so that the easiest way to get the symbols out was to hook in before every call to swap_symbol_out (we also leave in a hook in front of symbol additions to the .symtab because it seems plausible that we might want to hook them in future too: for now that hook is unused). We adjust things so that rather than being offered a whole hash table of symbols at once, libctf is now given symbols one at a time, with st_name indexes already resolved and pointing at their final .dynstr offsets: it's now up to libctf to resolve these to names as needed using the strtab info we pass it separately. Some bits might be contentious. The ctf_new_dynstr callback takes an elf_internal_sym, and this remains an elf_internal_sym right down through the generic emulation layers into ldelfgen. This is no worse than the elf_sym_strtab we used to pass down, but in the future when we gain non-ELF CTF symtab support we might want to lower the elf_internal_sym to some other representation (perhaps a ctf_link_symbol) in bfd or in ldlang_ctf_new_dynsym. We rename the 'apply_strsym' hooks to 'acquire_strings' instead, becuse they no longer have anything to do with symbols. There are some API changes to pieces of API which are technically public but actually totally unused by anything and/or unused by anything but ld so they can change freely: the ctf_link_symbol gains new fields to allow symbol names to be given as strtab offsets as well as strings, and a symidx so that the symbol index can be passed in. ctf_link_shuffle_syms loses its callback parameter: the idea now is that linkers call the new ctf_link_add_linker_symbol for every symbol in .dynsym, feed in all the strtab entries with ctf_link_add_strtab, and then a call to ctf_link_shuffle_syms will apply both and arrange to use them to reorder the CTF symtab at CTF serialization time (which is coming in the next commit). Inside libctf we have a new preamble flag CTF_F_DYNSTR which is always set in v3-format CTF dicts from this commit forwards: CTF dicts without this flag are associated with .strtab like they used to be, so that old dicts' external strings don't turn to garbage when loaded by new libctf. Dicts with this flag are associated with .dynstr and .dynsym instead. (The flag is not the next in sequence because this commit was written quite late: the missing flags will be filled in by the next commit.) Tests forthcoming in a later commit in this series. bfd/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * elflink.c (elf_finalize_dynstr): Call examine_strtab after dynstr finalization. (elf_link_swap_symbols_out): Don't call it here. Call ctf_new_symbol before swap_symbol_out. (elf_link_output_extsym): Call ctf_new_dynsym before swap_symbol_out. (bfd_elf_final_link): Likewise. * elf.c (swap_out_syms): Pass in bfd_link_info. Call ctf_new_symbol before swap_symbol_out. (_bfd_elf_compute_section_file_positions): Adjust. binutils/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * readelf.c (dump_section_as_ctf): Use .dynsym and .dynstr, not .symtab and .strtab. include/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * bfdlink.h (struct elf_sym_strtab): Replace with... (struct elf_internal_sym): ... this. (struct bfd_link_callbacks) <examine_strtab>: Take only a symstrtab argument. <ctf_new_symbol>: New. <ctf_new_dynsym>: Likewise. * ctf-api.h (struct ctf_link_sym) <st_symidx>: New. <st_nameidx>: Likewise. <st_nameidx_set>: Likewise. (ctf_link_iter_symbol_f): Removed. (ctf_link_shuffle_syms): Remove most parameters, just takes a ctf_dict_t now. (ctf_link_add_linker_symbol): New, split from ctf_link_shuffle_syms. * ctf.h (CTF_F_DYNSTR): New. (CTF_F_MAX): Adjust. ld/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ldelfgen.c (struct ctf_strsym_iter_cb_arg): Rename to... (struct ctf_strtab_iter_cb_arg): ... this, changing fields: <syms>: Remove. <symcount>: Remove. <symstrtab>: Rename to... <strtab>: ... this. (ldelf_ctf_strtab_iter_cb): Adjust. (ldelf_ctf_symbols_iter_cb): Remove. (ldelf_new_dynsym_for_ctf): New, tell libctf about a single symbol. (ldelf_examine_strtab_for_ctf): Rename to... (ldelf_acquire_strings_for_ctf): ... this, only doing the strtab portion and not symbols. * ldelfgen.h: Adjust declarations accordingly. * ldemul.c (ldemul_examine_strtab_for_ctf): Rename to... (ldemul_acquire_strings_for_ctf): ... this. (ldemul_new_dynsym_for_ctf): New. * ldemul.h: Adjust declarations accordingly. * ldlang.c (ldlang_ctf_apply_strsym): Rename to... (ldlang_ctf_acquire_strings): ... this. (ldlang_ctf_new_dynsym): New. (lang_write_ctf): Call ldemul_new_dynsym_for_ctf with NULL to do the actual symbol shuffle. * ldlang.h (struct elf_strtab_hash): Adjust accordingly. * ldmain.c (bfd_link_callbacks): Wire up new/renamed callbacks. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-link.c (ctf_link_shuffle_syms): Adjust. (ctf_link_add_linker_symbol): New, unimplemented stub. * libctf.ver: Add it. * ctf-create.c (ctf_serialize): Set CTF_F_DYNSTR on newly-serialized dicts. * ctf-open-bfd.c (ctf_bfdopen_ctfsect): Check for the flag: open the symtab/strtab if not present, dynsym/dynstr otherwise. * ctf-archive.c (ctf_arc_bufpreamble): New, get the preamble from some arbitrary member of a CTF archive. * ctf-impl.h (ctf_arc_bufpreamble): Declare it.
2020-11-20libctf, include, binutils, gdb: rename CTF-opening functionsNick Alcock1-4/+4
The functions that return ctf_dict_t's given a ctf_archive_t and a name are very clumsily named. It sounds like they return *archives*, not dictionaries, and the names are very long and clunky. Why do we have a ctf_arc_open_by_name when it opens a dictionary, not an archive, and when there is no way to open a dictionary in any other way? The answer is purely internal: the function is located in ctf-archive.c, and everything in there was called ctf_arc_*, and there is another way to open a dict (by offset in the archive), that is internal to ctf-archive.c and that nothing else can call. This is clearly bad naming. The internal organization of the source tree should not dictate public API names! So rename things (keeping the old, bad names for compatibility), and adjust all users. You now open a dict using ctf_dict_open, and open it giving ELF sections via ctf_dict_open_sections. binutils/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * objdump.c (dump_ctf): Use ctf_dict_open, not ctf_arc_open_by_name. * readelf.c (dump_section_as_ctf): Likewise. gdb/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctfread.c (elfctf_build_psymtabs): Use ctf_dict_open, not ctf_arc_open_by_name. include/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_arc_open_by_name): Rename to... (ctf_dict_open): ... this, keeping compatibility function. (ctf_arc_open_by_name_sections): Rename to... (ctf_dict_open_sections): ... this, keeping compatibility function. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-archive.c (ctf_arc_open_by_offset): Rename to... (ctf_dict_open_by_offset): ... this. Adjust callers. (ctf_arc_open_by_name_internal): Rename to... (ctf_dict_open_internal): ... this. Adjust callers. (ctf_arc_open_by_name_sections): Rename to... (ctf_dict_open_sections): ... this, keeping compatibility function. (ctf_arc_open_by_name): Rename to... (ctf_dict_open): ... this, keeping compatibility function. * libctf.ver: New functions added. * ctf-link.c (ctf_link_one_input_archive): Adjusted accordingly. (ctf_link_deduplicating_open_inputs): Likewise.
2020-11-20libctf, include, binutils, gdb, ld: rename ctf_file_t to ctf_dict_tNick Alcock1-82/+82
The naming of the ctf_file_t type in libctf is a historical curiosity. Back in the Solaris days, CTF dictionaries were originally generated as a separate file and then (sometimes) merged into objects: hence the datatype was named ctf_file_t, and known as a "CTF file". Nowadays, raw CTF is essentially never written to a file on its own, and the datatype changed name to a "CTF dictionary" years ago. So the term "CTF file" refers to something that is never a file! This is at best confusing. The type has also historically been known as a 'CTF container", which is even more confusing now that we have CTF archives which are *also* a sort of container (they contain CTF dictionaries), but which are never referred to as containers in the source code. So fix this by completing the renaming, renaming ctf_file_t to ctf_dict_t throughout, and renaming those few functions that refer to CTF files by name (keeping compatibility aliases) to refer to dicts instead. Old users who still refer to ctf_file_t will see (harmless) pointer-compatibility warnings at compile time, but the ABI is unchanged (since C doesn't mangle names, and ctf_file_t was always an opaque type) and things will still compile fine as long as -Werror is not specified. All references to CTF containers and CTF files in the source code are fixed to refer to CTF dicts instead. Further (smaller) renamings of annoyingly-named functions to come, as part of the process of souping up queries across whole archives at once (needed for the function info and data object sections). binutils/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * objdump.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t. (dump_ctf_archive_member): Likewise. (dump_ctf): Likewise. Use ctf_dict_close, not ctf_file_close. * readelf.c (dump_ctf_errs): Rename ctf_file_t to ctf_dict_t. (dump_ctf_archive_member): Likewise. (dump_section_as_ctf): Likewise. Use ctf_dict_close, not ctf_file_close. gdb/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctfread.c: Change uses of ctf_file_t to ctf_dict_t. (ctf_fp_info::~ctf_fp_info): Call ctf_dict_close, not ctf_file_close. include/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_file_t): Rename to... (ctf_dict_t): ... this. Keep ctf_file_t around for compatibility. (struct ctf_file): Likewise rename to... (struct ctf_dict): ... this. (ctf_file_close): Rename to... (ctf_dict_close): ... this, keeping compatibility function. (ctf_parent_file): Rename to... (ctf_parent_dict): ... this, keeping compatibility function. All callers adjusted. * ctf.h: Rename references to ctf_file_t to ctf_dict_t. (struct ctf_archive) <ctfa_nfiles>: Rename to... <ctfa_ndicts>: ... this. ld/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ldlang.c (ctf_output): This is a ctf_dict_t now. (lang_ctf_errs_warnings): Rename ctf_file_t to ctf_dict_t. (ldlang_open_ctf): Adjust comment. (lang_merge_ctf): Use ctf_dict_close, not ctf_file_close. * ldelfgen.h (ldelf_examine_strtab_for_ctf): Rename ctf_file_t to ctf_dict_t. Change opaque declaration accordingly. * ldelfgen.c (ldelf_examine_strtab_for_ctf): Adjust. * ldemul.h (examine_strtab_for_ctf): Likewise. (ldemul_examine_strtab_for_ctf): Likewise. * ldeuml.c (ldemul_examine_strtab_for_ctf): Likewise. libctf/ChangeLog 2020-11-20 Nick Alcock <nick.alcock@oracle.com> * ctf-impl.h: Rename ctf_file_t to ctf_dict_t: all declarations adjusted. (ctf_fileops): Rename to... (ctf_dictops): ... this. (ctf_dedup_t) <cd_id_to_file_t>: Rename to... <cd_id_to_dict_t>: ... this. (ctf_file_t): Fix outdated comment. <ctf_fileops>: Rename to... <ctf_dictops>: ... this. (struct ctf_archive_internal) <ctfi_file>: Rename to... <ctfi_dict>: ... this. * ctf-archive.c: Rename ctf_file_t to ctf_dict_t. Rename ctf_archive.ctfa_nfiles to ctfa_ndicts. Rename ctf_file_close to ctf_dict_close. All users adjusted. * ctf-create.c: Likewise. Refer to CTF dicts, not CTF containers. (ctf_bundle_t) <ctb_file>: Rename to... <ctb_dict): ... this. * ctf-decl.c: Rename ctf_file_t to ctf_dict_t. * ctf-dedup.c: Likewise. Rename ctf_file_close to ctf_dict_close. Refer to CTF dicts, not CTF containers. * ctf-dump.c: Likewise. * ctf-error.c: Likewise. * ctf-hash.c: Likewise. * ctf-inlines.h: Likewise. * ctf-labels.c: Likewise. * ctf-link.c: Likewise. * ctf-lookup.c: Likewise. * ctf-open-bfd.c: Likewise. * ctf-string.c: Likewise. * ctf-subr.c: Likewise. * ctf-types.c: Likewise. * ctf-util.c: Likewise. * ctf-open.c: Likewise. (ctf_file_close): Rename to... (ctf_dict_close): ...this. (ctf_file_close): New trivial wrapper around ctf_dict_close, for compatibility. (ctf_parent_file): Rename to... (ctf_parent_dict): ... this. (ctf_parent_file): New trivial wrapper around ctf_parent_dict, for compatibility. * libctf.ver: Add ctf_dict_close and ctf_parent_dict.
2020-08-27libctf, binutils, include, ld: gettextize and improve error handlingNick Alcock1-65/+68
This commit follows on from the earlier commit "libctf, ld, binutils: add textual error/warning reporting for libctf" and converts every error in libctf that was reported using ctf_dprintf to use ctf_err_warn instead, gettextizing them in the process, using N_() where necessary to avoid doing gettext calls unless an error message is actually generated, and rephrasing some error messages for ease of translation. This requires a slight change in the ctf_errwarning_next API: this API is public but has not been in a release yet, so can still change freely. The problem is that many errors are emitted at open time (whether opening of a CTF dict, or opening of a CTF archive): the former of these throws away its incompletely-initialized ctf_file_t rather than return it, and the latter has no ctf_file_t at all. So errors and warnings emitted at open time cannot be stored in the ctf_file_t, and have to go elsewhere. We put them in a static local in ctf-subr.c (which is not very thread-safe: a later commit will improve things here): ctf_err_warn with a NULL fp adds to this list, and the public interface ctf_errwarning_next with a NULL fp retrieves from it. We need a slight exception from the usual iterator rules in this case: with a NULL fp, there is nowhere to store the ECTF_NEXT_END "error" which signifies the end of iteration, so we add a new err parameter to ctf_errwarning_next which is used to report such iteration-related errors. (If an fp is provided -- i.e., if not reporting open errors -- this is optional, but even if it's optional it's still an API change. This is actually useful from a usability POV as well, since ctf_errwarning_next is usually called when there's been an error, so overwriting the error code with ECTF_NEXT_END is not very helpful! So, unusually, ctf_errwarning_next now uses the passed fp for its error code *only* if no errp pointer is passed in, and leaves it untouched otherwise.) ld, objdump and readelf are adapted to call ctf_errwarning_next with a NULL fp to report open errors where appropriate. The ctf_err_warn API also has to change, gaining a new error-number parameter which is used to add the error message corresponding to that error number into the debug stream when LIBCTF_DEBUG is enabled: changing this API is easy at this point since we are already touching all existing calls to gettextize them. We need this because the debug stream should contain the errno's message, but the error reported in the error/warning stream should *not*, because the caller will probably report it themselves at failure time regardless, and reporting it in every error message that leads up to it leads to a ridiculous chattering on failure, which is likely to end up as ridiculous chattering on stderr (trimmed a bit): CTF error: `ld/testsuite/ld-ctf/A.c (0): lookup failure for type 3: flags 1: The parent CTF dictionary is unavailable' CTF error: `ld/testsuite/ld-ctf/A.c (0): struct/union member type hashing error during type hashing for type 80000001, kind 6: The parent CTF dictionary is unavailable' CTF error: `deduplicating link variable emission failed for ld/testsuite/ld-ctf/A.c: The parent CTF dictionary is unavailable' ld/.libs/lt-ld-new: warning: CTF linking failed; output will have no CTF section: `The parent CTF dictionary is unavailable' We only need to be told that the parent CTF dictionary is unavailable *once*, not over and over again! errmsgs are still emitted on warning generation, because warnings do not usually lead to a failure propagated up to the caller and reported there. Debug-stream messages are not translated. If translation is turned on, there will be a mixture of English and translated messages in the debug stream, but rather that than burden the translators with debug-only output. binutils/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * objdump.c (dump_ctf_archive_member): Move error- reporting... (dump_ctf_errs): ... into this separate function. (dump_ctf): Call it on open errors. * readelf.c (dump_ctf_archive_member): Move error- reporting... (dump_ctf_errs): ... into this separate function. Support calls with NULL fp. Adjust for new err parameter to ctf_errwarning_next. (dump_section_as_ctf): Call it on open errors. include/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_errwarning_next): New err parameter. ld/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ldlang.c (lang_ctf_errs_warnings): Support calls with NULL fp. Adjust for new err parameter to ctf_errwarning_next. Only check for assertion failures when fp is non-NULL. (ldlang_open_ctf): Call it on open errors. * testsuite/ld-ctf/ctf.exp: Always use the C locale to avoid breaking the diags tests. libctf/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ctf-subr.c (open_errors): New list. (ctf_err_warn): Calls with NULL fp append to open_errors. Add err parameter, and use it to decorate the debug stream with errmsgs. (ctf_err_warn_to_open): Splice errors from a CTF dict into the open_errors. (ctf_errwarning_next): Calls with NULL fp report from open_errors. New err param to report iteration errors (including end-of-iteration) when fp is NULL. (ctf_assert_fail_internal): Adjust ctf_err_warn call for new err parameter: gettextize. * ctf-impl.h (ctfo_get_vbytes): Add ctf_file_t parameter. (LCTF_VBYTES): Adjust. (ctf_err_warn_to_open): New. (ctf_err_warn): Adjust. (ctf_bundle): Used in only one place: move... * ctf-create.c: ... here. (enumcmp): Use ctf_err_warn, not ctf_dprintf, passing the err number down as needed. Don't emit the errmsg. Gettextize. (membcmp): Likewise. (ctf_add_type_internal): Likewise. (ctf_write_mem): Likewise. (ctf_compress_write): Likewise. Report errors writing the header or body. (ctf_write): Likewise. * ctf-archive.c (ctf_arc_write_fd): Use ctf_err_warn, not ctf_dprintf, and gettextize, as above. (ctf_arc_write): Likewise. (ctf_arc_bufopen): Likewise. (ctf_arc_open_internal): Likewise. * ctf-labels.c (ctf_label_iter): Likewise. * ctf-open-bfd.c (ctf_bfdclose): Likewise. (ctf_bfdopen): Likewise. (ctf_bfdopen_ctfsect): Likewise. (ctf_fdopen): Likewise. * ctf-string.c (ctf_str_write_strtab): Likewise. * ctf-types.c (ctf_type_resolve): Likewise. * ctf-open.c (get_vbytes_common): Likewise. Pass down the ctf dict. (get_vbytes_v1): Pass down the ctf dict. (get_vbytes_v2): Likewise. (flip_ctf): Likewise. (flip_types): Likewise. Use ctf_err_warn, not ctf_dprintf, and gettextize, as above. (upgrade_types_v1): Adjust calls. (init_types): Use ctf_err_warn, not ctf_dprintf, as above. (ctf_bufopen_internal): Likewise. Adjust calls. Transplant errors emitted into individual dicts into the open errors if this turns out to be a failed open in the end. * ctf-dump.c (ctf_dump_format_type): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_dump_funcs): Likewise. Collapse err label into its only case. (ctf_dump_type): Likewise. * ctf-link.c (ctf_create_per_cu): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_link_one_type): Likewise. (ctf_link_lazy_open): Likewise. (ctf_link_one_input_archive): Likewise. (ctf_link_deduplicating_count_inputs): Likewise. (ctf_link_deduplicating_open_inputs): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating): Likewise. (ctf_link): Likewise. (ctf_link_deduplicating_per_cu): Likewise. Add some missed ctf_set_errnos to obscure error cases. * ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_dedup_populate_mappings): Likewise. (ctf_dedup_detect_name_ambiguity): Likewise. (ctf_dedup_init): Likewise. (ctf_dedup_multiple_input_dicts): Likewise. (ctf_dedup_conflictify_unshared): Likewise. (ctf_dedup): Likewise. (ctf_dedup_rwalk_one_output_mapping): Likewise. (ctf_dedup_id_to_target): Likewise. (ctf_dedup_emit_type): Likewise. (ctf_dedup_emit_struct_members): Likewise. (ctf_dedup_populate_type_mapping): Likewise. (ctf_dedup_populate_type_mappings): Likewise. (ctf_dedup_emit): Likewise. (ctf_dedup_hash_type): Likewise. Fix a bit of messed-up error status setting. (ctf_dedup_rwalk_one_output_mapping): Likewise. Don't hide unknown-type-kind messages (which signify file corruption).
2020-07-22libctf, link: tie in the deduplicating linkerNick Alcock1-2/+658
This fairly intricate commit connects up the CTF linker machinery (which operates in terms of ctf_archive_t's on ctf_link_inputs -> ctf_link_outputs) to the deduplicator (which operates in terms of arrays of ctf_file_t's, all the archives exploded). The nondeduplicating linker is retained, but is not called unless the CTF_LINK_NONDEDUP flag is passed in (which ld never does), or the environment variable LD_NO_CTF_DEDUP is set. Eventually, once we have confidence in the much-more-complex deduplicating linker, I hope the nondeduplicating linker can be removed. In brief, what this does is traverses each input archive in ctf_link_inputs, opening every member (if not already open) and tying child dicts to their parents, shoving them into an array and constructing a corresponding parents array that tells the deduplicator which dict is the parent of which child. We then call ctf_dedup and ctf_dedup_emit with that array of inputs, taking the outputs that result and putting them into ctf_link_outputs where the rest of the CTF linker expects to find them, then linking in the variables just as is done by the nondeduplicating linker. It also implements much of the CU-mapping side of things. The problem CU-mapping introduces is that if you map many input CUs into one output, this is saying that you want many translation units to produce at most one child dict if conflicting types are found in any of them. This means you can suddenly have multiple distinct types with the same name in the same dict, which libctf cannot really represent because it's not something you can do with C translation units. The deduplicator machinery already committed does as best it can with these, hiding types with conflicting names rather than making child dicts out of them: but we still need to call it. This is done similarly to the main link, taking the inputs (one CU output at a time), deduplicating them, taking the output and making it an input to the final link. Two (significant) optimizations are done: we share atoms tables between all these links and the final link (so e.g. all type hash values are shared, all decorated type names, etc); and any CU-mapped links with only one input (and no child dicts) doesn't need to do anything other than renaming the CU: the CU-mapped link phase can be skipped for it. Put together, large CU-mapped links can save 50% of their memory usage and about as much time (and the memory usage for CU-mapped links is significant, because all those output CUs have to have all their types stored in memory all at once). include/ * ctf-api.h (CTF_LINK_NONDEDUP): New, turn off the deduplicator. libctf/ * ctf-impl.h (ctf_list_splice): New. * ctf-util.h (ctf_list_splice): Likewise. * ctf-link.c (link_sort_inputs_cb_arg_t): Likewise. (ctf_link_sort_inputs): Likewise. (ctf_link_deduplicating_count_inputs): Likewise. (ctf_link_deduplicating_open_inputs): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating_variables): Likewise. (ctf_link_deduplicating_per_cu): Likewise. (ctf_link_deduplicating): Likewise. (ctf_link): Call it.
2020-07-22libctf, link: add CTF_LINK_OMIT_VARIABLES_SECTIONNick Alcock1-1/+2
This flag (not used anywhere yet) causes the variables section to be omitted from the output CTF dict. include/ * ctf-api.h (CTF_LINK_OMIT_VARIABLES_SECTION): New. libctf/ * ctf-link.c (ctf_link_one_input_archive_member): Check CTF_LINK_OMIT_VARIABLES_SECTION.