aboutsummaryrefslogtreecommitdiff
path: root/libctf/ctf-string.c
AgeCommit message (Collapse)AuthorFilesLines
2020-08-27libctf, binutils, include, ld: gettextize and improve error handlingNick Alcock1-1/+1
This commit follows on from the earlier commit "libctf, ld, binutils: add textual error/warning reporting for libctf" and converts every error in libctf that was reported using ctf_dprintf to use ctf_err_warn instead, gettextizing them in the process, using N_() where necessary to avoid doing gettext calls unless an error message is actually generated, and rephrasing some error messages for ease of translation. This requires a slight change in the ctf_errwarning_next API: this API is public but has not been in a release yet, so can still change freely. The problem is that many errors are emitted at open time (whether opening of a CTF dict, or opening of a CTF archive): the former of these throws away its incompletely-initialized ctf_file_t rather than return it, and the latter has no ctf_file_t at all. So errors and warnings emitted at open time cannot be stored in the ctf_file_t, and have to go elsewhere. We put them in a static local in ctf-subr.c (which is not very thread-safe: a later commit will improve things here): ctf_err_warn with a NULL fp adds to this list, and the public interface ctf_errwarning_next with a NULL fp retrieves from it. We need a slight exception from the usual iterator rules in this case: with a NULL fp, there is nowhere to store the ECTF_NEXT_END "error" which signifies the end of iteration, so we add a new err parameter to ctf_errwarning_next which is used to report such iteration-related errors. (If an fp is provided -- i.e., if not reporting open errors -- this is optional, but even if it's optional it's still an API change. This is actually useful from a usability POV as well, since ctf_errwarning_next is usually called when there's been an error, so overwriting the error code with ECTF_NEXT_END is not very helpful! So, unusually, ctf_errwarning_next now uses the passed fp for its error code *only* if no errp pointer is passed in, and leaves it untouched otherwise.) ld, objdump and readelf are adapted to call ctf_errwarning_next with a NULL fp to report open errors where appropriate. The ctf_err_warn API also has to change, gaining a new error-number parameter which is used to add the error message corresponding to that error number into the debug stream when LIBCTF_DEBUG is enabled: changing this API is easy at this point since we are already touching all existing calls to gettextize them. We need this because the debug stream should contain the errno's message, but the error reported in the error/warning stream should *not*, because the caller will probably report it themselves at failure time regardless, and reporting it in every error message that leads up to it leads to a ridiculous chattering on failure, which is likely to end up as ridiculous chattering on stderr (trimmed a bit): CTF error: `ld/testsuite/ld-ctf/A.c (0): lookup failure for type 3: flags 1: The parent CTF dictionary is unavailable' CTF error: `ld/testsuite/ld-ctf/A.c (0): struct/union member type hashing error during type hashing for type 80000001, kind 6: The parent CTF dictionary is unavailable' CTF error: `deduplicating link variable emission failed for ld/testsuite/ld-ctf/A.c: The parent CTF dictionary is unavailable' ld/.libs/lt-ld-new: warning: CTF linking failed; output will have no CTF section: `The parent CTF dictionary is unavailable' We only need to be told that the parent CTF dictionary is unavailable *once*, not over and over again! errmsgs are still emitted on warning generation, because warnings do not usually lead to a failure propagated up to the caller and reported there. Debug-stream messages are not translated. If translation is turned on, there will be a mixture of English and translated messages in the debug stream, but rather that than burden the translators with debug-only output. binutils/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * objdump.c (dump_ctf_archive_member): Move error- reporting... (dump_ctf_errs): ... into this separate function. (dump_ctf): Call it on open errors. * readelf.c (dump_ctf_archive_member): Move error- reporting... (dump_ctf_errs): ... into this separate function. Support calls with NULL fp. Adjust for new err parameter to ctf_errwarning_next. (dump_section_as_ctf): Call it on open errors. include/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ctf-api.h (ctf_errwarning_next): New err parameter. ld/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ldlang.c (lang_ctf_errs_warnings): Support calls with NULL fp. Adjust for new err parameter to ctf_errwarning_next. Only check for assertion failures when fp is non-NULL. (ldlang_open_ctf): Call it on open errors. * testsuite/ld-ctf/ctf.exp: Always use the C locale to avoid breaking the diags tests. libctf/ChangeLog 2020-08-27 Nick Alcock <nick.alcock@oracle.com> * ctf-subr.c (open_errors): New list. (ctf_err_warn): Calls with NULL fp append to open_errors. Add err parameter, and use it to decorate the debug stream with errmsgs. (ctf_err_warn_to_open): Splice errors from a CTF dict into the open_errors. (ctf_errwarning_next): Calls with NULL fp report from open_errors. New err param to report iteration errors (including end-of-iteration) when fp is NULL. (ctf_assert_fail_internal): Adjust ctf_err_warn call for new err parameter: gettextize. * ctf-impl.h (ctfo_get_vbytes): Add ctf_file_t parameter. (LCTF_VBYTES): Adjust. (ctf_err_warn_to_open): New. (ctf_err_warn): Adjust. (ctf_bundle): Used in only one place: move... * ctf-create.c: ... here. (enumcmp): Use ctf_err_warn, not ctf_dprintf, passing the err number down as needed. Don't emit the errmsg. Gettextize. (membcmp): Likewise. (ctf_add_type_internal): Likewise. (ctf_write_mem): Likewise. (ctf_compress_write): Likewise. Report errors writing the header or body. (ctf_write): Likewise. * ctf-archive.c (ctf_arc_write_fd): Use ctf_err_warn, not ctf_dprintf, and gettextize, as above. (ctf_arc_write): Likewise. (ctf_arc_bufopen): Likewise. (ctf_arc_open_internal): Likewise. * ctf-labels.c (ctf_label_iter): Likewise. * ctf-open-bfd.c (ctf_bfdclose): Likewise. (ctf_bfdopen): Likewise. (ctf_bfdopen_ctfsect): Likewise. (ctf_fdopen): Likewise. * ctf-string.c (ctf_str_write_strtab): Likewise. * ctf-types.c (ctf_type_resolve): Likewise. * ctf-open.c (get_vbytes_common): Likewise. Pass down the ctf dict. (get_vbytes_v1): Pass down the ctf dict. (get_vbytes_v2): Likewise. (flip_ctf): Likewise. (flip_types): Likewise. Use ctf_err_warn, not ctf_dprintf, and gettextize, as above. (upgrade_types_v1): Adjust calls. (init_types): Use ctf_err_warn, not ctf_dprintf, as above. (ctf_bufopen_internal): Likewise. Adjust calls. Transplant errors emitted into individual dicts into the open errors if this turns out to be a failed open in the end. * ctf-dump.c (ctf_dump_format_type): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_dump_funcs): Likewise. Collapse err label into its only case. (ctf_dump_type): Likewise. * ctf-link.c (ctf_create_per_cu): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_link_one_type): Likewise. (ctf_link_lazy_open): Likewise. (ctf_link_one_input_archive): Likewise. (ctf_link_deduplicating_count_inputs): Likewise. (ctf_link_deduplicating_open_inputs): Likewise. (ctf_link_deduplicating_close_inputs): Likewise. (ctf_link_deduplicating): Likewise. (ctf_link): Likewise. (ctf_link_deduplicating_per_cu): Likewise. Add some missed ctf_set_errnos to obscure error cases. * ctf-dedup.c (ctf_dedup_rhash_type): Adjust ctf_err_warn for new err argument. Gettextize. Don't emit the errmsg. (ctf_dedup_populate_mappings): Likewise. (ctf_dedup_detect_name_ambiguity): Likewise. (ctf_dedup_init): Likewise. (ctf_dedup_multiple_input_dicts): Likewise. (ctf_dedup_conflictify_unshared): Likewise. (ctf_dedup): Likewise. (ctf_dedup_rwalk_one_output_mapping): Likewise. (ctf_dedup_id_to_target): Likewise. (ctf_dedup_emit_type): Likewise. (ctf_dedup_emit_struct_members): Likewise. (ctf_dedup_populate_type_mapping): Likewise. (ctf_dedup_populate_type_mappings): Likewise. (ctf_dedup_emit): Likewise. (ctf_dedup_hash_type): Likewise. Fix a bit of messed-up error status setting. (ctf_dedup_rwalk_one_output_mapping): Likewise. Don't hide unknown-type-kind messages (which signify file corruption).
2020-01-01Update year range in copyright notice of binutils filesAlan Modra1-1/+1
2019-10-03libctf: remove ctf_malloc, ctf_free and ctf_strdupNick Alcock1-11/+11
These just get in the way of auditing for erroneous usage of strdup and add a huge irregular surface of "ctf_malloc or malloc? ctf_free or free? ctf_strdup or strdup?" ctf_malloc and ctf_free usage has not reliably matched up for many years, if ever, making the whole game pointless. Go back to malloc, free, and strdup like everyone else: while we're at it, fix a bunch of places where we weren't properly checking for OOM. This changes the interface of ctf_cuname_set and ctf_parent_name_set, which could strdup but could not return errors (like ENOMEM). New in v4. include/ * ctf-api.h (ctf_cuname_set): Can now fail, returning int. (ctf_parent_name_set): Likewise. libctf/ * ctf-impl.h (ctf_alloc): Remove. (ctf_free): Likewise. (ctf_strdup): Likewise. * ctf-subr.c (ctf_alloc): Remove. (ctf_free): Likewise. * ctf-util.c (ctf_strdup): Remove. * ctf-create.c (ctf_serialize): Use malloc, not ctf_alloc; free, not ctf_free; strdup, not ctf_strdup. (ctf_dtd_delete): Likewise. (ctf_dvd_delete): Likewise. (ctf_add_generic): Likewise. (ctf_add_function): Likewise. (ctf_add_enumerator): Likewise. (ctf_add_member_offset): Likewise. (ctf_add_variable): Likewise. (membadd): Likewise. (ctf_compress_write): Likewise. (ctf_write_mem): Likewise. * ctf-decl.c (ctf_decl_push): Likewise. (ctf_decl_fini): Likewise. (ctf_decl_sprintf): Likewise. Check for OOM. * ctf-dump.c (ctf_dump_append): Use malloc, not ctf_alloc; free, not ctf_free; strdup, not ctf_strdup. (ctf_dump_free): Likewise. (ctf_dump): Likewise. * ctf-open.c (upgrade_types_v1): Likewise. (init_types): Likewise. (ctf_file_close): Likewise. (ctf_bufopen_internal): Likewise. Check for OOM. (ctf_parent_name_set): Likewise: report the OOM to the caller. (ctf_cuname_set): Likewise. (ctf_import): Likewise. * ctf-string.c (ctf_str_purge_atom_refs): Use malloc, not ctf_alloc; free, not ctf_free; strdup, not ctf_strdup. (ctf_str_free_atom): Likewise. (ctf_str_create_atoms): Likewise. (ctf_str_add_ref_internal): Likewise. (ctf_str_remove_ref): Likewise. (ctf_str_write_strtab): Likewise.
2019-10-03libctf: avoid the need to ever use ctf_updateNick Alcock1-33/+130
The method of operation of libctf when the dictionary is writable has before now been that types that are added land in the dynamic type section, which is a linked list and hash of IDs -> dynamic type definitions (and, recently a hash of names): the DTDs are a bit of CTF representing the ctf_type_t and ad hoc C structures representing the vlen. Historically, libctf was unable to do anything with these types, not even look them up by ID, let alone by name: if you wanted to do that say if you were adding a type that depended on one you just added) you called ctf_update, which serializes all the DTDs into a CTF file and reopens it, copying its guts over the fp it's called with. The ctf_updated types are then frozen in amber and unchangeable: all lookups will return the types in the static portion in preference to the dynamic portion, and we will refuse to re-add things that already exist in the static portion (and, of late, in the dynamic portion too). The libctf machinery remembers the boundary between static and dynamic types and looks in the right portion for each type. Lots of things still don't quite work with dynamic types (e.g. getting their size), but enough works to do a bunch of additions and then a ctf_update, most of the time. Except it doesn't, because ctf_add_type finds it necessary to walk the full dynamic type definition list looking for types with matching names, so it gets slower and slower with every type you add: fixing this requires calling ctf_update periodically for no other reason than to avoid massively slowing things down. This is all clunky and very slow but kind of works, until you consider that it is in fact possible and indeed necessary to modify one sort of type after it has been added: forwards. These are necessarily promoted to structs, unions or enums, and when they do so *their type ID does not change*. So all of a sudden we are changing types that already exist in the static portion. ctf_update gets massively confused by this and allocates space enough for the forward (with no members), but then emits the new dynamic type (with all the members) into it. You get an assertion failure after that, if you're lucky, or a coredump. So this commit rejigs things a bit and arranges to exclusively use the dynamic type definitions in writable dictionaries, and the static type definitions in readable dictionaries: we don't at any time have a mixture of static and dynamic types, and you don't need to call ctf_update to make things "appear". The ctf_dtbyname hash I introduced a few months ago, which maps things like "struct foo" to DTDs, is removed, replaced instead by a change of type of the four dictionaries which track names. Rather than just being (unresizable) ctf_hash_t's populated only at ctf_bufopen time, they are now a ctf_names_t structure, which is a pair of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used in readonly dictionaries, and the ctf_dynhash_t being used in writable ones. The decision as to which to use is centralized in the new functions ctf_lookup_by_rawname (which takes a type kind) and ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *.) This change lets us switch from using static to dynamic name hashes on the fly across the entirety of libctf without complexifying anything: in fact, because we now centralize the knowledge about how to map from type kind to name hash, it actually simplifies things and lets us throw out quite a lot of now-unnecessary complexity, from ctf_dtnyname (replaced by the dynamic half of the name tables), through to ctf_dtnextid (now that a dictionary's static portion is never referenced if the dictionary is writable, we can just use ctf_typemax to indicate the maximum type: dynamic or non-dynamic does not matter, and we no longer need to track the boundary between the types). You can now ctf_rollback() as far as you like, even past a ctf_update or for that matter a full writeout; all the iteration functions work just as well on writable as on read-only dictionaries; ctf_add_type no longer needs expensive duplicated code to run over the dynamic types hunting for ones it might be interested in; and the linker no longer needs a hack to call ctf_update so that calling ctf_add_type is not impossibly expensive. There is still a bit more complexity: some new code paths in ctf-types.c need to know how to extract information from dynamic types. This complexity will go away again in a few months when libctf acquires a proper intermediate representation. You can still call ctf_update if you like (it's public API, after all), but its only effect now is to set the point to which ctf_discard rolls back. Obviously *something* still needs to serialize the CTF file before writeout, and this job is done by ctf_serialize, which does everything ctf_update used to except set the counter used by ctf_discard. It is automatically called by the various functions that do CTF writeout: nobody else ever needs to call it. With this in place, forwards that are promoted to non-forwards no longer crash the link, even if it happens tens of thousands of types later. v5: fix tabdamage. libctf/ * ctf-impl.h (ctf_names_t): New. (ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t. (ctf_file_t) <ctf_structs>: Likewise. <ctf_unions>: Likewise. <ctf_enums>: Likewise. <ctf_names>: Likewise. <ctf_lookups>: Improve comment. <ctf_ptrtab_len>: New. <ctf_prov_strtab>: New. <ctf_str_prov_offset>: New. <ctf_dtbyname>: Remove, redundant to the names hashes. <ctf_dtnextid>: Remove, redundant to ctf_typemax. (ctf_dtdef_t) <dtd_name>: Remove. <dtd_data>: Note that the ctt_name is now populated. (ctf_str_atom_t) <csa_offset>: This is now the strtab offset for internal strings too. <csa_external_offset>: New, the external strtab offset. (CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case. (ctf_name_table): New declaration. (ctf_lookup_by_rawname): Likewise. (ctf_lookup_by_rawhash): Likewise. (ctf_set_ctl_hashes): Likewise. (ctf_serialize): Likewise. (ctf_dtd_insert): Adjust. (ctf_simple_open_internal): Likewise. (ctf_bufopen_internal): Likewise. (ctf_list_empty_p): Likewise. (ctf_str_remove_ref): Likewise. (ctf_str_add): Returns uint32_t now. (ctf_str_add_ref): Likewise. (ctf_str_add_external): Now returns a boolean (int). * ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab for strings in the appropriate range. (ctf_str_create_atoms): Create the ctf_prov_strtab. Detect OOM when adding the null string to the new strtab. (ctf_str_free_atoms): Destroy the ctf_prov_strtab. (ctf_str_add_ref_internal): Add make_provisional argument. If make_provisional, populate the offset and fill in the ctf_prov_strtab accordingly. (ctf_str_add): Return the offset, not the string. (ctf_str_add_ref): Likewise. (ctf_str_add_external): Return a success integer. (ctf_str_remove_ref): New, remove a single ref. (ctf_str_count_strtab): Do not count the initial null string's length or the existence or length of any unreferenced internal atoms. (ctf_str_populate_sorttab): Skip atoms with no refs. (ctf_str_write_strtab): Populate the nullstr earlier. Add one to the cts_len for the null string, since it is no longer done in ctf_str_count_strtab. Adjust for csa_external_offset rename. Populate the csa_offset for both internal and external cases. Flush the ctf_prov_strtab afterwards, and reset the ctf_str_prov_offset. * ctf-create.c (ctf_grow_ptrtab): New. (ctf_create): Call it. Initialize new fields rather than old ones. Tell ctf_bufopen_internal that this is a writable dictionary. Set the ctl hashes and data model. (ctf_update): Rename to... (ctf_serialize): ... this. Leave a compatibility function behind. Tell ctf_simple_open_internal that this is a writable dictionary. Pass the new fields along from the old dictionary. Drop ctf_dtnextid and ctf_dtbyname. Use ctf_strraw, not dtd_name. Do not zero out the DTD's ctt_name. (ctf_prefixed_name): Rename to... (ctf_name_table): ... this. No longer return a prefixed name: return the applicable name table instead. (ctf_dtd_insert): Use it, and use the right name table. Pass in the kind we're adding. Migrate away from dtd_name. (ctf_dtd_delete): Adjust similarly. Remove the ref to the deleted ctt_name. (ctf_dtd_lookup_type_by_name): Remove. (ctf_dynamic_type): Always return NULL on read-only dictionaries. No longer check ctf_dtnextid: check ctf_typemax instead. (ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead. (ctf_rollback): Likewise. No longer fail with ECTF_OVERROLLBACK. Use ctf_name_table and the right name table, and migrate away from dtd_name as in ctf_dtd_delete. (ctf_add_generic): Pass in the kind explicitly and pass it to ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid. Migrate away from dtd_name to using ctf_str_add_ref to populate the ctt_name. Grow the ptrtab if needed. (ctf_add_encoded): Pass in the kind. (ctf_add_slice): Likewise. (ctf_add_array): Likewise. (ctf_add_function): Likewise. (ctf_add_typedef): Likewise. (ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking ctt_name rather than dtd_name. (ctf_add_struct_sized): Pass in the kind. Use ctf_lookup_by_rawname, not ctf_hash_lookup_type / ctf_dtd_lookup_type_by_name. (ctf_add_union_sized): Likewise. (ctf_add_enum): Likewise. (ctf_add_enum_encoded): Likewise. (ctf_add_forward): Likewise. (ctf_add_type): Likewise. (ctf_compress_write): Call ctf_serialize: adjust for ctf_size not being initialized until after the call. (ctf_write_mem): Likewise. (ctf_write): Likewise. * ctf-archive.c (arc_write_one_ctf): Likewise. * ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookuup_by_rawhash, not ctf_hash_lookup_type. (ctf_lookup_by_id): No longer check the readonly types if the dictionary is writable. * ctf-open.c (init_types): Assert that this dictionary is not writable. Adjust to use the new name hashes, ctf_name_table, and ctf_ptrtab_len. GNU style fix for the final ptrtab scan. (ctf_bufopen_internal): New 'writable' parameter. Flip on LCTF_RDWR if set. Drop out early when dictionary is writable. Split the ctf_lookups initialization into... (ctf_set_cth_hashes): ... this new function. (ctf_simple_open_internal): Adjust. New 'writable' parameter. (ctf_simple_open): Adjust accordingly. (ctf_bufopen): Likewise. (ctf_file_close): Destroy the appropriate name hashes. No longer destroy ctf_dtbyname, which is gone. (ctf_getdatasect): Remove spurious "extern". * ctf-types.c (ctf_lookup_by_rawname): New, look up types in the specified name table, given a kind. (ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *. (ctf_member_iter): Add support for iterating over the dynamic type list. (ctf_enum_iter): Likewise. (ctf_variable_iter): Likewise. (ctf_type_rvisit): Likewise. (ctf_member_info): Add support for types in the dynamic type list. (ctf_enum_name): Likewise. (ctf_enum_value): Likewise. (ctf_func_type_info): Likewise. (ctf_func_type_args): Likewise. * ctf-link.c (ctf_accumulate_archive_names): No longer call ctf_update. (ctf_link_write): Likewise. (ctf_link_intern_extern_string): Adjust for new ctf_str_add_external return value. (ctf_link_add_strtab): Likewise. * ctf-util.c (ctf_list_empty_p): New.
2019-10-03libctf: support getting strings from the ELF strtabNick Alcock1-36/+129
The CTF file format has always supported "external strtabs", which internally are strtab offsets with their MSB on: such refs get their strings from the strtab passed in at CTF file open time: this is usually intended to be the ELF strtab, and that's what this implementation is meant to support, though in theory the external strtab could come from anywhere. This commit adds support for these external strings in the ctf-string.c strtab tracking layer. It's quite easy: we just add a field csa_offset to the atoms table that tracks all strings: this field tracks the offset of the string in the ELF strtab (with its MSB already on, courtesy of a new macro CTF_SET_STID), and adds a new function that sets the csa_offset to the specified offset (plus MSB). Then we just need to avoid writing out strings to the internal strtab if they have csa_offset set, and note that the internal strtab is shorter than it might otherwise be. (We could in theory save a little more time here by eschewing sorting such strings, since we never actually write the strings out anywhere, but that would mean storing them separately and it's just not worth the complexity cost until profiling shows it's worth doing.) We also have to go through a bit of extra effort at variable-sorting time. This was previously using direct references to the internal strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab is not yet ready to put in its usual field (in a ctf_file_t that hasn't even been allocated yet at this stage): but now we're using the external strtab, this will no longer do because it'll be looking things up in the wrong strtab, with disastrous results. Instead, pass the new internal strtab in to a new ctf_strraw_explicit function which is just like ctf_strraw except you can specify a ne winternal strtab to use. But even now that it is using a new internal strtab, this is not quite enough: it can't look up strings in the external strtab because ld hasn't written it out yet, and when it does will write it straight to disk. Instead, when we write the internal strtab, note all the offset -> string mappings that we have noted belong in the *external* strtab to a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look in there at ctf_strraw time if it is set. This uses minimal extra memory (because only strings in the external strtab that we actually use are stored, and even those come straight out of the atoms table), but let both variable sorting and name interning when ctf_bufopen is next called work fine. (This also means that we don't need to filter out spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to the caller, once we wrap ctf_bufopen so that we have a new internal variant of ctf_bufopen etc that we can pass the synthetic external strtab to. That error has been filtered out since the days of Solaris libctf, which didn't try to handle the problem of getting external strtabs right at construction time at all.) v3: add the synthetic strtab and all associated machinery. v5: fix tabdamage. include/ * ctf.h (CTF_SET_STID): New. libctf/ * ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field. (ctf_file_t) <ctf_syn_ext_strtab>: Likewise. (ctf_str_add_ref): Name the last arg. (ctf_str_add_external) New. (ctf_str_add_strraw_explicit): Likewise. (ctf_simple_open_internal): Likewise. (ctf_bufopen_internal): Likewise. * ctf-string.c (ctf_strraw_explicit): Split from... (ctf_strraw): ... here, with new support for ctf_syn_ext_strtab. (ctf_str_add_ref_internal): Return the atom, not the string. (ctf_str_add): Adjust accordingly. (ctf_str_add_ref): Likewise. Move up in the file. (ctf_str_add_external): New: update the csa_offset. (ctf_str_count_strtab): Only account for strings with no csa_offset in the internal strtab length. (ctf_str_write_strtab): If the csa_offset is set, update the string's refs without writing the string out, and update the ctf_syn_ext_strtab. Make OOM handling less ugly. * ctf-create.c (struct ctf_sort_var_arg_cb): New. (ctf_update): Handle failure to populate the strtab. Pass in the new ctf_sort_var arg. Adjust for ctf_syn_ext_strtab addition. Call ctf_simple_open_internal, not ctf_simple_open. (ctf_sort_var): Call ctf_strraw_explicit rather than looking up strings by hand. * ctf-hash.c (ctf_hash_insert_type): Likewise (but using ctf_strraw). Adjust to diagnose ECTF_STRTAB nonetheless. * ctf-open.c (init_types): No longer filter out ECTF_STRTAB. (ctf_file_close): Destroy the ctf_syn_ext_strtab. (ctf_simple_open): Rename to, and reimplement as a wrapper around... (ctf_simple_open_internal): ... this new function, which calls ctf_bufopen_internal. (ctf_bufopen): Rename to, and reimplement as a wrapper around... (ctf_bufopen_internal): ... this new function, which sets ctf_syn_ext_strtab.
2019-07-01libctf: deduplicate and sort the string tableNick Alcock1-0/+330
ctf.h states: > [...] the CTF string table does not contain any duplicated strings. Unfortunately this is entirely untrue: libctf has before now made no attempt whatsoever to deduplicate the string table. It computes the string table's length on the fly as it adds new strings to the dynamic CTF file, and ctf_update() just writes each string to the table and notes the current write position as it traverses the dynamic CTF file's data structures and builds the final CTF buffer. There is no global view of the strings and no deduplication. Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding a new dynhash table ctf_str_atoms that maps unique strings to a list of references to those strings: a reference is a simple uint32_t * to some value somewhere in the under-construction CTF buffer that needs updating to note the string offset when the strtab is laid out. Adding a string is now a simple matter of calling ctf_str_add_ref(), which adds a new atom to the atoms table, if one doesn't already exist, and adding the location of the reference to this atom to the refs list attached to the atom: this works reliably as long as one takes care to only call ctf_str_add_ref() once the final location of the offset is known (so you can't call it on a temporary structure and then memcpy() that structure into place in the CTF buffer, because the ref will still point to the old location: ctf_update() changes accordingly). Generating the CTF string table is a matter of calling ctf_str_write_strtab(), which counts the length and number of elements in the atoms table using the ctf_dynhash_iter() function we just added, populating an array of pointers into the atoms table and sorting it into order (to help compressors), then traversing this table and emitting it, updating the refs to each atom as we go. The only complexity here is arranging to keep the null string at offset zero, since a lot of code in libctf depends on being able to leave strtab references at 0 to indicate 'no name'. Once the table is constructed and the refs updated, we know how long it is, so we can realloc() the partial CTF buffer we allocated earlier and can copy the table on to the end of it (and purge the refs because they're not needed any more and have been invalidated by the realloc() call in any case). The net effect of all this is a reduction in uncompressed strtab sizes of about 30% (perhaps a quarter to a half of all strings across the Linux kernel are eliminated as duplicates). Of course, duplicated strings are highly redundant, so the space saving after compression is only about 20%: when the other non-strtab sections are factored in, CTF sizes shrink by about 10%. No change in externally-visible API or file format (other than the reduction in pointless redundancy). libctf/ * ctf-impl.h: (struct ctf_strs_writable): New, non-const version of struct ctf_strs. (struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated. (struct ctf_str_atom): New, disambiguated single string. (struct ctf_str_atom_ref): New, points to some other location that references this string's offset. (struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs. Remove member ctf_dtvstrlen: we no longer track the total strlen as we add strings. (ctf_str_create_atoms): Declare new function in ctf-string.c. (ctf_str_free_atoms): Likewise. (ctf_str_add): Likewise. (ctf_str_add_ref): Likewise. (ctf_str_purge_refs): Likewise. (ctf_str_write_strtab): Likewise. (ctf_realloc): Declare new function in ctf-util.c. * ctf-open.c (ctf_bufopen): Create the atoms table. (ctf_file_close): Destroy it. * ctf-create.c (ctf_update): Copy-and-free it on update. No longer special-case the position of the parname string. Construct the strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the rest of each buffer element is constructed, not via open-coding: realloc the CTF buffer and append the strtab to it. No longer maintain ctf_dtvstrlen. Sort the variable entry table later, after strtab construction. (ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members. (ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref after buffer element construction instead. (ctf_copy_lmembers): Likewise. (ctf_copy_emembers): Likewise. (ctf_create): No longer maintain the ctf_dtvstrlen. (ctf_dtd_delete): Likewise. (ctf_dvd_delete): Likewise. (ctf_add_generic): Likewise. (ctf_add_enumerator): Likewise. (ctf_add_member_offset): Likewise. (ctf_add_variable): Likewise. (membadd): Likewise. * ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts if there are active ctf_str_num_refs. (ctf_strraw): Move to ctf-string.c. (ctf_strptr): Likewise. * ctf-string.c: New file, strtab manipulation. * Makefile.am (libctf_a_SOURCES): Add it. * Makefile.in: Regenerate.