This doc describes how SuperSize breaks down native binaries into symbols.
[TOC]
Native symbols are those with a section of:
- `.text` (executable code)
- `.rodata` (read-only data)
- `.data` (writable data)
- `.data.rel.ro` (data that is read-only after ELF relocations are applied)
- `.bss` (symbols that are zero-initialized). These consume no space in the binary, and so are generally ignored despite still being collected.
There are 3 modes that SuperSize can use to break an ELF down into symbols:
- `linker_map` - Uses linker map + build directory to create symbols.
- `dwarf` - Uses debug information to create symbols.
- `sections` - Creates one symbol for each ELF section.
The `linker_map` mode produces the largest number of symbols, and thus is the preferred mode. Information provided only by this mode:
- Path information for symbols outside of `.text`.
  - DWARF information is complete for `.text` symbols (maybe because stack symbolization is a primary use-case?), but incomplete or missing for symbols in other sections.
- String literals (`.rodata` symbols that look like `"some string dat..."`).
  - Linker map files contain `** merge strings` entries, which tell us where string tables exist within `.rodata`.
- Linker map files contain `object_path`, which is useful for attributing STL usages to individual source files.
- Path aliases - when an inline symbol is used by multiple source files, we attribute the symbol's cost equally among the files.
- Linker-generated symbols, e.g. switch tables.
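To make the "attribute the symbol's cost equally" rule for path aliases concrete, here is a minimal sketch. The function name `attribute_alias_sizes` and the tuple layout are hypothetical and chosen for illustration; this is not SuperSize's actual code.

```python
from collections import defaultdict


def attribute_alias_sizes(symbols):
    """Split each symbol's size evenly across the source files that use it.

    symbols: iterable of (name, size_in_bytes, [source_paths]) tuples.
             (Hypothetical layout, for illustration only.)
    Returns a dict of source_path -> attributed size.
    """
    totals = defaultdict(float)
    for _name, size, paths in symbols:
        share = size / len(paths)  # Equal share per aliasing path.
        for path in paths:
            totals[path] += share
    return dict(totals)


symbols = [
    # An inline symbol used by three source files.
    ("InlineHelper()", 120, ["a/foo.cc", "b/bar.cc", "c/baz.cc"]),
    # A regular symbol defined by a single file.
    ("Foo::Run()", 40, ["a/foo.cc"]),
]
sizes = attribute_alias_sizes(symbols)
```

With these inputs, `a/foo.cc` is charged 80 bytes (its own 40 plus a 40-byte share of the inline symbol), while the other two files are charged 40 bytes each.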
In `linker_map` mode, data is collected from the following sources:
- `build.ninja` is parsed to get:
  - List of `.o` and `.a` files that were inputs to the linker.
  - Mapping of `.cc` -> `.o` files.
- All `.o` (and `.a`) files are parsed:
  - with `nm` to get symbol list.
  - Non-ThinLTO: with `nm` to get list of string literals.
  - ThinLTO: with `llvm-bcanalyzer` to get list of string literals.
- ELF file is parsed with `nm` to get list of symbol names that were identical-code-folded to the same address.
- Linker map (created via `-Wl,-Map=output.map`) is parsed to get:
  - Full list of symbols that comprise the binary.
  - Location of string tables (`** merge strings` entries).
  - Non-ThinLTO: `object_path` (`.o` file) associated with each symbol.
  - Note:
    - With ThinLTO, `object_path` points to a hashed filename within the thinlto cache (not useful).
    - When multiple symbols are folded together due to Identical Code Folding, the linker map file lists only one of them.
- ELF file string tables are parsed by looking for `\0` bytes and creating string literal symbols for each string therein.
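The string table step can be sketched as follows. It assumes the raw bytes of one string table and its start address are already known (for example from a `** merge strings` entry). The helper name `iter_string_literals` is hypothetical and not part of SuperSize.

```python
def iter_string_literals(table, base_address):
    """Yield (address, size, value) for each NUL-terminated string.

    table: raw bytes of one string table within .rodata.
    base_address: address of the first byte of the table.
    The reported size includes the terminating NUL byte.
    """
    offset = 0
    while offset < len(table):
        end = table.find(b"\0", offset)
        if end == -1:
            # Unterminated tail: treat the remainder as one literal.
            end = len(table)
            size = end - offset
        else:
            size = end - offset + 1  # Count the NUL terminator.
        yield base_address + offset, size, table[offset:end]
        offset += size


table = b"hello\0some string data\0"
literals = list(iter_string_literals(table, 0x1000))
# literals[0] == (0x1000, 6, b"hello")
# literals[1] == (0x1006, 17, b"some string data")
```

Consecutive NUL bytes (such as padding) come out of this sketch as one-byte empty literals. A real implementation would likely need to decide how to treat them.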
Symbols are then created as follows:
- Create initial symbol list from linker map.
- Assign object paths by seeing which `.o` files define each symbol (match up the names).
  - When multiple files define the same symbol, create symbol aliases.
- Create string literal symbols from string tables, and assign them paths based on which `.o` files define the same string literal.
- Assign `source_path` using the `.o` -> `.cc` mapping from `build.ninja`.
  - This means that `.h` files are never listed as sources. No information about inlined symbols is gathered (by design).
- Create symbol aliases when `nm` reports multiple symbols mapping to the same address.
- Normalize `source_path` by removing generated path prefix (and adding `FLAG_GENERATED`) when applicable.
- Normalize symbol names.
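Two of the steps above, matching symbols to `.o` files by name and grouping symbols that share an address, can be illustrated with a small sketch. The function names, argument shapes, and sample paths below are assumptions made for this example and do not mirror SuperSize's internal data structures.

```python
from collections import defaultdict


def assign_paths(symbol_names, names_by_object, source_by_object):
    """Map each symbol name to its (object_path, source_path) pairs.

    names_by_object: object_path -> symbol names defined there (from nm).
    source_by_object: object_path -> source_path (from build.ninja).
    A symbol defined by several .o files gets one entry per file; these
    entries correspond to path aliases.
    """
    objects_by_name = defaultdict(list)
    for object_path, names in names_by_object.items():
        for name in names:
            objects_by_name[name].append(object_path)
    return {
        name: [(obj, source_by_object.get(obj, ""))
               for obj in sorted(objects_by_name.get(name, []))]
        for name in symbol_names
    }


def group_aliases_by_address(nm_entries):
    """Return address -> names for addresses that have several names."""
    names_by_address = defaultdict(list)
    for address, name in nm_entries:
        names_by_address[address].append(name)
    return {a: n for a, n in names_by_address.items() if len(n) > 1}


names_by_object = {
    "obj/a/foo.o": ["Foo::Run()", "InlineHelper()"],
    "obj/b/bar.o": ["InlineHelper()"],
}
source_by_object = {"obj/a/foo.o": "a/foo.cc", "obj/b/bar.o": "b/bar.cc"}
paths = assign_paths(["Foo::Run()", "InlineHelper()"],
                     names_by_object, source_by_object)
aliases = group_aliases_by_address([(0x100, "A"), (0x100, "B"), (0x200, "C")])
```

Here `InlineHelper()` ends up with two path entries (one per defining `.o` file), and `A` and `B` are reported as aliases because they share address `0x100`.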
The `dwarf` mode creates symbols using only an ELF with debug information enabled. It requires the compiler flag `-gmlt` to enable full source paths (rather than just basename).
- Create initial symbol list with `nm --print-size`.
- Add name aliases using output from `nm` (this could have been done at the same time as the previous step, but is done as a separate step in order to share logic with `linker_map` mode).
- Use `dwarfdump` to find all `DW_TAG_compile_unit` and `DW_AT_ranges` entries and create a map of address range -> source path.
- Assign source paths to `.text` symbols based on symbol address.
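The address range -> source path lookup can be sketched with a sorted list and binary search. The class name `RangeMap` and the sample ranges are hypothetical. The sketch assumes the ranges do not overlap and that each end address is exclusive.

```python
import bisect


class RangeMap:
    """Maps addresses to source paths via non-overlapping ranges."""

    def __init__(self, ranges):
        # ranges: iterable of (start, end_exclusive, source_path).
        self._ranges = sorted(ranges)
        self._starts = [r[0] for r in self._ranges]

    def lookup(self, address):
        """Return the source path covering address, or None."""
        # Find the last range whose start is <= address.
        index = bisect.bisect_right(self._starts, address) - 1
        if index >= 0:
            start, end, path = self._ranges[index]
            if start <= address < end:
                return path
        return None


range_map = RangeMap([
    (0x1100, 0x1180, "b/bar.cc"),
    (0x1000, 0x1100, "a/foo.cc"),
])
# Assign a source path to each .text symbol by its address.
symbol_addresses = {"Foo::Run()": 0x1040, "Bar::Run()": 0x1100}
source_paths = {name: range_map.lookup(addr)
                for name, addr in symbol_addresses.items()}
```

Each lookup costs O(log n) in the number of ranges, which matters when there are many compile units and many symbols to resolve.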
Bloaty is an excellent tool, and produces size information with similar fidelity to `dwarf` mode, as it uses the same data source. We did not use Bloaty since `dwarfdump` was already readily available and gave similar results. It would be nice to also have a "bloaty" mode so that we could more directly compare outputs.
The `sections` mode uses `readelf -s` to create one symbol for each ELF section. It is
used for native files where no debug information or linker map file is
available, and for native files whose ABI does not match the `--abi-filter`.
Some manipulation happens in order to make names and paths more human-readable.
- `(anonymous::)` is removed from names (and stored as a symbol flag).
- `[clone]` suffix is removed (and stored as a symbol flag).
- `vtable for FOO` -> `FOO [vtable]`
- Mangling done by linkers is undone (e.g. prefixing with "unlikely.").
- Names are processed into:
  - `name`: Name without template and argument parameters.
  - `template_name`: Name without argument parameters.
  - `full_name`: Name with all parameters.
- LLVM function outlining creates many `OUTLINED_FUNCTION_*` symbols. These are renamed to `** outlined functions` or `** outlined functions * (count)`, and are de-duped so an address can have at most one such symbol.
  - Update: Outlining was ARM64-only, and has been disabled in our build due to performance regressions.
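As an illustration of the `name` / `template_name` / `full_name` split and the vtable rewrite, here is a simplified sketch. The helpers `split_names` and `normalize_vtable` are hypothetical. The sketch deliberately ignores harder cases such as `operator()`, `operator<`, and qualifiers that follow the argument list, which a real implementation has to handle.

```python
def split_names(full_name):
    """Return (name, template_name, full_name) for a demangled symbol.

    Simplified: argument lists are parenthesized groups that open
    outside of any <...>; template parameters are everything in <...>.
    """
    name_chars = []
    template_chars = []
    angle_depth = 0  # Nesting depth inside <...>.
    arg_depth = 0    # Nesting depth inside an argument list.
    for ch in full_name:
        if arg_depth > 0:
            # Inside the argument list: drop everything.
            if ch == "(":
                arg_depth += 1
            elif ch == ")":
                arg_depth -= 1
            continue
        if ch == "(" and angle_depth == 0:
            arg_depth = 1
            continue
        if ch == "<":
            angle_depth += 1
            template_chars.append(ch)
            continue
        if ch == ">" and angle_depth > 0:
            angle_depth -= 1
            template_chars.append(ch)
            continue
        template_chars.append(ch)
        if angle_depth == 0:
            name_chars.append(ch)
    return "".join(name_chars), "".join(template_chars), full_name


def normalize_vtable(name):
    """Rewrite 'vtable for FOO' as 'FOO [vtable]'."""
    prefix = "vtable for "
    if name.startswith(prefix):
        return name[len(prefix):] + " [vtable]"
    return name


names = split_names("foo::Bar<int>::Baz(std::vector<int> const&)")
# names[0] == "foo::Bar::Baz"
# names[1] == "foo::Bar<int>::Baz"
vtable_name = normalize_vtable("vtable for foo::Bar")
# vtable_name == "foo::Bar [vtable]"
```

Keeping three variants lets symbols be grouped at different granularities, for example all instantiations of a template under a single `name`.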