Scatter/Gather thoughts

by Johan Petersson

GCC ident strings

"What's the deal with GCC text strings in all binaries?"

"What strings? The text you see if you run strings?"

"No, not those. I know about strings, and I'm talking about the strings that strings won't show."

"But... the whole point of strings is to show the strings in binary files! Are you sure there are strings that strings doesn't show?"

At this point the discussion begins an inevitable descent into a Pythonesque parody of conversation à la Spam, so I'll summarize my findings for you. The strings in question can look like

GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

or

GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)

or even

GCC: (GNU) 3.4.3 20041125 (Gentoo Linux 3.4.3-r1, ssp-3.4.3-0, pie-8.7.6.7)

There can be a fair amount of such text in big executables. At first I couldn't find these strings, because I didn't look in the right place. The GNU strings program will not show them by default, because they live in .comment sections, which are not loaded into memory when the program runs. You can see them if you look at the binary data directly, for example by doing

grep -aoE 'GCC:[ -~]+' a.out
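The same byte-level search can be sketched in Python. This is only an illustration: `find_gcc_idents` is a hypothetical helper name, and, like `grep -a`, it ignores the file's structure entirely and scans every byte for `GCC:` followed by a run of printable ASCII (space through tilde).

```python
import re
import sys

# Same pattern as the grep command: "GCC:" followed by printable ASCII (0x20-0x7e).
IDENT_RE = re.compile(rb"GCC:[ -~]+")


def find_gcc_idents(data: bytes) -> list[str]:
    """Return every 'GCC:...' run found anywhere in the raw bytes (hypothetical helper)."""
    return [m.group().decode("ascii") for m in IDENT_RE.finditer(data)]


if __name__ == "__main__":
    # Scan each file named on the command line; with no arguments, do nothing.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for ident in find_gcc_idents(f.read()):
                print(ident)
```

Each match stops at the first non-printable byte, which in practice is the NUL that terminates each string.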

I had always assumed strings looked through entire files, but that turns out not to be true unless you use the -a or --all flag, which the manual describes as "Do not scan only the initialized and loaded sections of object files; scan the whole files."

That still leaves us with the question of what these strings are and why they are present in our files. On ELF platforms, it's standard GCC behaviour to automatically generate an .ident directive into the assembly output:

.ident "GCC: (GNU) 4.0.0 20041214 (experimental)"

That turns into a .comment section in the compiled object file, and when the object files are linked into a binary all of the ident strings will be included. Some object files will almost invariably come from libraries, so normally you'll see at least a couple of different GCC versions mentioned when you examine your executables.
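To make the layout concrete, here is a minimal Python sketch that finds the .comment section by walking the ELF section headers, rather than scanning raw bytes, and splits its contents on the NUL bytes that terminate each ident string. The function name `comment_strings` is hypothetical. It assumes a well-formed 32- or 64-bit ELF file, ignores extended section numbering (files with a very large number of sections), and does no error checking beyond the magic number.

```python
import struct
import sys


def comment_strings(data: bytes) -> list[str]:
    """Return the NUL-separated strings in an ELF file's .comment section (hypothetical helper)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    end = "<" if ei_data == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    if ei_class == 2:  # ELFCLASS64
        ehdr_fmt, shdr_fmt = "HHIQQQIHHHHHH", "IIQQQQIIQQ"
    else:  # assume ELFCLASS32
        ehdr_fmt, shdr_fmt = "HHIIIIIHHHHHH", "IIIIIIIIII"
    # The ELF header fields start right after the 16-byte e_ident array.
    ehdr = struct.unpack_from(end + ehdr_fmt, data, 16)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = ehdr[5], ehdr[10], ehdr[11], ehdr[12]

    def section(index: int) -> tuple[int, int, int]:
        # Return (sh_name, sh_offset, sh_size) for section header number `index`.
        fields = struct.unpack_from(end + shdr_fmt, data, e_shoff + index * e_shentsize)
        return fields[0], fields[4], fields[5]

    # Section names are offsets into the section-name string table.
    _, names_off, _ = section(e_shstrndx)
    for i in range(e_shnum):
        name_off, off, size = section(i)
        start = names_off + name_off
        if data[start:data.index(b"\x00", start)] == b".comment":
            raw = data[off:off + size]
            return [s.decode("ascii", "replace") for s in raw.split(b"\x00") if s]
    return []  # no .comment section, e.g. after strip --remove-section=.comment


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for s in comment_strings(f.read()):
                print(s)
```

Unlike the grep approach, this only reports strings that actually sit in .comment, so a stray "GCC:" elsewhere in the file is not picked up.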

There does not seem to be any reason for the .ident strings other than backwards compatibility (with SVR4, according to some sources). You can inhibit the automatic generation of .ident directives using the GCC compiler option -fno-ident, but unless you have also rebuilt all libraries with this option you'll still get strings from e.g. glibc when linking. All .ident strings can be stripped entirely from an executable with strip:

strip --remove-section=.comment a.out

Like symbols, these strings are not loaded into memory, so they won't affect the memory footprint of your program. You might save a few megabytes of disk space if you were to strip them from your entire system, though. I'll leave them in for now; since the strings identify the exact compiler versions used, they might come in handy for pinpointing problems caused by known buggy compilers.

19 December, 2004