The DBus (1.14.6) build with link-time optimisation (LTO) fails on Mac with a segmentation violation in slibtool when creating libdbus-testutils.la, specifically just after the creation of the actual archive, libdbus-testutils.a .
On a number of occasions I'm seeing
Feb 11 22:53:06 Portia rlibtool[66047]: rlibtool(66047,0x7fff78f6c310) malloc: *** error for object 0x7fe7b9e00aa0: pointer being freed was not allocated *** set a breakpoint in malloc_error_break to debug
in my system.log but that disappears when I set MallocErrorAbort=1. The backtrace doesn't change though:
MallocErrorAbort=1
Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: EXC_I386_GPFLT Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 rlibtool 0x0000000100e34943 slbt_get_driver_fdctx + 35 (slbt_driver_ctx.c:1534) 1 rlibtool 0x0000000100e2bb89 slbt_driver_fdcwd + 25 (slibtool_driver_impl.h:272) 2 rlibtool 0x0000000100e2b54c slbt_store_archive + 268 (slbt_archive_store.c:73) 3 rlibtool 0x0000000100e39ebf slbt_archive_import_impl + 319 (slbt_archive_import.c:46) 4 rlibtool 0x0000000100e39d36 slbt_archive_import + 86 (slbt_archive_import.c:65) 5 rlibtool 0x0000000100e4aff6 slbt_exec_link_create_archive + 1542 (slbt_exec_link.c:1398) 6 rlibtool 0x0000000100e49902 slbt_exec_link + 2178 (slbt_exec_link.c:2034) 7 rlibtool 0x0000000100e2f145 slbt_perform_driver_actions + 229 (slbt_amain.c:85) 8 rlibtool 0x0000000100e2ed57 slbt_main + 1175 (slbt_amain.c:199) 9 rlibtool 0x0000000100e5d5ce main + 46 (slibtool.c:14) 10 libdyld.dylib 0x00007fff939285fd start + 1 Thread 0 crashed with X86 Thread State (64-bit): rax: 0xaffffffffffffd80 rbx: 0x00007ff192403b40 rcx: 0x0000000000000000 rdx: 0x00007fff5edd8ef0 rdi: 0xb000000000000000 rsi: 0x00007fff5edd8740 rbp: 0x00007fff5edd8720 rsp: 0x00007fff5edd8700 r8: 0x00007ff192500000 r9: 0x000000008e68dd72 r10: 0x000000001f8d6585 r11: 0xfffffffffffffa50 r12: 0x0000000000000000 r13: 0x0000000000000000 r14: 0x0000000100e5dbb8 r15: 0x0000000000000000 rip: 0x0000000100e34943 rfl: 0x0000000000010246 cr2: 0x00000001010cb000 Logical CPU: 0 Error Code: 0x00000000 Trap Number: 13
It looks like the "in-memory archive merging facility" in slbt_archive_import_impl might not be able to handle archives generated by llvm-ar that contain LLVM bitcode objects? Not sure if that's a proper diagnosis because the attached log shows that llvm-ar does the actual work. Plus, we're creating an archive from scratch so I don't know what's being merged here. Either way, the most logical explanation is that the slbt_archive_ctx *arctx is NULL (impossible because there are no checks against that?) or invalid/stale.
slbt_archive_import_impl
slbt_archive_ctx *arctx
A quick look at slbt_merge_archives suggests that the call to slbt_merge_archives_fail at its very end could result in arctx == NULL without an error being returned!
slbt_merge_archives
slbt_merge_archives_fail
arctx == NULL
<img alt="dbus-build.log" src="/cross/slibtool/issue/raw/files/3dde53632251781c5ac6c21a0ca52e1488e49643bdf5cbea1952f88b61b87357-dbus-build.log" />
Bingo, something indeed goes wrong trying to get the archive meta data(?). This is what I see when I insert a return in from of that last slbt_merge_archives_fail call :
return
:info:build rlibtool: link: /opt/local/bin/llvm-ar-mp-9.0 -crs .libs/libdbus-testutils.a .libs/disable-crash-handling.o .libs/test-utils.o :info:build bash(76718,0x7fff78f6c310) malloc: enabling abort() on bad malloc or free :info:build xcrun(76718,0x7fff78f6c310) malloc: enabling abort() on bad malloc or free :info:build llvm-ar(76718,0x7fff78f6c310) malloc: enabling abort() on bad malloc or free :info:build rlibtool: error logged in slbt_get_archive_meta(), line 405: flow error: unexpected condition or other. :info:build rlibtool: < returned to > slbt_merge_archives(), line 683. :info:build rlibtool: < returned to > slbt_archive_import_impl(), line 50. :info:build rlibtool: < returned to > slbt_exec_link_create_archive(), line 1399. :info:build rlibtool: < returned to > slbt_exec_link(), line 2039. :info:build make[3]: *** [libdbus-testutils.la] Error 2
Line 405 is in fact an error trying to read the archive header(s), so it looks like this is indeed an incompatibility issue with the archive format.
Probably not nice to hear, but is it wise to rely only on an "in-memory archive merging facility" that you can't really hope to test on all platforms, without fallback to a traditional method that I presume uses external tools from the toolchain?
FWIW, is this in anyway related to ar's MRI parser? I used to have a patch to avoid attemps using that when using llvm-ar .
And a closing remark: building pkgconf with LTO and --enable-static completes just fine...
pkgconf
--enable-static
Thanks for the catch! That's very helpful.
You're right, that's not nice to hear, but that's fine. Some context;
gnu ar's implementation of MRI scripts is buggy (changes order of members). Then again, not all systems provide gnu ar.
llvm-ar can merge archives (-qL), but again, not all systems come with llvm-ar, and adding llvm-ar as a dependency even on systems where it does exist is going to make bootstrapping tricky (llvm-ar, for instance, has a dependency on libxml2, which builds with libtool). Accordingly, resorting to llvm-ar is not something which can be considered an acceptable solution.
slibtool's ar parsing and the in-memory archive merging facility have been tested against all common archive flavors, namely: sysv, bsd, 32-bits, 64-bits, big endian, little endian.
slibtool's ar parsing is also needed for the implementation of -export-symbols and -export-symbols-regex.
-export-symbols
-export-symbols-regex
indeed, we have not yet tested slibtool-ar against archives containing llvm bitcode, and that's a good point.
Regarding fallback options: I left the implementation of the MRI-based archive merging in the code base (under src/fallback), so that downstream users on systems with archives that cannot be supported by slibtool could use it. That would require a one-line patch in slbt_archive_import() --> check for failure, and resort to slbt_archive_import_mri() as needed.
slbt_archive_import()
slbt_archive_import_mri()
since MRI scripts were found to be buggy, using slbt_archive_import_mri() is not going to be part of slibtool's official release.
The above slbt_archive_import_mri() function could be easily converted into slbt_archive_import_llvm() -- for now, though, I'd much rather understand what failed with the bitcode rather than resort to an external tool of a huge scale.
slbt_archive_import_llvm()
Is that a thin archive by any chance? Can you paste the output of:
$ hexdump -C /path/to/archive.a | head
hexdump -C /path/to/archive.a | head
I think you're misreading the code. slbt_merge_archives_fail() returns whatever ret value it was passed by the caller, which in our context is either SLBT_SYSTEM_ERROR(dctx,0) or SLBT_NESTED_ERROR(dctx), which both always return a negative (-1) result. That's common practice everywhere in slibotol where a static _fail() function is in use.
slbt_merge_archives_fail()
ret
SLBT_SYSTEM_ERROR(dctx,0)
SLBT_NESTED_ERROR(dctx)
_fail()
Reading again through the backtraces and looking at the relevant line numbers, it appears that:
the archives do get successfully merged (or else slbt_store_archive() would have not been invoked.
for whatever reason, dctx has an invalid value, which likely happens because the *arctx passed to it is invalid.
Since we have not seen something similar on our end, could you debug this using lldb, place breakpoints at slbt_merge_archives() and slbt_store_archive(), step through the execution, and see where things go wrong?
lldb
slbt_merge_archives()
slbt_store_archive()
Yes, but you're discarding the return value from slbt_merge_archives_fail and continuing execution in slbt_merge_archives ultimately letting that function return 0. But the _fail function will have freed arctx.
not all systems come with llvm-ar
No, but most systems come with only a bare minimum build toolchain installed. slibtool doesn't need to worry about whether a given build tool is available, only if it works with the most common ones that the user could want to use.
And evidently it (you) could chose not to support anything outside the GNU toolchain.
That would require a one-line patch in slbt_archive_import() --> check for failure, and resort to slbt_archive_import_mri() as needed.
Hmmm, just to the code, not to the build system?
could you debug this using lldb
I can try when I get back to my Mac. Idem for the thin archive thing - but if the archive is thin that is something the build system specified, not I. pkgconf builds OK but I can't affirm if the archive merge function is used there.
Did you notice that this is a bit of a cross-build, targetting 32-bit i386 rather than the native x86_64 code? I think that should make no difference for the archive, just its contents, but I guess there could easily be something in its headers that slibtool isn't expecting.
I'm attaching the build log of the same code with LLVM and very similar options on Linux. Of course llvm-ar has to comply with Linux standards when targetting Linux so the fact that "things work OK" there probably doesn't mean much...
<img alt="dbus-build.log" src="/cross/slibtool/issue/raw/files/b8ab01eb9a108b2df8c339039057e45945dab15caf82ea840a2578ff6bc90bdd-dbus-build.log" />
Is that a thin archive by any chance? Can you paste the output of: $ hexdump -C /path/to/archive.a | head
> hexdump -C dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a | head 00000000 21 3c 61 72 63 68 3e 0a 23 31 2f 31 32 20 20 20 |!<arch>.#1/12 | 00000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 | 0 | 00000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20 | 0 0 | 00000030 30 20 20 20 20 20 20 20 33 39 33 32 20 20 20 20 |0 3932 | 00000040 20 20 60 0a 5f 5f 2e 53 59 4d 44 45 46 00 00 00 | `.__.SYMDEF...| 00000050 90 03 00 00 00 00 00 00 a0 0f 00 00 1e 00 00 00 |................| 00000060 d8 27 00 00 39 00 00 00 d8 27 00 00 50 00 00 00 |.'..9....'..P...| 00000070 d8 27 00 00 6a 00 00 00 d8 27 00 00 81 00 00 00 |.'..j....'......| 00000080 d8 27 00 00 97 00 00 00 d8 27 00 00 aa 00 00 00 |.'.......'......| 00000090 d8 27 00 00 c1 00 00 00 d8 27 00 00 dc 00 00 00 |.'.......'......|
For comparison, libpkgconf.a which doesn't give problems yields this:
> hexdump -C pkgconf-1.9.0/.libs/libpkgconf.a | head 00000000 21 3c 61 72 63 68 3e 0a 23 31 2f 31 32 20 20 20 |!<arch>.#1/12 | 00000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 | 0 | 00000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20 | 0 0 | 00000030 30 20 20 20 20 20 20 20 33 32 33 36 20 20 20 20 |0 3236 | 00000040 20 20 60 0a 5f 5f 2e 53 59 4d 44 45 46 00 00 00 | `.__.SYMDEF...| 00000050 10 03 00 00 00 00 00 00 e8 0c 00 00 17 00 00 00 |................| 00000060 e8 0c 00 00 2a 00 00 00 e8 0c 00 00 48 00 00 00 |....*.......H...| 00000070 90 27 00 00 5e 00 00 00 90 27 00 00 71 00 00 00 |.'..^....'..q...| 00000080 90 27 00 00 87 00 00 00 90 27 00 00 9b 00 00 00 |.'.......'......| 00000090 e8 55 00 00 ba 00 00 00 e8 55 00 00 cf 00 00 00 |.U.......U......|
Fixed in commit 4bc3453, thanks for the catch! In the future, a pointer to the exact line would be much appreciated:-) Apparently the oversight at the original time of writing the code carried over to my reviewing it after reading your original report ...
hexdump -C dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a | head 00000000 21 3c 61 72 63 68 3e 0a 23 31 2f 31 32 20 20 20 |!<arch>.#1/12 | 00000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 | 0 | 00000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20 | 0 0 | 00000030 30 20 20 20 20 20 20 20 33 39 33 32 20 20 20 20 |0 3932 | 00000040 20 20 60 0a 5f 5f 2e 53 59 4d 44 45 46 00 00 00 | `.__.SYMDEF...| 00000050 90 03 00 00 00 00 00 00 a0 0f 00 00 1e 00 00 00 |................| 00000060 d8 27 00 00 39 00 00 00 d8 27 00 00 50 00 00 00 |.'..9....'..P...| 00000070 d8 27 00 00 6a 00 00 00 d8 27 00 00 81 00 00 00 |.'..j....'......| 00000080 d8 27 00 00 97 00 00 00 d8 27 00 00 aa 00 00 00 |.'.......'......| 00000090 d8 27 00 00 c1 00 00 00 d8 27 00 00 dc 00 00 00 |.'.......'......| ``` For comparison, libpkgconf.a which doesn't give problems yields this: ``` hexdump -C pkgconf-1.9.0/.libs/libpkgconf.a | head 00000000 21 3c 61 72 63 68 3e 0a 23 31 2f 31 32 20 20 20 |!<arch>.#1/12 | 00000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 | 0 | 00000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20 | 0 0 | 00000030 30 20 20 20 20 20 20 20 33 32 33 36 20 20 20 20 |0 3236 | 00000040 20 20 60 0a 5f 5f 2e 53 59 4d 44 45 46 00 00 00 | `.__.SYMDEF...| 00000050 10 03 00 00 00 00 00 00 e8 0c 00 00 17 00 00 00 |................| 00000060 e8 0c 00 00 2a 00 00 00 e8 0c 00 00 48 00 00 00 |....*.......H...| 00000070 90 27 00 00 5e 00 00 00 90 27 00 00 71 00 00 00 |.'..^....'..q...| 00000080 90 27 00 00 87 00 00 00 90 27 00 00 9b 00 00 00 |.'.......'......| 00000090 e8 55 00 00 ba 00 00 00 e8 55 00 00 cf 00 00 00 |.U.......U......| ```
hexdump -C dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a | head 00000000 21 3c 61 72 63 68 3e 0a 23 31 2f 31 32 20 20 20 |!<arch>.#1/12 | 00000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 | 0 | 00000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20 | 0 0 | 00000030 30 20 20 20 20 20 20 20 33 39 33 32 20 20 20 20 |0 3932 | 00000040 20 20 60 0a 5f 5f 2e 53 59 4d 44 45 46 00 00 00 | `.__.SYMDEF...| 00000050 90 03 00 00 00 00 00 00 a0 0f 00 00 1e 00 00 00 |................| 00000060 d8 27 00 00 39 00 00 00 d8 27 00 00 50 00 00 00 |.'..9....'..P...| 00000070 d8 27 00 00 6a 00 00 00 d8 27 00 00 81 00 00 00 |.'..j....'......| 00000080 d8 27 00 00 97 00 00 00 d8 27 00 00 aa 00 00 00 |.'.......'......| 00000090 d8 27 00 00 c1 00 00 00 d8 27 00 00 dc 00 00 00 |.'.......'......| ```
For comparison, libpkgconf.a which doesn't give problems yields this: ```
hexdump -C pkgconf-1.9.0/.libs/libpkgconf.a | head 00000000 21 3c 61 72 63 68 3e 0a 23 31 2f 31 32 20 20 20 |!<arch>.#1/12 | 00000010 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 | 0 | 00000020 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20 | 0 0 | 00000030 30 20 20 20 20 20 20 20 33 32 33 36 20 20 20 20 |0 3236 | 00000040 20 20 60 0a 5f 5f 2e 53 59 4d 44 45 46 00 00 00 | `.__.SYMDEF...| 00000050 10 03 00 00 00 00 00 00 e8 0c 00 00 17 00 00 00 |................| 00000060 e8 0c 00 00 2a 00 00 00 e8 0c 00 00 48 00 00 00 |....*.......H...| 00000070 90 27 00 00 5e 00 00 00 90 27 00 00 71 00 00 00 |.'..^....'..q...| 00000080 90 27 00 00 87 00 00 00 90 27 00 00 9b 00 00 00 |.'.......'......| 00000090 e8 55 00 00 ba 00 00 00 e8 55 00 00 cf 00 00 00 |.U.......U......| ```
Thanks! We're getting closer I believe ... will right again in some twelve hours.
Actually, and since the error that was fixed in commit 4bc3453 did create a lot of spurious noise, (given the invalid pointer, etc.), please start the build from scratch. What I anticipate:
On that last note, if the failure associated with slbt_get_archive_meta() is not ENOENT, then please attach the archive which slibtool failed to parse to the new issue. Thanks!
In the future, a pointer to the exact line would be much appreciated:-)
Sorry, yes. Somehow my own brain overlooked the fact that I'm using a very recent check-out (and was apparently too lazy to figure out if any of my patches moved the code in question to a different line)...
Going to see what lldb can teach me.
Actually, and since the error that was fixed in commit 4bc3453 did create a lot of spurious noise, (given the invalid pointer, etc.), please start the build from scratch. What I anticipate: the segmentation fault will be gone (so we'll close this issue). we'll see a proper report of the failure returned from slbt_get_archive_meta(), in which case a new issue should be opened.
Our posts crossed. I actually did that already, and attached the failing command and its output in my 1st reply to the OP... So you're right, the SIGSEGV is/was gone, but the error output isn't really much more helpful beyond the indication of the line number.
(And we have here a good example of why it's not always productive to delay trace output as you apparently do ... the error message from slbt_get_archive_meta could have been printed and would have helped understanding why the crash occurred. ;) And I do hope this isn't a good example of me overlooking something instead ^^ )
slbt_get_archive_meta
Fair enough:-) Let's speed things up even further via the following:
$ for f in $(find . -name '*.a'); do echo $f; slibtool-ar -Wcheck $f; done
You had to make me utter bash'ism, did you? :)
bash-4.3$ for f in $(find . -name '*.a'); do echo $f; slibtool-ar -Wcheck $f; done ./bus/.libs/libdbus-daemon-internal.a ./bus/.libs/liblaunch-helper-internal.a ./dbus/.libs/libdbus-1.a ./dbus/.libs/libdbus-internal.a ./test/.libs/libdbus-testutils.a
You had to make me utter bash'ism, did you? :) bash-4.3$ for f in $(find . -name '*.a'); do echo $f; slibtool-ar -Wcheck $f; done ./bus/.libs/libdbus-daemon-internal.a ./bus/.libs/liblaunch-helper-internal.a ./dbus/.libs/libdbus-1.a ./dbus/.libs/libdbus-internal.a ./test/.libs/libdbus-testutils.a
Above suggests that all generated archive files could be parsed by slibtool-ar.
Closing this for now since the SIGSEGV problem has been resolved, discussion shall continue in the other issues:-)
Metadata Update from @midipix: - Issue status updated to: Closed (was: Open)
Meanwhile, in the debugger, I've established that slbt_ar_read_decimal_64 apparently fails its backwards search of the end of the variable it tries to read from mark.
slbt_ar_read_decimal_64
mark
> (cd dbus-mp9-work/dbus-1.14.6-i386/test/ ; lldb -- /opt/local/bin/rlibtool --tag=CC --mode=link ccache /opt/local/bin/clang-mp-9.0 -fno-strict-aliasing -fno-common -Wnested-externs -Wmissing-prototypes -Wstrict-prototypes -Wdeclaration-after-statement -Wimplicit-function-declaration -Wold-style-definition -Wall -Wextra -Wundef -Wwrite-strings -Wpointer-arith -Wmissing-declarations -Wredundant-decls -Wno-unused-parameter -Wno-missing-field-initializers -Wformat=2 -Wcast-align -Wformat-nonliteral -Wformat-security -Wsign-compare -Wstrict-aliasing -Wshadow -Winline -Wpacked -Wmissing-format-attribute -Wmissing-noreturn -Winit-self -Wmissing-include-dirs -Warray-bounds -Wreturn-type -Wswitch-enum -Wswitch-default -Wnull-dereference -Wdouble-promotion -Wchar-subscripts -Wfloat-equal -Wpointer-sign -Wno-deprecated-declarations -Wno-unused-label -Wno-error=unused-parameter -Wno-error=missing-field-initializers -Wno-error=deprecated-declarations -Wno-error=unused-label -O3 -g -flto=thin -march=native -arch i386 -L/opt/local/lib -Wl,-headerpad_max_install_names -O3 -g -flto=thin -Wl,-cache_path_lto,/Volumes/VMs/MPbuild/_Volumes_Debian_MP9_site-ports_devel_dbus/dbus/work/dbus-1.14.6/.lto_cache -march=native -arch i386 -o libdbus-testutils.la disable-crash-handling.lo test-utils.lo ../dbus/libdbus-1.la ../dbus/libdbus-internal.la)
I had to set a few breakpoints with trace output commands to figure that, going in, I have
(lldb) p *arhdr (ar_raw_file_header) $82 = { ar_file_id = { [0] = '1' [1] = '/' [2] = '2' [3] = '8' [4] = ' ' [5] = ' ' [6] = ' ' [7] = ' ' [8] = ' ' [9] = ' ' [10] = ' ' [11] = ' ' [12] = ' ' [13] = ' ' [14] = ' ' [15] = '0' } ar_time_date_stamp = { [0] = ' ' [1] = ' ' [2] = ' ' [3] = ' ' [4] = ' ' [5] = ' ' [6] = ' ' [7] = ' ' [8] = ' ' [9] = ' ' [10] = ' ' [11] = '0' } ar_uid = { [0] = ' ' [1] = ' ' [2] = ' ' [3] = ' ' [4] = ' ' [5] = '0' } ar_gid = { [0] = ' ' [1] = ' ' [2] = ' ' [3] = ' ' [4] = ' ' [5] = '6' } ar_file_mode = { [0] = '4' [1] = '4' [2] = ' ' [3] = ' ' [4] = ' ' [5] = ' ' [6] = ' ' [7] = '6' } ar_file_size = { [0] = '1' [1] = '4' [2] = '0' [3] = ' ' [4] = ' ' [5] = ' ' [6] = ' ' [7] = ' ' [8] = ' ' [9] = '`' } ar_end_tag = { [0] = '\n' [1] = 'd' } }
Then, slbt_ar_read_decimal_64 triggers its eject at line 96 with
(lldb) frame variable (const char *) mark = 0x0000000101808438 "140 `\ndisable-crash-handling.o" (int) len = 10 (uint64_t *) dec = 0x00007fff5fbf98e8 (int) i = 3 (uint64_t) res = 140 Process 11764 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 8.1 frame #0: 0x0000000100006a58 rlibtool`slbt_ar_read_decimal_64(mark="140 `\ndisable-crash-handling.o", len=10, dec=0x00007fff5fbf98e8) at slbt_archive_meta.c:96 93 res *= 10; 94 res += (mark[i] - '0'); 95 } else { -> 96 return -1; 97 } 98 } 99
Errno does appear to be 2 (ENOENT) but I figure it could be useful anyway to attach the offending archive.
2
Except that haven't gotten beyond determining which archives are being merged in slbt_merge_archives (so not which is the offending one):
(lldb) p *arctxv[0]->path (const char *const) $219 = 0x00007fff5fbfbe20 ".libs/libdbus-testutils.a" (lldb) p *arctxv[1]->path (const char *const) $220 = 0x0000000100804522 "../dbus/.libs/libdbus-1.a"
<img alt="libdbus-testutils.a" src="/cross/slibtool/issue/raw/files/44e803b7e6444fb61f0b76a4da3e3cc4d4660657e09b79230a21fefa15503342-libdbus-testutils.a" />
Metadata Update from @RJVB: - Issue status updated to: Open (was: Closed)
Had to bzip libdbus-1.a :
<img alt="libdbus-1.a.bz2" src="/cross/slibtool/issue/raw/files/3e7a53d2ca979a3ea4c757357ece548558f519122ed7d92f83b0be580077779b-libdbus-1.a.bz2" />
(PS: is there a way to disable the smiley popup that appears whenever I type a colon after a space?!)
Sorry, too late, didn't see you were posting while I was jumping back and forth between typing my debugging report and doing the actual debugging. The SEGV was just a side-effect of what actually fails ;)
Output from
> find dbus-mp9-work/dbus-1.14.6-i386/ -name "*.a" | xargs s libtool-ar -Wverbose -Wprint >> /tmp/dbus-archives.txt
<img alt="dbus-archives.txt" src="/cross/slibtool/issue/raw/files/2dba486cfc342496165c3df141358755ab53db0ce7ed8072c37fc5a663414750-dbus-archives.txt" />
And wondering, why do your slbt_ar_read_* functions scan the buffer backwards trying to find the actual textual length of the value they want to read? Apparently you (can) assume that the textual representation always starts at mark[0], so why not simply read digits until you encounter something else, and only return an error if a) no digits were found or b) that "something else" != AR_DEC_PADDING? You could probably also simply do sscanf(mark, "%d ", &dec); (should return 1 on success).
slbt_ar_read_*
mark[0]
AR_DEC_PADDING
sscanf(mark, "%d ", &dec);
Through hardware, possibly :european_post_office:
(lldb) p *arhdr (ar_raw_file_header) $82 = { ar_file_id = { [0] = '1' [1] = '/' [2] = '2' [3] = '8' [4] = ' ' [5] = ' ' [6] = ' ' [7] = ' ' [8] = ' ' [9] = ' ' [10] = ' ' [11] = ' ' [12] = ' ' [13] = ' ' [14] = ' ' [15] = '0' }
Looks like a potential culprit: 1/28 with a trailing ar_file_id[15] = 0, which differs from both sysv and bsd4.4
ar_file_id[15] = 0
Can you upload the corresponding .a file? The files you've uploaded do not exhibit the above and are successfully parsed by slibtool-ar, at least on m end.
No idea which file that is. For some reason this issue also doesn't occur with slibtool-ar on my end, and I still haven't figured out how to obtain the name of the file that slbt_get_archive_meta works on. It has to be either the 2 I attached, or the result of their merging (but which is that??)
I've been exploring my idea of parsing ar_file_size differently:
ar_file_size
static int slbt_ar_read_decimal_64(const char * mark, int len, uint64_t * dec) { int i; uint64_t res; for (i=0,res=0; i<len; i++) { if ((mark[i] >= '0') && (mark[i] <= '9')) { res *= 10; res += (mark[i] - '0'); } else { break; } } // fprintf(stderr, "mark=<%s>[%d], i=%d, mark[i]='%c' res=%lu\n", mark, len, i, mark[i], res); if (mark[i] != AR_DEC_PADDING) { return -1; } *dec = res; return 0; }
With that, slibtool-ar -Wverbose -Wprint produces the same output and my mystery archive passes the 1st test. It now bails at (the new) line 520 because *ch++ != AR_DEC_PADDING. That one is going to be a little less easy for me to figure out...
slibtool-ar -Wverbose -Wprint
*ch++ != AR_DEC_PADDING
Yep, it's indeed libdbus-testutils.a. The offending ar_file_id:
libdbus-testutils.a
ar_file_id
(lldb) p arhdr->ar_file_id (char [16]) $3 = "1/28 0 0 0 644 6140 `\ndisable-crash-handling.o"
and
> otool -a dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a Archive : dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a 00 0/0 860 0 #1/12 0644 0/0 6140 0 #1/28 0644 0/0 52716 0 #1/12 > otool -v -a dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a Archive : dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a ---------- 0/0 860 Thu Jan 1 01:00:00 1970 __.SYMDEF -rw-r--r-- 0/0 6140 Thu Jan 1 01:00:00 1970 disable-crash-handling.o -rw-r--r-- 0/0 52716 Thu Jan 1 01:00:00 1970 test-utils.o
With my modified algorithm I'm getting:
> slibtool-ar -Wverbose -Wprint dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a Archive: - Meta: - [ name: dbus-mp9-work/dbus-1.14.6-i386//test/.libs/libdbus-testutils.a ] - Members: - Member: - [ name: disable-crash-handling.o ] - [ timestamp: 1970/01/01 @ 01:00 ] - [ filesize: 6112 ] - [ uid: 0 ] - [ gid: 0 ] - [ mode: 420 ] - Member: - [ name: test-utils.o ] - [ timestamp: 1970/01/01 @ 01:00 ] - [ filesize: 52704 ] - [ uid: 0 ] - [ gid: 0 ] - [ mode: 420 ]
Does that correspond to what you're seeing?
00 0/0 860 0 #1/12 0644 0/0 6140 0 #1/28 0644 0/0 52716 0 #1/12
sigh lldb seems to be messing up the output, possibly by trying to be smart about #1.
The above output from your otool command shows that the binary file is actually fine, with #1/28 as the file id. And indeed, if this hadn't been the case then slibtool-ar -Wcheck would have failed.
otool
#1/28
Moving forward: in the lldb command-line, where you previously had p *arhdr, please try p /x *arhdr instead and we'll take things from there ...
p *arhdr
p /x *arhdr
Actually, I have to belay my previous message. It's not the archive I thought. It turns out that slbt_get_archive_meta is called THREE times, once on libdbus-1.a, once on libdbus-testutils.a, and once on a file I cannot seem to find. That last time, I have
(lldb) p ((char*)archive->map_addr) (char *) $121 = 0x000000010120e000 "!<arch>\n#1/12 0 0 0 33731 `\n__.SYMDEF"
and I don't see that size (33731) in the otool output for any of the archives in my build directory. As far as I can see, slibtool also doesn't have any other archives open.
Is that mystery archive the in-memory one I've seen referenced?
please try p /x *arhdr instead and we'll take things from there ...
That looks like a gdb'ism that my version of lldb doesn't support. What is it supposed to do?
FWIW, the first file of that mystery archive does show correctly, no trying to be clever here:
(lldb) p *arhdr (ar_raw_file_header) $122 = { ar_file_id = { [0] = '#' [1] = '1' [2] = '/' [3] = '1' [4] = '2' [5] = ' ' [6] = ' ' [7] = ' ' [8] = ' ' [9] = ' ' [10] = ' ' [11] = ' ' [12] = ' ' [13] = ' ' [14] = ' ' [15] = ' ' } <snip>
I've been wondering if the missing hash couldn't be the result of a previous non-standard/unexpected difference that leads the entire buffer to be left-shifted 1 char. I'm also seeing
ar_end_tag = { [0] = '\n' [1] = 'd' }
in the offending arhdr struct. All the correct ones end with a
ar_end_tag = { [0] = '`' [1] = '\n' }
Not a mystery, just in-memory :crocodile: to avoid file system races, the merged archive first exists only in memory, and is then saved to disk via slbt_store_archive().
The last step of the merge operation is to parse the newly created archive, which is why you saw slbt_get_archive_meta() being called three times.
I'll try merging the two archives that you uploaded earlier (libdbus-testutils.a, libdbus-1.a) and hopefully will be seeing the exact same error that you had encountered:-)
I've been wondering if the missing hash couldn't be the result of a previous non-standard/unexpected difference that leads the entire buffer to be left-shifted 1 char. I'm also seeing ar_end_tag = { [0] = '\n' [1] = 'd' } in the offending arhdr struct. All the correct ones end with a ar_end_tag = { [0] = '`' [1] = '\n' }
Awesome. I almost had the same thought but needed to have it come from someone else:-) To be continued ...
Reproduced.
Some more fodder, the callers for each of the 3 slbt_get_archive_meta invocations:
frame #0: 0x000000010000456a rlibtool`slbt_get_archive_meta(dctx=0x0000000100802c88, archive=0x0000000100103d60, meta=0x0000000100103d70) at slbt_archive_meta.c:353 350 memset(hdrinfobuf,0,sizeof(hdrinfobuf)); 351 352 mark = archive->map_addr; -> 353 cap = &mark[archive->map_size]; 354 355 /* preliminary validation */ 356 if (archive->map_size < sizeof(struct ar_raw_signature)) (lldb) up frame #1: 0x0000000100001b11 rlibtool`slbt_get_archive_ctx(dctx=0x0000000100802c88, path=".libs/libdbus-testutils.a", pctx=0x00007fff5fbfbd00) at slbt_archive_ctx.c:83 80 return slbt_free_archive_ctx_impl(ctx, 81 SLBT_NESTED_ERROR(dctx)); 82 -> 83 if (slbt_get_archive_meta(dctx,&ctx->map,&ctx->meta)) 84 return slbt_free_archive_ctx_impl(ctx, 85 SLBT_NESTED_ERROR(dctx)); 86
frame #0: 0x000000010000456a rlibtool`slbt_get_archive_meta(dctx=0x0000000100802c88, archive=0x0000000100104290, meta=0x00000001001042a0) at slbt_archive_meta.c:353 350 memset(hdrinfobuf,0,sizeof(hdrinfobuf)); 351 352 mark = archive->map_addr; -> 353 cap = &mark[archive->map_size]; 354 355 /* preliminary validation */ 356 if (archive->map_size < sizeof(struct ar_raw_signature)) (lldb) up frame #1: 0x0000000100001b11 rlibtool`slbt_get_archive_ctx(dctx=0x0000000100802c88, path="../dbus/.libs/libdbus-1.a", pctx=0x00007fff5fbfbd08) at slbt_archive_ctx.c:83 80 return slbt_free_archive_ctx_impl(ctx, 81 SLBT_NESTED_ERROR(dctx)); 82 -> 83 if (slbt_get_archive_meta(dctx,&ctx->map,&ctx->meta)) 84 return slbt_free_archive_ctx_impl(ctx, 85 SLBT_NESTED_ERROR(dctx)); 86
frame #0: 0x000000010000456a rlibtool`slbt_get_archive_meta(dctx=0x0000000100802c88, archive=0x00000001001047e0, meta=0x00000001001047f0) at slbt_archive_meta.c:353 350 memset(hdrinfobuf,0,sizeof(hdrinfobuf)); 351 352 mark = archive->map_addr; -> 353 cap = &mark[archive->map_size]; 354 355 /* preliminary validation */ 356 if (archive->map_size < sizeof(struct ar_raw_signature)) (lldb) up frame #1: 0x0000000100003fe8 rlibtool`slbt_merge_archives(arctxv=0x00007fff5fbfbd00, arctxm=0x00007fff5fbfbcc8) at slbt_archive_merge.c:680 677 struct slbt_archive_ctx_impl * ictx; 678 ictx = slbt_get_archive_ictx(arctx); 679 -> 680 if (slbt_get_archive_meta(dctx,arctx->map,&ictx->meta) < 0) 681 return slbt_merge_archives_fail( 682 arctx,0,0, 683 SLBT_NESTED_ERROR(dctx));
So yeah, I guess slibtool is failing on its own in-memory archive?!
We should stop crossing posts like they were rapiers ^^
Anyway, my cats have been making their presence known for a while now; time for me to call it a day as far as this is concerned!
We should stop crossing posts like they were rapiers ^^ Anyway, my cats have been making their presence known for a while now; time for me to call it a day as far as this is concerned!
cat(1) one should always be made a priority :=)
Fixed in commit 32c2302. Please confirm!
You must scan the entire byte array, and so it makes sense to test only for space-padding end-to-beginning, and then digits only beginning-to-detemrined-end. Think of: - "123 123" - "01234 junk" - etc.
You must scan the entire byte array, and so it makes sense to test only for space-padding end-to-beginning, and then digits only beginning-to-detemrined-end.
I still don't see it, and I know that'll keep turning it my head if I don't react to this now...
The strings (byte arrays) I've seen contain either a number followed by whitespace and then some other content, or start with whitespace. I don't think I've seen any string that has only whitespace after the number we want to read, and your examples don't either. Sticking with those examples, if the _read function receives len==strlen(mark) on entry I don't see how the scan-from-end loop would decrease len. Nor how these examples would NOT cause an error to be returned, or why you'd want to scan the entire string. Because if so, what value should be returned for "123 123"?
len==strlen(mark)
len
If valid byte arrays always the number followed by spaces and some other content, my implementation should always return that number with less overhead. But maybe you can give a real-world example where it would go wrong?
Keep in mind that the purpose of the function is not just to "extract" the number from the byte array, but also to validate the byte array in its entirety. Accordingly, "123 123" should fail; and anything not having only 0x20 (space) after the last digit should fail.
And that's why I said that you'd have to scan the entire byte array anyway, which means two loops: one that scans all the digits, and another one that verifies that everything following the last digit is just the expected space padding. I found it more elegant to implement the scanning starting at the end, but this could of course be implemented the other way around. The most important thing is to validate all bytes, not just the first 0x20 following the last digit.
Keep in mind that the purpose of the function is not just to "extract" the number from the byte array, but also to validate the byte array in its entirety.
Ah! See, I had no way of knowing that (unless I missed a comment in or above another _read function) ...
Out of curiosity, how? On a Mac? I'm not on mine until later today but have access to the files so tried to reproduce the issue on Linux by executing the same command using clang's --target cross-building feature, and didn't get any errors.
--target
Fix confirmed but see #57.
Metadata Update from @RJVB: - Issue status updated to: Closed (was: Open)
Reproduced. Out of curiosity, how? On a Mac? I'm not on mine until later today but have access to the files so tried to reproduce the issue on Linux by executing the same command using clang's --target cross-building feature, and didn't get any errors.
At the core, the problem was not osx-specific. The bug would only manifest itself when size of the string table was odd, and that did happen only on osx (I guess on other platforms ar(1) automatically aligns the string table at a 2-byte boundary).
The slibtool-ar code did take alignment into account everywhere else, and not aligning the size of the string table was nothing but an oversight. Glad we ran into this!
Login to comment on this ticket.