Assertions with new linker #7

Open
th-otto opened this issue Sep 17, 2023 · 11 comments

Comments

@th-otto

th-otto commented Sep 17, 2023

When trying to link dxx-rebirth, i get some assertions currently:

ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763
ld: assertion fail bfd/elf32-atariprg.c:763

The assertion is from

BFD_ASSERT (diff > 0); /* No backward relocation. */

The reason is that there seem to be two or more relocations for the same address (so diff == 0 and not > 0). Any idea how this can happen?
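To illustrate why a zero difference cannot be represented, here is a minimal sketch of a GEMDOS-style fixup encoder. It is not the code from bfd/elf32-atariprg.c; the names `FixupTable` and `encode_fixups` are hypothetical, and the format (a 32-bit offset of the first fixup, then one byte per delta, where byte 1 means "skip 254 bytes" and byte 0 terminates the list) is stated here as an assumption about the PRG relocation table.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a PRG fixup table (assumed layout, see lead-in).
struct FixupTable {
    std::uint32_t first = 0;           // offset of the first fixup
    std::vector<std::uint8_t> deltas;  // delta bytes, terminated by 0
};

// Encode relocation addresses that are expected to be strictly increasing.
// Returns std::nullopt when an address repeats or goes backwards, which is
// the situation the BFD_ASSERT (diff > 0) guards against.
std::optional<FixupTable> encode_fixups(const std::vector<std::uint32_t>& addrs)
{
    FixupTable table;
    if (addrs.empty())
        return table;

    table.first = addrs.front();
    std::uint32_t prev = addrs.front();
    for (auto it = addrs.begin() + 1; it != addrs.end(); ++it) {
        if (*it <= prev)
            return std::nullopt;       // duplicate or backward relocation
        std::uint32_t diff = *it - prev;
        if (diff % 2 != 0)
            return std::nullopt;       // deltas must be even
        while (diff > 254) {           // byte 1 advances 254 bytes, no fixup
            table.deltas.push_back(1);
            diff -= 254;
        }
        table.deltas.push_back(static_cast<std::uint8_t>(diff));
        prev = *it;
    }
    table.deltas.push_back(0);         // end of table
    return table;
}
```

With this model, a list containing the same address twice cannot be encoded, because a delta byte of 0 already means "end of table".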

BTW @mikrosk: how did you manage to compile dxx-rebirth with our gcc? I had several issues with it: physfs was missing, so I had to port it first, and the SConstruct configuration was not able to locate libpng etc. for a cross-compiler. Also, in some places it tried to include <GL/gl.h> even when I disabled the use of OpenGL.

Edit: also strange: although the BFD_ASSERT triggers, the linker still exits with code 0?

@th-otto
Author

th-otto commented Sep 17, 2023

A fix for the exit code has been pushed. But the actual problem still remains.

@th-otto
Author

th-otto commented Sep 17, 2023

Apropos relocations: what about the relocations in debug info? From the GEMDOS point of view, they are part of the symbols, and therefore not loaded. Wouldn't that mean we have to apply them when gdb loads the debug info?

@th-otto
Author

th-otto commented Sep 17, 2023

Added some debug printfs, but only found the following (first value is r_offset from the entry, 2nd is calculated relocation address):

reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000014 003f2130
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000028 003f2144
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000001bc 003f22d8
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000318 003f2434
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000324 003f2440
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000003a4 003f24c0
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000450 003f256c
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000006b4 003f27d0
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000900 003f2a1c
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000978 003f2a94
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000009b8 003f2ad4
reloc: build/similar/3d/.d1x-rebirth.interp.o: 000009dc 003f2af8
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000a2c 003f2b48
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000a7c 003f2b98
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000b4c 003f2c68
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000c54 003f2d70
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000c98 003f2db4
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000ce8 003f2e04
reloc: build/similar/3d/.d1x-rebirth.interp.o: 00000d24 003f2e40

reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000014 003f2e20
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000028 003f2e34
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000034 003f2e40
**build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: duplicate relocation at 0x003f2e40**
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 000000d0 003f2edc
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 00000160 003f2f6c
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 0000016c 003f2f78
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 000001c0 003f2fcc
reloc: build/similar/arch/sdl/.d1x-rebirth.digi_audio.o: 000001f4 003f3000

Comparing that to output of objdump -r, it looks like both values are from .eh_frame sections:

build/similar/3d/.d1x-rebirth.interp.o:     file format elf32-m68k

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET   TYPE              VALUE
00000014 R_68K_32          __gxx_personality_v0
00000028 R_68K_32          .text
000001bc R_68K_32          .text+0x000001e6
00000318 R_68K_32          .text.unlikely._ZN16interpreter_base10op_defaultEjPKh
00000324 R_68K_32          .gcc_except_table._ZN16interpreter_base10op_defaultEjPKh
000003a4 R_68K_32          .text+0x0000037c
00000450 R_68K_32          .text+0x00000520
000006b4 R_68K_32          .text+0x00000922
00000900 R_68K_32          .text+0x00000c96
00000978 R_68K_32          .text+0x00001144
000009b8 R_68K_32          .text+0x0000116e
000009dc R_68K_32          .text+0x0000118c
00000a2c R_68K_32          .text+0x000011e2
00000a7c R_68K_32          .text.unlikely._ZN19partial_range_error6reportILm256EEEvPKcjS2_S2_mjm
00000a88 R_68K_32          .gcc_except_table._ZN19partial_range_error6reportILm256EEEvPKcjS2_S2_mjm
00000b4c R_68K_32          .text+0x0000121e
00000c54 R_68K_32          .text+0x000012f8
00000c98 R_68K_32          .text+0x00001720
00000ce8 R_68K_32          .text+0x0000178c
00000d24 R_68K_32          .text+0x00001bce

build/similar/arch/sdl/.d1x-rebirth.digi_audio.o:     file format elf32-m68k

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET   TYPE              VALUE
00000014 R_68K_32          __gxx_personality_v0
00000028 R_68K_32          .text
00000034 R_68K_32          .gcc_except_table
000000d0 R_68K_32          .text+0x0000013e
00000160 R_68K_32          .text+0x00000286
0000016c R_68K_32          .gcc_except_table+0x00000013
000001c0 R_68K_32          .text+0x00000416
000001f4 R_68K_32          .text+0x0000047c

Any idea what's going wrong?

Edit: forgot to mention: compiler was configured to use dwarf exception unwind info.

Edit2: however, the assertion also triggers with Vincent's toolchain.

@vinriviere
Member

The reason is that there seem to be two or more relocations for the same address (so diff == 0 and not > 0). Any idea how this can happen?

Oh, strange. I used those assertions a lot, for situations which shouldn't happen in the real world. In the linker code it's generally rather hard to determine whether a situation can happen or not. But if it does happen, then of course the right fix is to add a runtime check. As you did.

But I still wonder how it can happen. Normally, each address in loaded sections should only be loaded once.

Apropos relocations: what about the relocations in debug info? From the GEMDOS point of view, they are part of the symbols, and therefore not loaded. Wouldn't that mean we have to apply them when gdb loads the debug info?

The sections are already relocated by gdb when the PRG is attached to the process. I added specific code for that, like in gdb 5.x. I guess that also applies to DWARF-2 sections, but this must be checked.

@th-otto
Author

th-otto commented Sep 18, 2023

I added specific code for that, like in gdb 5.x. I guess that also applies to DWARF-2 sections, but this must be checked.

Hm, i will have to check that. But i doubt that can work; the value that has to be added for absolute relocations is that of the real text start in physical memory, something that gdb does not know about. I guess GDB only adds the VMA for that address. Especially interesting when the program is run by the gdbserver, but the debug information is loaded by the gdb running on the host.

But currently the issue with duplicate relocations is more important, because it causes link errors. I get the impression that it might also have to do with weak symbols, which seem to be generated quite a lot by C++ templates.

@vinriviere
Member

Hm, i will have to check that. But i doubt that can work; the value that has to be added for absolute relocations is that of the real text start in physical memory, something that gdb does not know about.

The relocation code is in mintelf_nat_target::child_initialize():

mintelf_nat_target::child_initialize (pid_t pid)

It calls ptrace (PT_BASEPAGE) to get the basepage address of the process, then calls objfile_relocate() to do the actual relocation job. From my tests, that worked well.
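The arithmetic behind that step can be sketched as follows. This is only an illustration, not the gdb code: `BasepageSketch` is a reduced, hypothetical stand-in for the GEMDOS basepage (only the three segment base fields), and `LinkAddresses`, `compute_offsets` and `relocate_address` are made-up names. The assumption is that each section offset is simply the runtime base minus the link-time address.

```cpp
#include <cstdint>

// Reduced, hypothetical view of the basepage: runtime segment bases only.
struct BasepageSketch {
    std::uint32_t p_tbase;  // runtime start of TEXT
    std::uint32_t p_dbase;  // runtime start of DATA
    std::uint32_t p_bbase;  // runtime start of BSS
};

// Link-time (VMA) start addresses of the same segments.
struct LinkAddresses {
    std::uint32_t text, data, bss;
};

// Offsets to add to link-time addresses to obtain runtime addresses.
struct SectionOffsets {
    std::uint32_t text, data, bss;
};

// Assumed relation: offset = runtime base - link-time address.
SectionOffsets compute_offsets(const BasepageSketch& bp, const LinkAddresses& link)
{
    return { bp.p_tbase - link.text,
             bp.p_dbase - link.data,
             bp.p_bbase - link.bss };
}

// Apply an offset to one link-time address (unsigned arithmetic wraps).
std::uint32_t relocate_address(std::uint32_t link_addr, std::uint32_t offset)
{
    return link_addr + offset;
}
```

When the program is loaded as one contiguous block, all three offsets come out equal, so a single displacement would be enough; keeping them separate only mirrors the per-section shape of the data.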

@th-otto
Author

th-otto commented Sep 22, 2023

BTW, i get similar duplicate relocation errors when trying to compile a native gcc for m68k (with --target=m68k-atari-mintelf and --host=m68k-atari-mintelf)

@th-otto
Author

th-otto commented Sep 26, 2023

I was able to reproduce it with a much simpler testcase:

#include <stdio.h>

void func_exception(int i)
{
        printf("in %s\n", __func__);
        if (i == 5)
                throw(i);
}

int main(void)
{
        int i;

        setvbuf(stderr, NULL, _IONBF, 0);
        printf("Hello, C++\n");
        try {
                for (i = 0; i < 10; i++)
                        func_exception(i);
        } catch (...)
        {
                printf("got exception\n");
        }
        return 0;
}

This gives one duplicate, and by adding some printfs, I see that they come from pmem_type_info.o and pbase_type_info.o (both from libsupc++.a). My new theory: the problem is due to this code:

/* Merge CIE_INF with NEW_CIE->CIE_INF. */
cie_inf->removed = 1;
cie_inf->u.cie.merged = 1;
cie_inf->u.cie.u.merged_with = new_cie->cie_inf;
if (cie_inf->u.cie.make_lsda_relative)
  new_cie->cie_inf->u.cie.make_lsda_relative = 1;
This will merge two CIE records in the output, but the TPA relocations for both inputs have already been added. Maybe the TPA relocations are added too early in m68k_elf32_atariprg_relocate_section, and this must be postponed until the final link, when it is known whether the input section is really used?

@th-otto
Copy link
Author

th-otto commented Sep 26, 2023

YES. This was exactly the reason. The fix: use _bfd_elf_section_offset to compute the offset of the relocation in the section, which takes the merged (removed) records into account. The fix is pushed.
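The effect of mapping offsets before recording a TPA relocation can be shown with a toy model. This is not BFD code: `Entry`, `map_offset`, `collect_fixups` and the sentinel `kRemoved` are hypothetical names, and the sentinel only stands in for the special "removed" value that the real offset-mapping function is assumed to return for discarded .eh_frame records.

```cpp
#include <cstdint>
#include <vector>

// Stand-in sentinel for "this input offset no longer exists in the output".
constexpr std::uint32_t kRemoved = 0xffffffffu;

// One CIE/FDE record of an input .eh_frame section (toy model).
struct Entry {
    std::uint32_t in_start;  // offset in the input section
    std::uint32_t size;      // size of the record
    bool removed;            // true if merged away or discarded
};

// Map an input-section offset to its offset in the kept contents,
// or kRemoved if it lies in a removed record.
std::uint32_t map_offset(const std::vector<Entry>& entries, std::uint32_t off)
{
    std::uint32_t out = 0;  // running size of the kept records
    for (const Entry& e : entries) {
        if (off >= e.in_start && off < e.in_start + e.size)
            return e.removed ? kRemoved : out + (off - e.in_start);
        if (!e.removed)
            out += e.size;
    }
    return kRemoved;
}

// Collect output addresses for relocations, skipping removed records.
std::vector<std::uint32_t> collect_fixups(const std::vector<Entry>& entries,
                                          const std::vector<std::uint32_t>& reloc_offsets,
                                          std::uint32_t out_base)
{
    std::vector<std::uint32_t> result;
    for (std::uint32_t off : reloc_offsets) {
        std::uint32_t mapped = map_offset(entries, off);
        if (mapped == kRemoved)
            continue;  // relocation belongs to a merged CIE: no fixup
        result.push_back(out_base + mapped);
    }
    return result;
}
```

In this model, a relocation inside a merged CIE produces no fixup at all, and the relocations after it move down by the size of the removed record, so they no longer collide with addresses from other inputs.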

And the best thing about this: it also seems to fix freemint/m68k-atari-mint-gcc#23. This makes sense if the TPA relocation was applied to the wrong offset.

@vinriviere
Member

Aha @th-otto, excellent 😃 It had been a pain to get correct TPA relocations even for the DATA segment. The mixture of input_section / output_section / vma / output_offset is really obscure. After struggling, my code seemed to be correct for any section. But you discovered the last trap: those special values -1 and -2 for section offset in the .eh_frame section!!

Really, it's incredible how complicated it is to add custom features such as TPA relocations to the linker. There are so many special cases. But this time, it's OK for .eh_frame sections. This should be enough for most usages. Until the next issue.

@th-otto
Author

th-otto commented Sep 27, 2023

Yes, that was really a pain. Hopefully this time it will cover all cases, as that function seems to be supposed to handle all special cases. Until GNU people decide to rework that ugly hack...

Actually, I found that by adding debug output to m68k_elf32_atariprg_relocate_section, printing only the section-relative offset, and comparing that to the output of objdump -r (after linking with --emit-relocs). Those relative offsets should be identical, and this was indeed the case for all sections except .eh_frame.

th-otto pushed a commit that referenced this issue Aug 8, 2024
When running test-case gdb.server/connect-with-no-symbol-file.exp on
aarch64-linux (specifically, an opensuse leap 15.5 container on a
fedora asahi 39 system), I run into:
...
(gdb) detach^M
Detaching from program: target:connect-with-no-symbol-file, process 185104^M
Ending remote debugging.^M
terminate called after throwing an instance of 'gdb_exception_error'^M
...

The detailed backtrace of the corefile is:
...
 (gdb) bt
 #0  0x0000ffff75504f54 in raise () from /lib64/libpthread.so.0
 #1  0x00000000007a86b4 in handle_fatal_signal (sig=6)
     at gdb/event-top.c:926
 #2  <signal handler called>
 #3  0x0000ffff74b977b4 in raise () from /lib64/libc.so.6
 #4  0x0000ffff74b98c18 in abort () from /lib64/libc.so.6
 #5  0x0000ffff74ea26f4 in __gnu_cxx::__verbose_terminate_handler() ()
    from /usr/lib64/libstdc++.so.6
 #6  0x0000ffff74ea011c in ?? () from /usr/lib64/libstdc++.so.6
 #7  0x0000ffff74ea0180 in std::terminate() () from /usr/lib64/libstdc++.so.6
 #8  0x0000ffff74ea0464 in __cxa_throw () from /usr/lib64/libstdc++.so.6
 #9  0x0000000001548870 in throw_it (reason=RETURN_ERROR,
     error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...)
     at gdbsupport/common-exceptions.cc:203
 #10 0x0000000001548920 in throw_verror (error=TARGET_CLOSE_ERROR,
     fmt=0x16c7810 "Remote connection closed", ap=...)
     at gdbsupport/common-exceptions.cc:211
 #11 0x0000000001548a00 in throw_error (error=TARGET_CLOSE_ERROR,
     fmt=0x16c7810 "Remote connection closed")
     at gdbsupport/common-exceptions.cc:226
 #12 0x0000000000ac8f2c in remote_target::readchar (this=0x233d3d90, timeout=2)
     at gdb/remote.c:9856
 #13 0x0000000000ac9f04 in remote_target::getpkt (this=0x233d3d90,
     buf=0x233d40a8, forever=false, is_notif=0x0) at gdb/remote.c:10326
 #14 0x0000000000acf3d0 in remote_target::remote_hostio_send_command
     (this=0x233d3d90, command_bytes=13, which_packet=17,
     remote_errno=0xfffff1a3cf38, attachment=0xfffff1a3ce88,
     attachment_len=0xfffff1a3ce90) at gdb/remote.c:12567
 #15 0x0000000000ad03bc in remote_target::fileio_fstat (this=0x233d3d90, fd=3,
     st=0xfffff1a3d020, remote_errno=0xfffff1a3cf38)
     at gdb/remote.c:12979
 #16 0x0000000000c39878 in target_fileio_fstat (fd=0, sb=0xfffff1a3d020,
     target_errno=0xfffff1a3cf38) at gdb/target.c:3315
 #17 0x00000000007eee5c in target_fileio_stream::stat (this=0x233d4400,
     abfd=0x2323fc40, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:467
 #18 0x00000000007f012c in <lambda(bfd*, void*, stat*)>::operator()(bfd *,
     void *, stat *) const (__closure=0x0, abfd=0x2323fc40, stream=0x233d4400,
     sb=0xfffff1a3d020) at gdb/gdb_bfd.c:955
 #19 0x00000000007f015c in <lambda(bfd*, void*, stat*)>::_FUN(bfd *, void *,
     stat *) () at gdb/gdb_bfd.c:956
 #20 0x0000000000f9b838 in opncls_bstat (abfd=0x2323fc40, sb=0xfffff1a3d020)
     at bfd/opncls.c:665
 #21 0x0000000000f90adc in bfd_stat (abfd=0x2323fc40, statbuf=0xfffff1a3d020)
     at bfd/bfdio.c:431
 #22 0x000000000065fe20 in reopen_exec_file () at gdb/corefile.c:52
 #23 0x0000000000c3a3e8 in generic_mourn_inferior ()
     at gdb/target.c:3642
 #24 0x0000000000abf3f0 in remote_unpush_target (target=0x233d3d90)
     at gdb/remote.c:6067
 #25 0x0000000000aca8b0 in remote_target::mourn_inferior (this=0x233d3d90)
     at gdb/remote.c:10587
 #26 0x0000000000c387cc in target_mourn_inferior (
     ptid=<error reading variable: Cannot access memory at address 0x2d310>)
     at gdb/target.c:2738
 #27 0x0000000000abfff0 in remote_target::remote_detach_1 (this=0x233d3d90,
     inf=0x22fce540, from_tty=1) at gdb/remote.c:6421
 #28 0x0000000000ac0094 in remote_target::detach (this=0x233d3d90,
     inf=0x22fce540, from_tty=1) at gdb/remote.c:6436
 #29 0x0000000000c37c3c in target_detach (inf=0x22fce540, from_tty=1)
     at gdb/target.c:2526
 #30 0x0000000000860424 in detach_command (args=0x0, from_tty=1)
    at gdb/infcmd.c:2817
 #31 0x000000000060b594 in do_simple_func (args=0x0, from_tty=1, c=0x231431a0)
     at gdb/cli/cli-decode.c:94
 #32 0x00000000006108c8 in cmd_func (cmd=0x231431a0, args=0x0, from_tty=1)
     at gdb/cli/cli-decode.c:2741
 #33 0x0000000000c65a94 in execute_command (p=0x232e52f6 "", from_tty=1)
     at gdb/top.c:570
 #34 0x00000000007a7d2c in command_handler (command=0x232e52f0 "")
     at gdb/event-top.c:566
 #35 0x00000000007a8290 in command_line_handler (rl=...)
     at gdb/event-top.c:802
 #36 0x0000000000c9092c in tui_command_line_handler (rl=...)
     at gdb/tui/tui-interp.c:103
 #37 0x00000000007a750c in gdb_rl_callback_handler (rl=0x23385330 "detach")
     at gdb/event-top.c:258
 #38 0x0000000000d910f4 in rl_callback_read_char ()
     at readline/readline/callback.c:290
 #39 0x00000000007a7338 in gdb_rl_callback_read_char_wrapper_noexcept ()
     at gdb/event-top.c:194
 #40 0x00000000007a73f0 in gdb_rl_callback_read_char_wrapper
     (client_data=0x22fbf640) at gdb/event-top.c:233
 #41 0x0000000000cbee1c in stdin_event_handler (error=0, client_data=0x22fbf640)
     at gdb/ui.c:154
 #42 0x000000000154ed60 in handle_file_event (file_ptr=0x232be730, ready_mask=1)
     at gdbsupport/event-loop.cc:572
 #43 0x000000000154f21c in gdb_wait_for_event (block=1)
     at gdbsupport/event-loop.cc:693
 #44 0x000000000154dec4 in gdb_do_one_event (mstimeout=-1)
    at gdbsupport/event-loop.cc:263
 #45 0x0000000000910f98 in start_event_loop () at gdb/main.c:400
 #46 0x0000000000911130 in captured_command_loop () at gdb/main.c:464
 #47 0x0000000000912b5c in captured_main (data=0xfffff1a3db58)
     at gdb/main.c:1338
 #48 0x0000000000912bf4 in gdb_main (args=0xfffff1a3db58)
     at gdb/main.c:1357
 #49 0x00000000004170f4 in main (argc=10, argv=0xfffff1a3dcc8)
     at gdb/gdb.c:38
 (gdb)
...

The abort happens because a c++ exception escapes to c code, specifically
opncls_bstat in bfd/opncls.c.  Compiling with -fexceptions works around this.

Fix this by catching the exception just before it escapes, in stat_trampoline
and likewise in a few similar spots.

Add a new template catch_exceptions to do so in a consistent way.

Tested on aarch64-linux.

Approved-by: Pedro Alves

PR remote/31577
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31577
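
The idea in the commit message above, catching a C++ exception just before it would escape into C code, can be sketched like this. It is not the gdb implementation: `catch_exceptions_sketch`, `risky_stat` and `stat_trampoline_sketch` are hypothetical names, and the choice of returning a fixed error value is an assumption made for the example.

```cpp
#include <cstdio>
#include <stdexcept>
#include <utility>

// Hypothetical helper: call fn(args...) and turn any C++ exception into
// the error value ErrorValue, so nothing propagates into C frames.
template<typename R, R ErrorValue, typename Fn, typename... Args>
R catch_exceptions_sketch(Fn&& fn, Args&&... args) noexcept
{
    try {
        return std::forward<Fn>(fn)(std::forward<Args>(args)...);
    } catch (const std::exception& e) {
        std::fprintf(stderr, "error: %s\n", e.what());
    } catch (...) {
        std::fprintf(stderr, "unknown error\n");
    }
    return ErrorValue;
}

// Stand-in for an operation that may throw (for example a remote stat).
int risky_stat(void* stream)
{
    if (stream == nullptr)
        throw std::runtime_error("Remote connection closed");
    return 0;
}

// Callback with C linkage, as a C library would call it.
extern "C" int stat_trampoline_sketch(void* stream)
{
    return catch_exceptions_sketch<int, -1>(risky_stat, stream);
}
```

A C caller of `stat_trampoline_sketch` then only ever sees a return code, which is the property the backtrace above shows was missing.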
th-otto pushed a commit that referenced this issue Aug 8, 2024
Since commit b1da98a ("gdb: remove use of alloca in
new_macro_definition"), if cached_argv is empty, we call macro_bcache
with a nullptr data.  This ends up caught by UBSan deep down in the
bcache code:

    $ ./gdb -nx -q --data-directory=data-directory  /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp -readnow
    Reading symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp...
    Expanding full symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp...
    /home/smarchi/src/binutils-gdb/gdb/bcache.c:195:12: runtime error: null pointer passed as argument 2, which is declared to never be null

The backtrace:

    #1  0x00007ffff619a05d in __ubsan::__ubsan_handle_nonnull_arg_abort (Data=<optimized out>) at ../../../../src/libsanitizer/ubsan/ubsan_handlers.cpp:750
    #2  0x000055556337fba2 in gdb::bcache::insert (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.c:195
    #3  0x0000555564b49222 in gdb::bcache::insert<char const*, void> (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.h:158
    #4  0x0000555564b481fa in macro_bcache<char const*> (t=0x62100007ae70, addr=0x0, len=0) at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:117
    #5  0x0000555564b42b4a in new_macro_definition (t=0x62100007ae70, kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:573
    #6  0x0000555564b44674 in macro_define_internal (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:777
    #7  0x0000555564b44ae2 in macro_define_function (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:816
    #8  0x0000555563f62fc8 in parse_macro_definition (file=0x6210000ab9e0, line=469, body=0x62a00003af2a "__va_arg_pack() __builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:203

This can be reproduced by running gdb.base/macscp.exp.  Avoid calling
macro_bcache if the macro doesn't have any arguments.

Change-Id: I33b5a7c3b3a93d5adba98983fcaae9c8522c383d
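
The guard described in that commit message can be sketched in isolation. This is not the gdb code: `copy_bytes` is a hypothetical stand-in for a cache insert function that, like memcpy, must not be given a null source pointer even for length 0, and `copy_args` is a made-up caller that skips the call when there is nothing to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Stand-in for a cache insert: copies len bytes from src.
// Precondition (as with memcpy): src must not be null, even if len is 0.
std::unique_ptr<char[]> copy_bytes(const void* src, std::size_t len)
{
    assert(src != nullptr);
    auto buf = std::make_unique<char[]>(len);
    std::memcpy(buf.get(), src, len);
    return buf;
}

// Copy the argument array only if there is one. An empty vector may
// return a null data() pointer, so the call is skipped in that case.
std::unique_ptr<char[]> copy_args(const std::vector<const char*>& argv)
{
    if (argv.empty())
        return nullptr;
    return copy_bytes(argv.data(), argv.size() * sizeof(const char*));
}
```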