Possible bug in CPU identification #1712
Comments
Hello fellow
For Cortex-M parts, this isn't normally a problem, because only one part is allowed per AP due to the required memory layout on the debug bus (one AP is one debug bus). The outermost table for that AP on a conforming implementation therefore has the core's information, and that's fine. Inner tables should hold the ARM-specific information, such as what the core type is: these will always have designer_code == JEP106_MANUFACTURER_ARM, and the part_number will always be a core ID value such as 4c4 (M4), 4c7 (M7), 4c0 (M0), etc.
For Cortex-A/R parts this is more of a problem (multiple cores per AP are ok), though a very similar thing happens with the sub-table designer and part codes, which is why this logic is how it is. For most parts it's the outermost table that's vendor-provided and defines their ID information.
What we suspect you've actually hit here is that for DPv2 parts, we want to use the DP's information (from TARGETID), and it's this which is causing problems for you.
NB: The AP information is from the ROM table's PIDR, not the DP PIDR register. DP PIDR only exists for DPv2+ parts and is used only in the
Here are some example tables from some parts on our desk showing how this generally winds up working:
|
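For readers following the designer code and part number discussion above, here is a minimal sketch of how a 64-bit CoreSight PIDR value splits into those two fields. The field positions follow the CoreSight peripheral ID layout; the type and function names, and the choice to pack the designer as (continuation << 8) | identity code so that ARM comes out as 0x43b, are assumptions made for this illustration and are not taken from the project's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions in the 64-bit value assembled from PIDR0..PIDR7 */
#define PIDR_PART_MASK         UINT64_C(0xfff)       /* PIDR0[7:0] and PIDR1[3:0] */
#define PIDR_JEP106_CODE_SHIFT 12U                   /* PIDR1[7:4] and PIDR2[2:0] */
#define PIDR_JEP106_CODE_MASK  UINT64_C(0x7f)
#define PIDR_JEP106_USED       (UINT64_C(1) << 19U)  /* PIDR2[3] */
#define PIDR_JEP106_CONT_SHIFT 32U                   /* PIDR4[3:0] */
#define PIDR_JEP106_CONT_MASK  UINT64_C(0xf)

/* Hypothetical result type for this sketch */
typedef struct {
    uint16_t designer_code; /* (continuation << 8) | identity code, packing assumed here */
    uint16_t part_number;
} component_id_s;

/* Returns false when the component does not declare a JEP106 designer */
static bool decode_pidr(const uint64_t pidr, component_id_s *const id)
{
    assert(id != NULL);
    id->part_number = (uint16_t)(pidr & PIDR_PART_MASK);
    if (!(pidr & PIDR_JEP106_USED)) {
        id->designer_code = 0U;
        return false;
    }
    const uint16_t code = (uint16_t)((pidr >> PIDR_JEP106_CODE_SHIFT) & PIDR_JEP106_CODE_MASK);
    const uint16_t cont = (uint16_t)((pidr >> PIDR_JEP106_CONT_SHIFT) & PIDR_JEP106_CONT_MASK);
    id->designer_code = (uint16_t)((cont << 8U) | code);
    return true;
}
```

With this packing, a PIDR of 0x00000004000bb4c7 decodes to designer 0x43b and part 0x4c7, while an all-zero PIDR (a ROM table whose IDs were never filled in) decodes to 0 and 0.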
ooo interesting--so it sounds like I'm hitting an issue which shouldn't exist for M7 parts? Here's the output when I run against my M7 system with the change I mentioned above in place (and some more debugging info):
Given your comments above and what I'm seeing, does that mean that my custom system is a non-conforming system in some way? Specifically, you say above that for an M7 I should see a partno of 0x4c7, but I see 0xc. |
Interesting that the ROM table header is missing from that - was this BMDA built using Anyway, yeah, those part numbers are all over the place - very strange! For a Cortex-M7 we'd expect something like this from the ROM tables:
(edit: please excuse our accidentally closing the issue while trying to copy that trace in) |
this was built from tip
Am I correct in saying that the root of my issues is that the first detected device has a manufacturer and partno of 0? |
Ok, so you're building from To answer the main question though - yes, looks like whoever's chip this is failed to configure the IDs in their part of the ROM tables, leaving them decidedly blank. Interesting that the second table in the set defines a would-be Cortex-M8 but is empty before finally defining a Cortex-M7 |
That doesn't matter, because I decoupled Line 611 in 8402360
diff --git a/src/target/adiv5.c b/src/target/adiv5.c
index 137162f8..962f0ded 100644
--- a/src/target/adiv5.c
+++ b/src/target/adiv5.c
@@ -608,7 +608,7 @@ static void adiv5_component_probe(
}
}
-#if ENABLE_DEBUG == 1 && defined(PLATFORM_HAS_DEBUG)
+#if ENABLE_DEBUG == 1
/* Check SYSMEM bit */
const uint32_t memtype = adiv5_mem_read32(ap, addr | ADIV5_ROM_MEMTYPE) & ADIV5_ROM_MEMTYPE_SYSMEM;
The To the topicstarter: visualising links between CoreSight discoverable ROM tables is crucial to verify that the implementer did it correctly and that debugger projects like BMD (and OpenOCD etc.) can sensibly use this information. You can use a BMP for this. I have seen a couple of chips which have useless ROM tables and so are impossible to discover dynamically, and dynamic discovery is one of the main features of BMD. |
Ah, ok, that explains it! Regarding the rest of your reply on that point - see the linked commit from our previous post, which is fast becoming our preferred way to fix these problems |
@dragonmux my probe is the one I bought from here: https://1bitsquared.com/collections/embedded-hardware/products/black-magic-probe. I'm able to build the debug app with
When running the debug app, everything works fine up until the app tries to erase FLASH (which our chip doesn't have) which fails. If I try to run the debug app again, I get:
and have to unplug and re-plug the probe to get the first output again. Why would running the debug app not work, but running the firmware on the probe work? It seems like the debug app always tries to erase FLASH, and assumes that connected M7 chips have FLASH even if they don't register any, as is the case in my probe() function:
|
Ok, that's a fair bit to unpack, so.. here goes:
With regards the verbosity level, it's combining. The banner for all INFO-enabled verbosity combinations should look like:
Please do not hard code the verbosity level, and especially not to 127 - for most of your needs we're best served with a verbosity level of 13 or 29 (INFO + TARGET + PROTO [+ PROBE]).
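To make the arithmetic behind 13 and 29 concrete, here is a small sketch of a combining verbosity bitmask. The bit assignment below is an assumption chosen to be consistent with the sums quoted above (INFO + TARGET + PROTO = 13, plus PROBE = 29); the enumerator and function names are illustrative and not the project's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignment for the verbosity value; names are illustrative */
enum {
    VERBOSITY_INFO = 1U << 0U,
    VERBOSITY_GDB = 1U << 1U,
    VERBOSITY_TARGET = 1U << 2U,
    VERBOSITY_PROTO = 1U << 3U,
    VERBOSITY_PROBE = 1U << 4U,
    VERBOSITY_WIRE = 1U << 5U,
};

/* Compile-time checks that the assumed bits reproduce the values from the thread */
static_assert((VERBOSITY_INFO | VERBOSITY_TARGET | VERBOSITY_PROTO) == 13U, "INFO + TARGET + PROTO");
static_assert((VERBOSITY_INFO | VERBOSITY_TARGET | VERBOSITY_PROTO | VERBOSITY_PROBE) == 29U,
    "INFO + TARGET + PROTO + PROBE");

/* True when every bit of the given message class is set in the requested level */
static bool verbosity_enabled(const uint32_t level, const uint32_t message_class)
{
    return (level & message_class) == message_class;
}
```

Under this assumption, a level of 13 enables target and protocol messages but leaves probe and wire-level output off, which keeps the log readable.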
What was the command line you used to attempt that? If you used the command-line Flash manipulation options like
The firmware can only be talked to via the GDB interface or its remote protocol. The only way for the probe itself to attempt to erase Flash of its own volition is if you ask for it to from GDB via |
Sorry for the long post! I'll keep this one short. The command I used to try and connect to our board via the BMDA was |
Ah, no, you're giving BMDA the .elf file name for the firmware so you're invoking Flash operations - specifically write mode, and BMDA's trying to byte-for-byte blast your target's (non-existent) Flash with the file's contents (BMDA doesn't understand what an ELF file is for now). This is the behaviour selected by doing If you just want to connect to the board (with or without verbosity flags) it's just |
DOH! I didn't think that providing the filename would implicitly put me into FLASH programming mode, but that makes sense. When I remove that argument and connect to BMDA via GDB, I'm able to connect to the board, and
so BMDA is trying to read the debug halt and status register over and over, and presumably failing? There's no error printed though, even at maximum verbosity. I have two questions:
|
The reads aren't failing, they're just not showing a halted processor - For a halted processor that's hit a breakpoint, we'd anticipate Your target is just never hitting the breakpoint you have set for some reason, or if it would, the breakpoint unit is not armed and so never triggering and making the target halt. Indeed, BMD is using the Cortex-M architecture implementation (cortexm.c) which provides all the bits for debug and halt, assuming working breakpoint units. |
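The repeated reads described above are a polling loop on the Debug Halting Control and Status Register (DHCSR). Here is a minimal sketch of such a loop, assuming a caller-supplied read function; the bit positions are the architectural ones for ARMv7-M, while the callback type, the fake target, and the function names are hypothetical stand-ins for a real debug-port access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural DHCSR bits (ARMv7-M) */
#define DHCSR_C_DEBUGEN (1U << 0U)
#define DHCSR_C_HALT    (1U << 1U)
#define DHCSR_S_REGRDY  (1U << 16U)
#define DHCSR_S_HALT    (1U << 17U)

/* Hypothetical callback that performs one DHCSR read on the target */
typedef uint32_t (*dhcsr_read_fn)(void *context);

/* Hypothetical fake target that replays a fixed sequence of DHCSR values */
typedef struct {
    const uint32_t *values;
    size_t count;
    size_t index;
} fake_target_s;

static uint32_t fake_dhcsr_read(void *const context)
{
    fake_target_s *const target = context;
    const uint32_t value = target->values[target->index];
    /* Stay on the final value once the sequence is exhausted */
    if (target->index + 1U < target->count)
        ++target->index;
    return value;
}

/* Poll until S_HALT is seen or the poll budget runs out */
static bool poll_for_halt(const dhcsr_read_fn read_dhcsr, void *const context, const size_t max_polls)
{
    assert(read_dhcsr != NULL);
    for (size_t poll = 0; poll < max_polls; ++poll) {
        if (read_dhcsr(context) & DHCSR_S_HALT)
            return true;
    }
    return false;
}
```

A read that succeeds but never shows S_HALT is not an error, which is why nothing is printed: the loop simply keeps going until the core halts or the user interrupts it.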
Digging into what happens after
Yet when I do My application IS running as far as I can tell, but skips over the breakpoint if it is set (some of the time). Other times, if I Ctrl-C GDB after it runs for a few seconds, I see the same thing as above, which I think implies that the CPU isn't doing anything, because the PC hasn't moved. Even weirder, |
On the GDB side of things, you're looking for the z/Z packet documentation which describes how GDB requests we set a breakpoint. From there, BMD handles the packet in the state machine: Lines 355 to 360 in 476017e
Which hands off to a helper that vectors into the target layer: Lines 783 to 787 in 476017e
That winds up calling the Cortex-M functions to set up: blackmagic/src/target/cortexm.c Lines 1300 to 1319 in 476017e
and clear: blackmagic/src/target/cortexm.c Lines 1351 to 1354 in 476017e
breakpoints against the target's hardware FPB (Flash Patch and Breakpoint unit). The DWT (Data Watch and Trace unit) is similarly used for watchpoints. It is possible under some situations for BMD to use a software breakpoint using the Flash Patch part of the FPB, but usually it'll use a hardware breakpoint. We do not know what will happen if the FPB revision check fails and it winds up setting breakpoints wrong per the unit you actually have. Single-stepping uses the same machinery (DHCSR single-step bit then resume the processor and wait for halt again), so it is not surprising if one is broken that both are. |
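Why the FPB revision matters can be seen from how a comparator value is built for each revision. The following is a sketch based on the architectural register layouts (FPB version 1 uses an address field plus a REPLACE field selecting the halfword; version 2 takes the full address plus an enable bit). The enum and function names are made up for this illustration, and the project's own implementation may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical revision selector for this sketch */
typedef enum {
    FPB_REVISION_1,
    FPB_REVISION_2,
} fpb_revision_e;

#define FPB_COMP_ENABLE       0x00000001U
#define FPB_V1_COMP_ADDR_MASK 0x1ffffffcU /* COMP field, bits [28:2] */
#define FPB_V1_REPLACE_LOWER  0x40000000U /* break on the halfword at address bit 1 == 0 */
#define FPB_V1_REPLACE_UPPER  0x80000000U /* break on the halfword at address bit 1 == 1 */

/* Build the FP_COMPn value for a hardware breakpoint at a halfword-aligned address */
static uint32_t fpb_comparator_value(const fpb_revision_e revision, const uint32_t address)
{
    /* Thumb instructions are halfword aligned */
    assert((address & 1U) == 0U);
    if (revision == FPB_REVISION_2)
        return address | FPB_COMP_ENABLE;
    /* Version 1 can only match in the code region; higher address bits are dropped here */
    const uint32_t replace = (address & 2U) ? FPB_V1_REPLACE_UPPER : FPB_V1_REPLACE_LOWER;
    return (address & FPB_V1_COMP_ADDR_MASK) | replace | FPB_COMP_ENABLE;
}
```

Because the two encodings place different meanings on the top bits, writing a version 1 value into a version 2 unit (or the reverse) arms a comparator for the wrong address, which would look like a breakpoint that never triggers.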
Getting weirder: A breakpoint on
0xe000ef68 is the "Data cache clean by address" register, according to https://developer.arm.com/documentation/ddi0489/b/BABCIIIA, so if that's the right reference for the M7, then the request to set a breakpoint is not being handled properly (which is probably not BMDA's fault, but rather our non-conforming chip). It seems odd that the "m" command is being sent into BMDA on a breakpoint set request, as that maps to:
which has nothing to do with setting a breakpoint as far as I can tell. Can you shed some light? |
When you actually ask GDB to set a breakpoint it does things in two steps - the first is to read the target memory so it knows what instruction is going to get hit.. the second is to then, when you run
|
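Since the thread distinguishes the memory read ("m") packet from the actual breakpoint request, here is a minimal sketch of parsing a GDB remote protocol z/Z packet of the form Ztype,addr,kind. The type numbering (0 software breakpoint, 1 hardware breakpoint, 2 to 4 watchpoints) comes from the GDB remote protocol; the struct, enum, and function names are hypothetical and this is not the parser the project uses.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Type field of a z/Z packet, as numbered by the GDB remote protocol */
typedef enum {
    BREAK_SOFTWARE = 0,
    BREAK_HARDWARE = 1,
    WATCH_WRITE = 2,
    WATCH_READ = 3,
    WATCH_ACCESS = 4,
} breakwatch_type_e;

/* Hypothetical parsed form of a request */
typedef struct {
    bool insert; /* true for 'Z' (set), false for 'z' (clear) */
    breakwatch_type_e type;
    uint32_t address;
    unsigned kind; /* on ARM, the instruction size for breakpoints */
} breakwatch_request_s;

/* Parse "Ztype,addr,kind" or "ztype,addr,kind"; addr and kind are hexadecimal */
static bool parse_breakwatch_packet(const char *const packet, breakwatch_request_s *const request)
{
    assert(packet != NULL && request != NULL);
    char operation = '\0';
    unsigned type = 0U;
    uint32_t address = 0U;
    unsigned kind = 0U;
    if (sscanf(packet, "%c%u,%" SCNx32 ",%x", &operation, &type, &address, &kind) != 4)
        return false;
    if ((operation != 'Z' && operation != 'z') || type > (unsigned)WATCH_ACCESS)
        return false;
    request->insert = operation == 'Z';
    request->type = (breakwatch_type_e)type;
    request->address = address;
    request->kind = kind;
    return true;
}
```

A packet such as Z1,8000100,2 parses as a hardware breakpoint insert, while a memory read such as m8000100,4 is rejected by this function because it is a different packet altogether.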
Ah got it. When I step through the code, the breakpoint type passed to If I manually change the type of the breakpoint request to
and I have to restart the GDB client to get a usable prompt back, and my code is no longer running/outputting anything. |
How interesting - that sounds like the Cortex-M code, when trying to inspect the breakpoints configuration of the target, is seeing no valid breakpoint slots and reporting 0 back to GDB.. in which case with no hardware breakpoints to use, GDB's only option is soft breakpoints. Need to think then on how to get software breakpoints done in cortexm_breakwatch_set() using Flash patching and the Edit: it would be very interesting if you could inspect what the FPB control value winds up reading as here: blackmagic/src/target/cortexm.c Lines 771 to 777 in 476017e
|
That register reads as 0x10000080, and the DWT_CTRL reg reads as 0x40000000. So there should be 8 breakpoints and 4 watchpoints in hw, which looks reasonable. AFAICT, the GDB client has no reason to request a sw breakpoint via the Z0 packet, instead of a hw breakpoint via a Z1 packet, but yet it does. If I
Which bizarrely says that sw breakpoints are not supported by the running BMDA? Yet it still sends a sw breakpoint request later??? If I run the usual connect, scan and attach sequence with a fresh GDB client with remote debug enabled, I get:
Which all looks OK to me. |
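The register values quoted above (FP_CTRL reading 0x10000080 and DWT_CTRL reading 0x40000000) can be checked against the architectural field layout. The sketch below decodes them under the ARMv7-M definitions: NUM_CODE is split across bits [14:12] and [7:4] of FP_CTRL, REV sits in bits [31:28], and NUMCOMP is bits [31:28] of DWT_CTRL. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the breakpoint and watchpoint resources */
typedef struct {
    unsigned fpb_revision_field; /* raw REV field: 0 is FPB version 1, 1 is FPB version 2 */
    unsigned code_comparators;   /* usable as hardware breakpoints */
    unsigned literal_comparators;
    unsigned dwt_comparators;    /* usable as watchpoints */
} breakwatch_resources_s;

static void decode_breakwatch_resources(
    const uint32_t fp_ctrl, const uint32_t dwt_ctrl, breakwatch_resources_s *const out)
{
    assert(out != NULL);
    out->fpb_revision_field = (fp_ctrl >> 28U) & 0xfU;
    /* NUM_CODE[3:0] is FP_CTRL[7:4]; NUM_CODE[6:4] is FP_CTRL[14:12] */
    out->code_comparators = ((fp_ctrl >> 4U) & 0xfU) | (((fp_ctrl >> 12U) & 0x7U) << 4U);
    out->literal_comparators = (fp_ctrl >> 8U) & 0xfU;
    out->dwt_comparators = (dwt_ctrl >> 28U) & 0xfU;
}
```

For the values in this thread the result is 8 code comparators, 0 literal comparators, 4 DWT comparators, and a REV field of 1, which matches the "8 breakpoints and 4 watchpoints" reading and indicates an FPB version 2 unit.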
Have you declared any flash regions via BMD to GDB so that it understands that hardware breakpoints via FPB are preferred to overwriting insns in RAM (sw bp)? I see a 16 MiB plus something region at 0x0 in your declarations, but it's ram. |
No I haven't declared any flash regions; my probe() function is just this:
|
Please experiment with |
Think the clarification needed here is that GDB, not knowing you're trying to set a breakpoint in Flash as no Flash is defined, assumes it's working with RAM and does its RAM breakpoint routine - which, of course, doesn't work here because Flash is non-mutable without using the controller for it or Flash patching via the FPB. In regards the limitations on slots, @ALTracer - if we actually used the Flash Patch part of the unit, we can insert a breakpoint instruction and get the proper behaviour for any number of additional breakpoints if memory serves.. or at least some significant number more than the normal hardware breakpoints part of the unit provides. It's something we've been meaning to play with. |
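The decision described above, where GDB consults the memory map it was given to pick between a software and a hardware breakpoint, can be sketched roughly as follows. This is an illustration of the idea only, not GDB's actual code: the types and names are hypothetical, and treating an unmapped address as writable is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory map entry, as a debugger might learn it from the probe */
typedef enum {
    REGION_RAM,
    REGION_FLASH,
} region_kind_e;

typedef struct {
    uint32_t start;
    uint32_t length;
    region_kind_e kind;
} memory_region_s;

typedef enum {
    BREAKPOINT_SOFTWARE, /* overwrite the instruction in memory */
    BREAKPOINT_HARDWARE, /* use a comparator in the breakpoint unit */
} breakpoint_choice_e;

static breakpoint_choice_e choose_breakpoint(
    const memory_region_s *const regions, const size_t region_count, const uint32_t address)
{
    assert(regions != NULL || region_count == 0U);
    for (size_t i = 0; i < region_count; ++i) {
        /* Unsigned subtraction wraps for addresses below the start, so one compare suffices */
        if (address - regions[i].start < regions[i].length)
            return regions[i].kind == REGION_FLASH ? BREAKPOINT_HARDWARE : BREAKPOINT_SOFTWARE;
    }
    /* Assumption for this sketch: unmapped memory is treated as writable */
    return BREAKPOINT_SOFTWARE;
}
```

With a single region at 0x0 declared as RAM, every breakpoint inside it comes out as a software breakpoint, which is the behaviour seen in this thread; declaring the same range as Flash flips the choice to hardware.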
Ok I think I understand how BMDA interacts with hw vs. sw breakpoints now--thanks! I can reliably hit a breakpoint on
After getting this msg, execution jumps to 0x0 (or maybe it jumping to 0x0 causes the segfault; not sure at this point), and trying to |
In adiv5.c, there is the following code on the latest tip:
which effectively results (I think) in the ap->designer_code and ap->partno being set to whatever is parsed out of the PIDR register read for the first thing that is probed on the DAP bus. For a Cortex-M7 system I'm working on, the actual CPU shows up as the 4th thing to be probed, resulting in 0 for the designer code and partno when cortexm_probe() is called. In that function we have:
and it just so happens that ap->dp->version is 1 for our M7, which results in a target->designer_code and target->part_id of 0, making it impossible to call the correct probe function without inserting a hacky 0 case in the big switch() later in the function. Is this a bug, or am I not understanding something/using things correctly?
FWIW, removing the above lines and inserting the following later in the cortexm_probe() function fixes the error and doesn't cause obvious problems, but I have no idea if it would break things elsewhere: