Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TinyCrypt deprecation - library's usage removal part 2 (bluetooth) #79931

Merged
merged 16 commits into from
Nov 27, 2024

Conversation

valeriosetti
Copy link
Collaborator

@valeriosetti valeriosetti commented Oct 16, 2024

This is the second step toward the removal of TinyCrypt library usage from the Zephyr's codebase. This is the continuation of #79566 and #79653.
In particular this PR takes care of removing TinyCrypt usage from the bluetooth subsystem.

To be done

Decide if we want to keep TinyCrypt in BT-mesh (as deprecated) or we want to completely remove it from the codebase.

@valeriosetti valeriosetti force-pushed the deprecate-tinycrypt-part2 branch 5 times, most recently from 9f3f671 to 98b97ea Compare October 18, 2024 04:30
@valeriosetti valeriosetti force-pushed the deprecate-tinycrypt-part2 branch 5 times, most recently from 946f924 to c9cf39d Compare October 23, 2024 08:26
@rettichschnidi
Copy link
Contributor

Any chance this will be ready for Zephyr 4.0?

@valeriosetti
Copy link
Collaborator Author

Any chance this will be ready for Zephyr 4.0?

As far as I know codefreeze is today for 4.0 and I still need to resolve some failures in this PR + it also depends on 2 other PRs which should be merged before this one. I'm sorry but I think the answer is "no"

@mmahadevan108
Copy link
Collaborator

@valeriosetti , do we need both Part 1 and Part 2 to be merged into 4.0.
Can we merge only Part 1 if Part 2 is not ready in time before the release?

@jhedberg
Copy link
Member

@valeriosetti , do we need both Part 1 and Part 2 to be merged into 4.0. Can we merge only Part 1 if Part 2 is not ready in time before the release?

The challenge as I see it is this part from https://docs.zephyrproject.org/latest/develop/api/api_lifecycle.html#deprecated

Code using the deprecated API needs to be modified to remove usage of said API

Which means that we can't make a release and still have code in the upstream tree using a deprecated API.

@valeriosetti
Copy link
Collaborator Author

@valeriosetti , do we need both Part 1 and Part 2 to be merged into 4.0. Can we merge only Part 1 if Part 2 is not ready in time before the release?

While part 1 is ready to be merged, part 2 absolutely not:

Which means that we can't make a release and still have code in the upstream tree using a deprecated API.

That's what I thought at the beginning of this activity as well, but then I've been asked to keep the following support for TinyCrypt in the codebase for at least the next 2 releases:

I'm not an expert of BT, but based on the number of changes I've done so far in this PR to make the CI happy, it seems that BT is way more affected by TinyCrypt removal than other subsystems. So I agree with @alxelax that giving the user some time to deal with TinyCrypt deprecation wouldn't be that bad. But this is my feeling, of course, so I'll be glad to hear what's reviewers opinion about this :)

This commit fixes
tests/bsim/bluetooth/ll/multiple_id/tests_scripts/multiple.sh.
The reason is that this test seems to depend on random number
sequence in order to have the proper scheduling.

It also fixes some AC tests by commenting out failing test
cases. The reason of the failure is still unknown and needs
to be investigated in future work.

Signed-off-by: Valerio Setti <[email protected]>
Increase test and main stack sizes for the qemu_cortex_m3 platform
in order to be able to successfully run the test.

Signed-off-by: Valerio Setti <[email protected]>
Moving from TinyCrypt to PSA Crypto API caused failures in the cpu_net
build due to RAM being overflowed. It turned out that 8192 bytes were
allocated for system heap memory, but Mbed TLS is the only user
of that memory (I found this though puncover) for AES purposes.
We reduce that to 1024 bytes because this should be enough for
this purpose.

Note: albeit this is also a standalone example, it's used extensively
in other samples/tests and babblesim, so a failure in building it
propagates in a lot of other failures.

Signed-off-by: Valerio Setti <[email protected]>
@valeriosetti
Copy link
Collaborator Author

@valeriosetti so what about the babblesim test failing? the others we can ignore! I am ready to merge this now

The BabbleSim test failure was not related to crypto and for that I got a fix from @cvinayak that I applied here.
As for other twister failures I saw that #82140 was merged (thanks!) and indeed I just rebased on main.

@nashif nashif merged commit 184c0f9 into zephyrproject-rtos:main Nov 27, 2024
40 checks passed
@fabiobaltieri
Copy link
Member

fabiobaltieri commented Nov 27, 2024

Hey this seems to be causing a pretty substantial increase in flash size usage on some bt applications, checking the hid example (west build -p -b nrf52dk/nrf52832 samples/bluetooth/peripheral_hids)

Memory region         Used Size  Region Size  %age Used
           FLASH:      181116 B       512 KB     34.55%
             RAM:       29240 B        64 KB     44.62%
        IDT_LIST:          0 GB        32 KB      0.00%

at 4b6683c

and

Memory region         Used Size  Region Size  %age Used
           FLASH:      203988 B       512 KB     38.91%
             RAM:       30904 B        64 KB     47.16%
        IDT_LIST:          0 GB        32 KB      0.00%

at 184c0f9

That's pretty substantial, I have few projects myself that are now out of flash, is there some magic option to take the usage back down? Mainly thinking something for the migration guide, 20k of flash is a lot

data instead of waiting for BT connection to be ready. This is helpful on
platforms where the BT controller owns the HW random generator and the application
processor needs to get random data before BT is fully enabled.
(:github:`79931`)
Copy link
Member

@fabiobaltieri fabiobaltieri Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think this belongs to here, the migration guide should be for things one would have to do to migrate, there's no instructions here, maybe it was meant to go in release notes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it here because this was a change compared to the previous behavior. I mean, if the final user is already successfully using this driver then he won't notice any difference, but now they can use it also in other scenarios. I thought that release-notes was mostly for new features.
Perhaps I misinterpreted the difference between migration-guide and release-notes. Do you suggest to move it anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there's probably some space for interpretation but I think the migration guide should be for stuff where the user has to take an action, like change something on the application side. @kartben? Anyway I don't think it's super important. One thing you may mention though is that some bluetooth functionalities now depends on the mbedtls module rather than tinycrypt, this is useful if anyone is running a project with a name-allowlist filter project list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, perhaps you're right. I mentioned something below for bt-mesh, but BT in general is affected by this, not only mesh. I will prepare a follow-up PR to add documentation

@valeriosetti
Copy link
Collaborator Author

@fabiobaltieri yes, I've also been reported by @Emil-Gydesen-Bose on Discord that with the current configuration crypto operations take noticeably longer than with TinyCrypt on nrf53 (build without TF-M) and nrf52. Of course this was not catched by CI because tests themselves are fine, they just take longer.
For configurations without TF-M I'm planning to enable CONFIG_MBEDTLS_PSA_P256M_DRIVER_ENABLED which uses an optimized version of p256m in Mbed TLS which is much faster than the default software implementation. On a nrf52 this made bt init command move from 2.1 seconds to 0.46 seconds, which is comparable to the TinyCrypt performance. That Kconfig also reduced the final binary size.

Can you please try it also on your side? It would be nice to have another +1 from someone who's testing this in a real scenario

@kartben
Copy link
Collaborator

kartben commented Nov 28, 2024

@valeriosetti you're commenting on speed/performance but @fabiobaltieri was asking about flash size increase?

@valeriosetti
Copy link
Collaborator Author

@valeriosetti you're commenting on speed/performance but @fabiobaltieri was asking about flash size increase?

Actually in the comment there is also

That Kconfig also reduced the final binary size.

at the end of the paragraph :)

@Thalley
Copy link
Collaborator

Thalley commented Nov 28, 2024

@nashif this shouldn't have been merged IMO. For some reason the zephyrbot removed the DNM label #79931 (comment) and I didn't feel the need to block this PR because it had that label.

Anyways, there are 4 outstanding issues here:

  1. Without the increase in CONFIG_BT_LONG_WQ_STACK_SIZE all Bluetooth pairings on the nRF53 and nRF54 series will fail on target
  2. With the change to CONFIG_HEAP_MEM_POOL_SIZE (TinyCrypt deprecation - library's usage removal part 2 (bluetooth) #79931 (comment)) all bt_enable() on nRF5340 hardware will fail
  3. There's nearly a 10x increase in pairing time (takes up to 10 seconds) which is not only really bad, but will also lead to timeouts on tests on hardware as well
  4. There's a significant increase in memory as pointed out by TinyCrypt deprecation - library's usage removal part 2 (bluetooth) #79931 (comment)

Points 1. and 2. will prevent us from running Bluetooth on several boards and prevent us from passing a lot of Bluetooth PTS tests. Point 3 may also timeout a lot of those tests. Point 4. might stop several boards from being able to run Bluetooth.

I would suggest to either revert this PR and fix these things, or fix this fix with urgent care.
I would also suggest that we start running some Bluetooth tests on hardware in CI. @sjanc Do you think we can work towards running just a few PTS tests in CI?

@jhedberg
Copy link
Member

@Thalley considering how long this PR had been open, and those issues hadn't been fixed, I have doubts that any of those would have been fixed in a timely manner. Now the merging of this PR at least works as a forcing function to get those issues fixed with urgency. It was still the right move not to merge this PR right before the 4.0 release, since then we'd have had a release with such breakage. Now that breakage is at least on the main branch, which is expected to be unstable at times during the merge window being open.

@valeriosetti
Copy link
Collaborator Author

valeriosetti commented Nov 28, 2024

I would suggest to either revert this PR and fix these things, or fix this fix with urgent care.

FWIW I'm working on a follow up PR that fixes (1), (3) and likely also noticeably reduce (4). I will post it later today.

For (2) I can revert that specific commit and see what the CI thinks about it. But I would keep this one as separate PR compared the the one I mentioned above.

@Thalley
Copy link
Collaborator

Thalley commented Nov 28, 2024

@Thalley considering how long this PR had been open, and those issues hadn't been fixed, I have doubts that any of those would have been fixed in a timely manner. Now the merging of this PR at least works as a forcing function to get those issues fixed with urgency. It was still the right move not to merge this PR right before the 4.0 release, since then we'd have had a release with such breakage. Now that breakage is at least on the main branch, which is expected to be unstable at times during the merge window being open.

You are right :) I should also have marked those issues as a change request to this PR.

I'm still curious as to why the zephyrbot removed the DNM label autonomously, although I assume it's because it added the label itself and then whatever caused the label to be added was fixed and then the label was removed.

@fabiobaltieri
Copy link
Member

fabiobaltieri commented Nov 28, 2024

@valeriosetti yeah CONFIG_MBEDTLS_PSA_P256M_DRIVER_ENABLED seems to give some substantial savings, here's the numbers I'm seeing on my test application and with west build -t rom_report:

  • tinycrypt: total 266080, tinycrypt modlue 5826
  • mbedtls default: total 289000, mbedtls module 27736
  • CONFIG_MBEDTLS_PSA_P256M_DRIVER_ENABLED=y: total 276436, mbedtls module: 15166

So the modules goes from roughly x5 to x3 the size, which is still a lot but way better. Flashed the firmware on my device, seems to connect and work fine but you folks probably want to do some better testing.

@Thalley the bot kicks in on west.yml changes and adds/removes the dnm tags depending on few conditions (if the hash is valid mainly but there's some more now), so if a PR has a manifest change you can't control that manually really. Maybe we should have a DNM tag for the bot and one for manual use.

If you want to dig into it you can open the Manifest workflow logs ("View details" on the merge line and "Details" on manifest, brings you here) and then expand on the "Manifest" step, you'll find a:

Found revision bceac6cdfccb41ef4e289b9dca17daad48cda270 in branch main
...
removing label DNM
Unable to remove DNM label

(I presume the "unable" error is there because it was already removed, guess that's a bug but yeah it tries every time)

@valeriosetti
Copy link
Collaborator Author

So the modules goes from roughly x5 to x3 the size, which is still a lot but way better. Flashed the firmware on my device, seems to connect and work fine but you folks probably want to do some better testing

Yeah, this increase is a known effect of the switch to PSA Crypto provided through Mbed TLS. As discussed in the PR that introduced the P256-M driver, PSA Crypto adds key management code (and some wrappers) that TinyCrypt does not have and which cause the increase of ROM code.

If it were only for EC support, then there would be MBEDTLS_PSA_P256M_DRIVER_RAW which gives you direct access to the P256-M driver (so no key management or extra-wrappers) and that would give you even smaller footprint than TinyCrypt. But this comes with a price: you would not use standard PSA Crypto API calls, but those of the driver instead + you need to manage the keys on your own. However this solution:

  • is wrong for platforms that have TF-M support, because in that case PSA Crypto must provided by the TF-M code, not Mbed TLS [1]
  • letting the user managing key storage is not the best because it depends on the user's knowledge about the topic. Now Mbed TLS's PSA Crypto core can make use of secure storage (introduced in 3.7) which allows key storing to be handled properly and directly from the PSA Crypto core.

However BT also uses PSA Crypto for AES-CMAC and for that one there's no "direct access" to the low level implementation, so there's no way to exclude the "core" part of PSA Crypto from the code.

[1]: it's worth mentioning IMO that on platforms with TF-M, the switch to PSA Crypto API from TinyCrypt reduced ROM footprint because in that case we don't have a duplicated crypto library also in the non-secure side.

@fabiobaltieri
Copy link
Member

@valeriosetti thanks for the detailed answer that's quite something to digest, I won't pretend I understood half of that :-) but happy enough that you folks looked into it and decided it's working as intended.

@msmttchr
Copy link
Contributor

msmttchr commented Dec 2, 2024

Sorry for late comment, but we discovered that this PR breaks compilation in most of the BlueNRG based platforms. In particular, is it a correct assumption that now entropy driver is mandatory for platforms supporting BLE?
I think it’s also not desirable the 3x memory increase, especially in platforms with limited amount of flash, not to speak about the performance penalties.
I guess there is no way to get a solution with similar performance of deprecated tinycrypt?

@jhedberg
Copy link
Member

jhedberg commented Dec 2, 2024

@msmttchr #82217 should mitigate at least some of the memory overhead issues. As for the entropy requirement, that seems to indeed be the case. Some HCI drivers have opted to just select ENTROPY_GENERATOR, however there's also a more high level solution being worked on over here: #82363

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.