Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Move logger/trace (and debug tools?) to other repository #2480

Closed
perexg opened this issue Mar 3, 2020 · 40 comments
Closed

[FEATURE] Move logger/trace (and debug tools?) to other repository #2480

perexg opened this issue Mar 3, 2020 · 40 comments
Labels
enhancement New feature or request logger Dictionary and logger
Milestone

Comments

@perexg
Copy link

perexg commented Mar 3, 2020

Actually, the SOF repository is "all in one" now. While we have the sof-bin repository now, the debug tools which are supposed to be used by the end users for the problem analysis like sof-logger should be moved to a separate repository. The code should be compatible with all SOF firmware/driver releases.

@perexg perexg added the enhancement New feature or request label Mar 3, 2020
@jajanusz
Copy link
Contributor

jajanusz commented Mar 4, 2020

We had it like that previously but TBH I don't remember what was the issue - probably something with keeping headers shared by tools and FW aligned (ABI changes).

@jajanusz
Copy link
Contributor

jajanusz commented Mar 4, 2020

Anyway splitting is always good and we should do this.
It could be problematic because of how tools and FW code was coupled, but with new build-system and some refactor it shouldn't be that big issue.
FW doesn't have hard dependency on these tools, the tools depend on some headers from FW and in case of f.e. testbench some runtime code - but it can be exported as .so lib that this tool can use.

@perexg
Copy link
Author

perexg commented Mar 4, 2020

As I wrote, those tools must support all ABI versions for the released (tagged) firmware files.

@lgirdwood lgirdwood added this to the v1.6 milestone Mar 4, 2020
@lgirdwood
Copy link
Member

@xiulipan fyi. This will need to be added into your release scripting.

@lgirdwood
Copy link
Member

@xiulipan any update here ? I think we have all the topologies and LDCs now but not the tools ?

@xiulipan
Copy link
Contributor

@lgirdwood Good point here for sof-bin to include the tools., but I am not sure which version tools should I put into the repo. @jajanusz Can we include the tools binary with release also?

@lgirdwood
Copy link
Member

@xiulipan ideally we would have our makefile

  1. Checkout tools based on tag
  2. build tools (alongside topologies)
  3. build debian package and rpm with tools, ldcs, topologies and FW binaries,
    See
    https://askubuntu.com/questions/1130558/how-to-build-deb-package-for-ubuntu-18-04
    https://packaging.ubuntu.com/html/packaging-new-software.html
    and for rpms
    https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages/

It looks like for both we need to have a config file that the packaging tools will use to build and pull in the binaries. Btw, this could be done in two containers (one ubuntu and the other fedora) if you have the time (as it means anyone could then build the packages).

@perexg
Copy link
Author

perexg commented Aug 20, 2020

@lgirdwood : It diverges from my request to put the logger tool code to another repository and maintain the compatibility with all known (previous) SOF releases (one binary should work with all released SOF firmware files). In this case, distributions can maintain this tool in a package themselves. Distributions have already a package for SOF firmware (sof-bin) and I would not mix the tools with firmware.

@lgirdwood
Copy link
Member

@perexg ok, your right -
@mengdonglin this may be a few days effort for @xiulipan . Do you think we can do for v1,6 ?

@xiulipan
Copy link
Contributor

@lgirdwood Here is my understanding for the request here:

  • tools binary should go with sof-bin repo (we can add them in 1.6)
  • tools source code should be in standalone repo. (if that means we should get soft back? Then we will face the issue why we drop soft: how to sync the headers with ABI change) I think we can have some discussion about this in next tech talk

@lgirdwood
Copy link
Member

@xiulipan I think @perexg is asking that the sof tooling (rimage, sof-logger etc - everything that builds a binary) goes in the rimage repo and is then distrubuted indipendently from the FW. This way distro folks can pick up the new repo and build for thier targets.
@perexg how do you recommend dealing with the header dependcy the tools have on the FW repo ? The FW headers are used by most of the tools, so anyone building tools would also need the FW headers (note FW does not need to be built to generate any of the headers).

@mengdonglin mengdonglin modified the milestones: v1.6, v1.7 Sep 2, 2020
@lgirdwood
Copy link
Member

@marc-hb @xiulipan fyi.

@xiulipan
Copy link
Contributor

@lgirdwood What is the plan now? Move tools into rimage repo? or soft (sof-tools) repo again?

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 13, 2021

I think answering ABI questions (including your own) has to come first.

@perexg
Copy link
Author

perexg commented Jan 13, 2021

I think that it's pretty easy to include all header versions and use the appropriate code when a certain firmware is detected at runtime. It's a traditional way, how all *nix tools works (backward compatibility). The rimage is used for the fw building, right? I would create another repo for the user space tools which are supposed to be used by users (not developers).

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 13, 2021

Let me try to break this down and please correct me:

  1. Make sure the (native) logger can be built very easily and independently = without building the (cross-compiled) firmware
  2. Make sure the logger and other tools are backward compatible with old firmware and continuously validate that.
  3. Move the logger etc. to a resurrected SOFT git repo

The title and description mention only 3 but attempting 3 before 1 and 2 would just get things back to the initial problems that were solved by moving everything into a single repo. I imagine the main problem was users constantly asking why logger version X crashes with firmware version Y?

The main problem with 1. is not (CMake) code which we should already have for the most part but agreeing on what is a good "developer interface": a relatively new kind of goal in this project.

I don't know enough to gauge the effort for the the initial implementation of 2. On the other hand I'm concerned about the on-going and continuously growing validation and maintenance effort implied. Hopefully someone familiar with the code and/or the original issue can elaborate.

Once we have 1. and 2., the main effort for 3 will hopefully be just shuffling some CMake files around while avoiding too much duplication. I'm afraid I'm missing the value of 3 besides making a clear statement that 1 and 2 have been achieved - is that the main goal? "Splitting is always good" is not factual.

PS: jajanusz is not involved with SOF anymore.

@perexg
Copy link
Author

perexg commented Jan 13, 2021

I agree with all three points. It's exactly what I'd like to propose.

I imagine the main problem was users constantly asking why logger version X crashes with firmware version Y?

We would like to use this tool in a package for the universal distributions. We cannot predict the behaviour of users. Although, most of them is using the latest released software (kernel, user space), some of them are downgrading or preserving kernels for a reason. So we need one binary which works with all known firmware / driver combinations to the release date.

@plbossart
Copy link
Member

the logger should be relatively easy to handle as a standalone tool, the main issue I see is that the logger needs to use the exact .ldc file matching the .ri file. If the two files don't match the translation from error code to human-readable error cannot happen and the logger will crap out.

In hindsight, I am not sure why we created a separate .ldc file, we could have added it to the firmware file. All we needed was to strip the dictionary before downloading the actual binary to the DSP...

@perexg
Copy link
Author

perexg commented Jan 14, 2021

the logger should be relatively easy to handle as a standalone tool, the main issue I see is that the logger needs to use the exact .ldc file matching the .ri file. If the two files don't match the translation from error code to human-readable error cannot happen and the logger will crap out.

Normal users are working with packages and the contents in packages is in sync. It's probably useless to load those strings to kernel (unused). It may be better to thing about the validation (put the firmware version string or maybe a hash for the whole file to the debug .ldc file for the consistency verification).

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 14, 2021

It's probably useless to load those strings to kernel (unused)

Is the kernel forced to keep in memory what it loads from /lib/firmware ?

or maybe a hash for the whole file to the debug .ldc file for the consistency verification

Done a few months ago, see #3326 and earlier.
EDIT: done but with some major design bugs, see #3890 and #5920

@plbossart
Copy link
Member

the logger should be relatively easy to handle as a standalone tool, the main issue I see is that the logger needs to use the exact .ldc file matching the .ri file. If the two files don't match the translation from error code to human-readable error cannot happen and the logger will crap out.

Normal users are working with packages and the contents in packages is in sync. It's probably useless to load those strings to kernel (unused). It may be better to thing about the validation (put the firmware version string or maybe a hash for the whole file to the debug .ldc file for the consistency verification).

it's simple today because we only have simple firmwares. in 2021, we are going to see firmware that contains 3rd-party IP, which will be released by OEMs. I will bet we are going to have confusions between the non-public and public firmwares, and with the two LDC files the potential for mismatches is even bigger.

Also for developers, it's classic that people generate a firmware, and copy it to the target but forget to also copy the .ldc.

What I am saying is that we tried to minimize the firmware file, but it might have been a bridge too far.

@plbossart
Copy link
Member

It's probably useless to load those strings to kernel (unused)

Is the kernel forced to keep in memory what it loads from /lib/firmware ?

I think we do, we only request the firmware on the first boot but reuse it for system resume. we only call release_firmware() in the driver .remove() callback.

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 15, 2021

I forgot about power management, thanks. However I imagine the resume process would start from an already stripped image, wouldn't it? Then it would be trivial not to keep the dictionary in kernel RAM.

I just built everything and looked at file sizes. *.ldc file sizes range between 35K and 85K. Almost all are bigger than 60K. *.ri file sizes range between 90K and 270K, almost all are bigger than 160K.

@perexg
Copy link
Author

perexg commented Jan 15, 2021

It's probably useless to load those strings to kernel (unused)

Is the kernel forced to keep in memory what it loads from /lib/firmware ?

It's just waste of resources, because the whole firmware contents is handled at the load time even if it's scratched later.

or maybe a hash for the whole file to the debug .ldc file for the consistency verification

Done a few months ago, see #3326 and earlier.

Great. I assume that the sof-logger code does some validation to avoid using wrong .ldc file now.

@paulstelian97
Copy link
Collaborator

It's just waste of resources, because the whole firmware contents is handled at the load time even if it's scratched later.

I don't see where there is waste, sorry.

I think @plbossart 's idea goes like this:

  1. The.ldc is made a part of the .ri file at build time. Absolutely zero possibility of .ldc mismatch. No need for any checksum.
  2. After loading the .ri file, the SOF kernel driver immediately discards the .ldc part. This avoids wasting a couple hundreds kilobytes of kernel memory.
  3. The user points the logger at the same firmware file that was loaded by the kernel. The logger ignores the code and loads only the .ldc part.

@plbossart please correct me or thumb this comment up if I understood correctly.

Not sure if it's exactly the same idea as he presented but I see value in this one. There needs to be a checksum or something, a build number maybe, that the firmware must include and the logger tool still needs to check in order to match the firmware on-disk with the loaded one.

@perexg
Copy link
Author

perexg commented Jan 19, 2021

It's just waste of resources, because the whole firmware contents is handled at the load time even if it's scratched later.

I don't see where there is waste, sorry.

You need more memory than required for the standard use at the load time and you also need some CPU time. I agree, it may be negligible, but it should be considered before any change.

There needs to be a checksum or something, a build number maybe, that the firmware must include and the logger tool still needs to check in order to match the firmware on-disk with the loaded one.

Yes, so the reason for the all-in-one file is not so strong now, right?

I'm fine with any method when we can check the debug info consistency.

Another option: If we omit the CPU/memory wasting at the load and runtime, we can keep .ldc contents included to the .ri fw file in the kernel memory and export it via debugfs. Perhaps a kernel option can be added to not load and handle this contents at the fw load time by default.

@lgirdwood
Copy link
Member

We currently have a manifest in the FW header which also includes the FW binary payload size and offset. It would be possible to stitch the ldc file at the end of the FW file so that

  1. kernel would not even have to load the LDC data i.e. it reads header and loads only FW payload.
  2. logger would read the header and skip the FW payload and jump directly to LDC data.
    This way we would keep the single file and not have to worry about consuming any extra kernel or user memory.

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 19, 2021

There needs to be a checksum or something, a build number maybe, that the firmware must include and the logger tool still needs to check in order to match the firmware on-disk with the loaded one.

Yes, so the reason for the all-in-one file is not so strong now, right?

Checksum or not, a single file would be massively more convenient and less error-prone from the perspective of a user who just wants to use the logger.

@perexg
Copy link
Author

perexg commented Jan 19, 2021

From the packaging perspective, it means (using 1.6.1) about 8% size increase - 7.3MB/0.57MB (and impossibility to split the debug info - which is not required to be installed for the standard operation). At least, I don't force to install .ldc files by default in our distros - we have alsa-sof-firmware and alsa-sof-firmware-debug packages. I mean, the split gives us more flexibility (and I can argue that the debug info split is also convenient).

@marc-hb marc-hb added the logger Dictionary and logger label Mar 3, 2021
@lgirdwood lgirdwood modified the milestones: v1.7, v1.9 May 10, 2021
@kv2019i
Copy link
Collaborator

kv2019i commented Feb 15, 2023

Now that we have moved to west to manage the repositories, this (splitting out tools to a separate repo) would be much easier to implement in practise) and we could do this without interfering with SOF developers' workflows. There is currently noone working on this, and this definitely did not happen for v1.9, so moving to TBD milestone until this gets prioritized again.

@kv2019i kv2019i modified the milestones: v1.9, TBD Feb 15, 2023
@marc-hb
Copy link
Collaborator

marc-hb commented Feb 15, 2023

BTW Intel and others are moving away from sof-logger so not going to spend time making it build alone and backwards compatible. There are other tools though.

@dbaluta
Copy link
Collaborator

dbaluta commented Feb 16, 2023

@marc-hb is there a documentation of how is Intel doing logging? What is the replacement for sof-logger?

Does this happen only for SOF with Zephyr or also for SOF with XTOS?

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 16, 2023

As far as Intel hardware is concerned:

  • main branch: Zephyr-only, IPC4 only, future products only. TGL/ADL/RPL+Zephyr+IPC4 is "opt-in"/experimental in the main branch; for developers.
  • stable-v2.2: XTOS-only, IPC3 only. For everything up to and including TGL/ADL/RPL.

Other vendors can keep maintaining sof-logger, XTOS and any IPC in the main branch if they want to but Intel won't participate.

What is the replacement for sof-logger?

Currently https://github.com/thesofproject/sof/tree/main/tools/mtrace (shared memory), DMA solution later. All based on Zephyr logging code.

@perexg perexg changed the title [FEATURE] Move logger (and debug tools?) to other repository [FEATURE] Move logger/trace (and debug tools?) to other repository Feb 16, 2023
@perexg
Copy link
Author

perexg commented Feb 16, 2023

The name of the tool is not relevant. The point is that there should be a split.

@marc-hb
Copy link
Collaborator

marc-hb commented Feb 16, 2023

The discussion was about who's maintaining what and not. Testing and maintaining more combinations is significant work. A separate (and on-going) validation effort is required for each tool; logging and not.

So we need one binary which works with all known firmware / driver combinations to the release date.

This by the way can be done independently of the number of source repos. I mean you can release separately and implement backwards-compatibility across multiple components all hosted in a unique repo. You can also have components hosted in separate repos that are tested only in very specific version combinations; example: sof/west.yml. This must be kept in mind in discussions like these. This is a complex topic https://medium.com/@sdboyer/so-you-want-to-write-a-package-manager-4ae9c17d9527

@lgirdwood lgirdwood modified the milestones: TBD, v2.6 Feb 20, 2023
@lgirdwood
Copy link
Member

@perexg I think most of the bulk code changes are mostly in place for Zephyr now, next steps are some repo reconfiguration to remove sub repos and use west as a repo manifest. This would enable tools to be split out of the SOF repo. Fwiw some tools will also come from Zephyr too.

@lgirdwood lgirdwood modified the milestones: v2.6, TBD, v2.7 May 10, 2023
@lgirdwood
Copy link
Member

Moving to v2.7 west now integrated, some tools in Zephyr, but other tools, topologies and configs still pending.

@alex-cri alex-cri modified the milestones: v2.7, v2.8 Aug 21, 2023
@marc-hb
Copy link
Collaborator

marc-hb commented Oct 26, 2023

October 2023 update:
- sof-logger is dead for Intel and not maintained by anyone else. It's Zephyr logging / mtrace now.
- After many long discussions, we realized that a separate rimage was orders more maintenance work that we could handle, so rimage was just moved back into sof.git: #8178 . There are some generic, non-rimage specific "talking points" about the number of git repos in 8178
- No one seems to use sof-ctl and sof-probes; we never hear anything about them. In any case their code never moves.

So I think the only question left is topologies. This issue was not filed about topologies and the long discussion above barely ever mentioned topologies so I'm closing this issue as "wontfix". Please open a new issue about topologies if desired (referencing this one).

The code should be compatible with all SOF firmware/driver releases.

A separate repo is indeed a strong "compatibility matrix" statement - and a lot of validation work is required to achieve such support. I don't know whether separate topologies would be a realistic validation and support work commitment. For sure it wasn't for the tools above.

(A compability matrix can also be implemented inside a single repo - but that's less intuitive and less assertive)

https://github.com/thesofproject/sof-bin/releases has been implementing incremental releases of topologies just fine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request logger Dictionary and logger
Projects
None yet
Development

No branches or pull requests