Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mailbox: use uncached address to write sw regs #8398

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

RanderWang
Copy link
Collaborator

mailbox_sw_regs_write is used by dai to update llp node. In multi-core test, the updated llp node is invisable to other cores although dcache_writeback_region is used on both cavs & ace platforms. Now use uncached address to write memory directly which is aligned with other mailbox access functions.

Validated on cavs & ace platforms.

multi-core topology: #8240

@RanderWang
Copy link
Collaborator Author

please check the following log. core1 and core2 allocate the same slot offset and have the different content of mailbox (the bold lines should be same). The time stamp show that there is no competition risk.

[ 0.449035] ipc: dai_init_llp_info: comp:10 0xe0004 +++ alloc slot 77c, now 19
[ 0.476428] ipc: dai_get_unused_llp_slot: comp:11 0xf0004 used slot c00
[ 0.476431] ipc: dai_get_unused_llp_slot: comp:11 0xf0004 used slot c20
[ 0.476440] ipc: dai_get_unused_llp_slot: comp:11 0xf0004 used slot b00
[ 0.476445] ipc: dai_get_unused_llp_slot: comp:11 0xf0004 *** core[1] confirm slot d10
[ 0.476448] ipc: dai_init_llp_info: comp:11 0xf0004 +++ alloc slot 77c, now 18
[ 0.594686] ipc: dai_get_unused_llp_slot: comp:16 0x140004 used slot c00
[ 0.594688] ipc: dai_get_unused_llp_slot: comp:16 0x140004 used slot d00
[ 0.594696] ipc: dai_get_unused_llp_slot: comp:16 0x140004 used slot b00
[ 0.594701] ipc: dai_get_unused_llp_slot: comp:16 0x140004 used slot c10
[ 0.594705] ipc: dai_get_unused_llp_slot: comp:16 0x140004 *** core[2] confirm slot d20

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski @lyakh please take a look, we'd have to understand why this PR has an impact.

offset), bytes);
ptr_c = (uint32_t __sparse_cache *)(MAILBOX_SW_REG_BASE + offset);
ptr = cache_to_uncache(ptr_c);
memcpy_s(ptr, MAILBOX_SW_REG_SIZE - offset, src, bytes);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite worrying finding, how come this has impact? Is the register read via cached address by other cores? This still doesn't explain why this PR would have any impact. We should reallly understand why dcache_writeback_region() is not working, we rely on this in many places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure if this is true - but I think that data inside the sw_mailbox may be modified by the Linux driver. dcache writeback may destroy changes if this is the case.
Cache access to the mailbox doesn't make sense anyway.

and
mailbox_sw_regs_write is used by dai to update llp node. In multi-core test, the updated llp node is invisable to other cores although dcache_writeback_region is used

dcache writeback is not enough, dcache_invalidate is needed on the reader core.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am also very surprised. The reading method is mailbox_sw_reg_read, so it should be ok. This change use the same method for mailbox_sw_reg_read.

And there is a PR to switch to uncached memory by who ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux driver doesn't access llp in mailbox now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dcache writeback is not enough, dcache_invalidate is needed on the reader core.

This would be a logical explanation, yes. Can we add such an invalidation on the reader side? But in principle - yes, a cached write plus a writeback shouldn't be faster than just a plain simple uncached write. We'd only benefit from cached access there if the same core were accessing that data multiple times without dropping caches - reading or writing (without writing back) data there. But I doubt that we ever do that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reg reads seem to use uncached addresses, so these seem correct. Nevertheless, i don't think the writeback makes given the range of addresses may not be cache aligned, so best to change the register write routines to use uncached addresses.

@kv2019i what kind of registers are those?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh these are "sw regs", so I guess it can be whatever sw/fw agree upon. Do you think the writeback is safe here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh these are "sw regs", so I guess it can be whatever sw/fw agree upon. Do you think the writeback is safe here?

@kv2019i my understanding of "software registers" is that they're just generic locations in RAM used purely by the software without any direct hardware effects? If that's correct, then at the physical level they should also be handled like ordinary RAM, i.e. all cache management applies?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughs here are

  1. cache ops here are not being aligned or forced on cache line boundaries or sizes.
  2. No obvious locking, so can race (and corrupt with 1 above).
  3. The uncache alias is probably fixing this issue as the HW IO arbitration is probably enforcing serialised IO on the memory for all cores (i.e. the HW bus arbitration is a HW spinlock).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3. The

My thoughs here are

1. cache ops here are not being aligned or forced on cache line boundaries or sizes.

sw_regs_write may be no cache aligned since the llp slot size is 20 bytes

2. No obvious locking, so can race (and corrupt with 1 above).

we have lock to protect sw_regs_write

3. The uncache alias is probably fixing this issue as the HW IO arbitration is probably enforcing serialised IO on the memory for all cores (i.e. the HW bus arbitration is a HW spinlock).

@RanderWang
Copy link
Collaborator Author

RanderWang commented Oct 27, 2023

llp code in dai. Two cores find the same slot is unused, and they check the slot status at different time (time interval is about 20ms, so no competition risk)

	key = k_spin_lock(&sof_get()->fw_reg_lock);

	/* find unused llp slot offset with node_id of zero */
	for (i = 0; i < max_slot; i++, offset += sizeof(slot)) {
		uint32_t node_id;

		node_id = mailbox_sw_reg_read(offset);
		if (!node_id)
			break;
	}

	if (i >= max_slot) {
		comp_err(dev, "can't find free slot");
		k_spin_unlock(&sof_get()->fw_reg_lock, key);
		return -EINVAL;
	}

	memset_s(&slot, sizeof(slot), 0, sizeof(slot));
	slot.node_id = node->dw & IPC4_NODE_ID_MASK;
	mailbox_sw_regs_write(offset, &slot, sizeof(slot));

	k_spin_unlock(&sof_get()->fw_reg_lock, key);

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RanderWang can you remove the volatile and use the uncache alias only - does this help ?

src/platform/intel/ace/include/ace/lib/mailbox.h Outdated Show resolved Hide resolved
offset), bytes);
ptr_c = (uint32_t __sparse_cache *)(MAILBOX_SW_REG_BASE + offset);
ptr = cache_to_uncache(ptr_c);
memcpy_s(ptr, MAILBOX_SW_REG_SIZE - offset, src, bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughs here are

  1. cache ops here are not being aligned or forced on cache line boundaries or sizes.
  2. No obvious locking, so can race (and corrupt with 1 above).
  3. The uncache alias is probably fixing this issue as the HW IO arbitration is probably enforcing serialised IO on the memory for all cores (i.e. the HW bus arbitration is a HW spinlock).

mailbox_sw_regs_write is used by dai to update llp node. In multi-core
test, the updated llp node is invisable to other cores although
dcache_writeback_region is used on both cavs & ace platforms. Now use
uncached address to write memory directly which is aligned with other
mailbox access functions.

Validated on cavs & ace platforms.

Signed-off-by: Rander Wang <[email protected]>
@RanderWang
Copy link
Collaborator Author

removed volatile , thanks!

Tested on TGL & MTL

@RanderWang
Copy link
Collaborator Author

It may be could improve thesofproject/linux#4832 since this PR also fixes dai module initialization failure.

@lgirdwood lgirdwood merged commit 195fed7 into thesofproject:main Nov 3, 2023
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants