Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audio: DRC: Change DRC to use lookup table based sine function #8491

Merged
merged 3 commits into from
Jan 11, 2024

Conversation

singalsu
Copy link
Collaborator

@singalsu singalsu commented Nov 17, 2023

This change saves in TGL platform about 13 MPCS, from 83 to 70 MCPS. In MTL platform the saving is 12 MCPS, from 46 to 34 MCPS.

The .bss RAM usage increases by 1 kB from selecting CONFIG_MATH_LUT_TRIG_FIXED.

#define SOFM_LUT_SINE_SIZE (SOFM_LUT_SINE_NQUART + 1)

/* An 1/4 period of sine wave as Q1.31 */
const int32_t sofm_lut_sine_table[SOFM_LUT_SINE_SIZE] = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Should test if this could be int16_t and size 257 for 16 bit sine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one is check wether can use 16-bit, another thing is whether can shorten the size? do we really need 512?
such a small usage cost 2k bytes table, it is big cost.

Also, could you add comments for how this table calculated? sin(i/2pi)? then we can further figure out whether 512 is must.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to ensure the accuracy,and I guess maybe it is because the latency is 512 point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried 256 size but the quality was worse than with default cordic algorithm based 16 bit sine. So, the table still has 512 elements but uint16_t type was sufficient, so the table is now half of previous.

I added also static.

int32_t sin_val = sin_fixed_16b(denorm_x);

return sin_val << 16;
return sofm_lut_sin_fixed_32b(denorm_x);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of the tooling might complain about a missing line between variable definitions and statements - same below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I didn't notice first. You are right.

Copy link
Contributor

@btian1 btian1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a float version drc source code? I want to first check float version, the fixed version, then optimized version.

#define SOFM_LUT_SINE_SIZE (SOFM_LUT_SINE_NQUART + 1)

/* An 1/4 period of sine wave as Q1.31 */
const int32_t sofm_lut_sine_table[SOFM_LUT_SINE_SIZE] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one is check wether can use 16-bit, another thing is whether can shorten the size? do we really need 512?
such a small usage cost 2k bytes table, it is big cost.

Also, could you add comments for how this table calculated? sin(i/2pi)? then we can further figure out whether 512 is must.

src/math/Kconfig Outdated
@@ -9,6 +9,15 @@ config CORDIC_FIXED
Select this to enable sin(), cos(), asin(), acos(),
and cexp() functions as 16 bit and 32 bit versions.

config LUT_TRIG_FIXED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a cleanup at some point where we have MATH_TRIG_ prefix and likewise convention for all maths APIs, macros, Kconfigs etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, done

#define SOFM_LUT_SINE_SIZE (SOFM_LUT_SINE_NQUART + 1)

/* An 1/4 period of sine wave as Q1.31 */
const int32_t sofm_lut_sine_table[SOFM_LUT_SINE_SIZE] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static ?

src/math/Kconfig Outdated
Select this to enable sofm_lut_sin_fixed_32b() function. The
calculation is using 1/4 wave lookup and interpolation.
This option consumes 2052 bytes .bss RAM for the lookup
table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we offer advice on when each trig type should be used. i.e. I would expect we export the same public API for all maths, but the internal calculations will depend on which Kconfig is selected by the user at build time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some text to Kconfig about preferring the lookup sine when used in hot code parts.

@lgirdwood lgirdwood added this to the v2.9 milestone Nov 23, 2023
@singalsu singalsu marked this pull request as ready for review January 9, 2024 17:20
@singalsu
Copy link
Collaborator Author

singalsu commented Jan 9, 2024

do we have a float version drc source code? I want to first check float version, the fixed version, then optimized version.

There used to be long ago in git first version that was float C by Sebastiano and Johny but it was replaced by fixed point code when the work proceeded. A float version should be found from ChromeOS sources. The float code and fixed conversion work for this contribution is owned by team Google so we don't review it here.

I've used the scripts in tools/test/audio to evaluate objective steady signal audio parameters for DRC and we've not seen difference in team Intel's optimizations. DRC has complex transient signal characteristics, and we currently don't have other but subjective expert listening test method for that. It means for me to listen an album of music with this processing in DUT and try to spot any issues.

zephyr_library_sources_ifdef(CONFIG_MATH_LUT_TRIG_FIXED
${SOF_MATH_PATH}/lut_trig.c
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this next to the other SOF_MATH_PATH (#8620)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong PR? There's no change to CMakeLists.txt in that.

Copy link
Contributor

@ShriramShastry ShriramShastry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the changes and they appear to be in good standing. I hope that LUT's size is good to others.

@@ -3,6 +3,7 @@
config COMP_DRC
bool "Dynamic Range Compressor component"
select CORDIC_FIXED
select MATH_LUT_TRIG_FIXED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be MATH_LUT_SINE_FIXED instead of MATH_LUT_TRIG_FIXED?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll change the config name. There is no need for other functions now.

src/math/Kconfig Outdated
@@ -9,6 +9,16 @@ config CORDIC_FIXED
Select this to enable sin(), cos(), asin(), acos(),
and cexp() functions as 16 bit and 32 bit versions.

config MATH_LUT_TRIG_FIXED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be MATH_LUT_SINE_FIXED instead of MATH_LUT_TRIG_FIXED?

@singalsu
Copy link
Collaborator Author

I have reviewed the changes and they appear to be in good standing. I hope that LUT's size is good to others.

I tried 256 size LUT but the quality dropped a lot. With 512 the quality is a tiny bit better than in 16 bit cordic, so there should be no negative audio quality impact from this.

This patch adds function sofm_lut_sin_fixed_16b(). It was
used earlier in SOF with name sin_fixed() but was remove
at add of Cordic trigonometric library. This sine function
can be used in hot code parts. Due to look-up table usage it
consumes more .bss RAM than cordic version.

Signed-off-by: Seppo Ingalsuo <[email protected]>
src/math/lut_trig.c Show resolved Hide resolved
src/math/lut_trig.c Show resolved Hide resolved
/* Q4.28 x Q12.20 -> Q16.48 --> Q16.31*/
idx_tmp = ((int64_t)w * SOFM_LUT_SINE_C_Q20) >> 17;
idx = (idx_tmp >> 31); /* Shift to Q0 */
frac = (int32_t)(idx_tmp - (idx << 31)); /* Get fraction Q1.31*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems to boil down to

idx_tmp - (idx_tmp & 0xffffffff80000000ULL) ==
idx_tmp & 0x7fffffff

would the compiler optimise that out by itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that but it looks awkward in arithmetic that is not bit-banging to HW registers etc. The shifts have association to Qx.y format. But if that gives MCPS advantage I can change, and comment what happens, I'll try.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this modification 69.976 to 69.925 MCPS, not worth it I think, because of bit-and awkwardness here. I think our perf measurement works down to 0.1 MCPS level, below that it's probably noise.

int theta;

for (theta = 0; theta < 360; ++theta) {
double rad = _M_PI * (theta / 180.0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully the compiler will calculate _M_PI / 180.0 at compile time, but parentheses might actually prevent it from doing that and make it a (redundant) run-time calculation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cmocka test code so we don't care about performance even if there would be emulated floats.

The test function is based on test function for the cordic
sine function. The error tolerance is adjusted to just pass.

Signed-off-by: Seppo Ingalsuo <[email protected]>
This change saves in TGL platform about 13 MPCS, from 83
to 70 MCPS. In MTL platform the saving is 12 MCPS, from 46
to 34 MCPS. The .bss RAM usage increases by 1 kB from
selecting CONFIG_MATH_LUT_SINE_FIXED.

Signed-off-by: Seppo Ingalsuo <[email protected]>
@lgirdwood
Copy link
Member

@singalsu can you check CI. Thanks !

@lgirdwood lgirdwood merged commit 8d2fb32 into thesofproject:main Jan 11, 2024
42 of 44 checks passed
@singalsu singalsu deleted the drc_use_lut_sine branch January 17, 2024 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants