P9 Float128 add/sub with round to odd, plus P8 equivalent.
POWER9 provides round-to-odd versions of the QP arithmetic operations.
This patch provides POWER9 and POWER8 implementations of QP add/sub with
round-to-odd. It includes compile, unit, and performance tests.
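
For readers unfamiliar with the rounding mode: round-to-odd truncates the result and, if any discarded bit was nonzero, forces the least significant kept bit to 1. The sketch below shows that rule on a plain 64-bit integer. It is only an illustration of the rounding rule, not the pveclib implementation (which works on 128-bit vector values), and the function name is hypothetical.

```c
#include <stdint.h>

/* Shift x right by n bits (0 < n < 64) and round the result to odd.
   Hypothetical helper for illustration only.  */
static uint64_t
shift_right_round_to_odd (uint64_t x, unsigned int n)
{
  uint64_t kept = x >> n;                          /* truncated result */
  uint64_t lost = x & ((UINT64_C (1) << n) - 1);   /* discarded bits */

  /* An exact result is returned unchanged. An inexact result gets its
     low bit set, so it is always odd.  */
  return kept | (uint64_t) (lost != 0);
}
```

Because an inexact result is always odd, it can never be mistaken for an exact value or an exact halfway case by a later rounding to a narrower precision. This is the usual reason for using round-to-odd as an intermediate step.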

	* src/pveclib/vec_f128_ppc.h [doxygen brief]:
	Added micro-benchmark data. Other clean up.
	[doxygen f128_softfloat_0_0]: Clarifications.
	[doxygen f128_softfloat_0_0_0]: Clarifications.
	[doxygen f128_softfloat_IRRN_0_0]: Clarifying Note.
	[doxygen f128_softfloat_IRRN_0_1]:
	Improve internal representation (IR) overview.
	[doxygen f128_softfloat_IRRN_0_2]:
	Expand on how the IR impacts rounding.
	[doxygen f128_softfloat_0_0_3_2]:
	Add overview of Add Quad-Precision with Round-to-Odd.
	[doxygen f128_softfloat_0_0_3_3]:
	Add overview of Subtract Quad-Precision with Round-to-Odd.
	[__clang__]: Clang keeps changing, tried to fix it.
	(vec_negf128) [(__FLOAT128__) && (__GNUC__ > 7)]:
	Use C arithmetic syntax for negate.
	(vec_xsaddqpo, vec_xssubqpo): New inline operations.
	(vec_xsmulqpo): Update latency numbers.

	* src/testsuite/arith128_test_f128.c
	(db_vec_xsaddqpo, db_vec_xssubqpo): New debug implementations.
	(test_add_qpo, test_sub_qpo): New unit test functions.
	(test_sub_qpo): Add test_add_qpo and test_sub_qpo to test
	driver.

	* src/testsuite/pveclib_perf.c (test_time_f128):
	Add timed tests timed_gcc_addqpn_f128, timed_lib_addqpo_f128,
	timed_gcc_subqpn_f128, and timed_lib_subqpo_f128.

	* src/testsuite/vec_f128_dummy.c (force_eMin, force_eMin_V0,
	test_vec_xsaddqpo, test_vec_xssubqpo, test_genqpo_v0,
	test_vec_addqpo, test_vec_addqpo_V1, test_vec_addqpo_V0,
	test_negqp_nan_v0, test_vec_subqpo, test_vec_subqpo_V0):
	New compile tests. Some are used in unit and perf tests.
	(test_gcc_addqpn_f128, test_gcc_subqpn_f128):
	New combined compile and perf operation tests.
	(test_gcc_mulqpn_f128, test_gcc_mulqpo_f128): Fix
	combined compile and perf operation tests.

	* src/testsuite/vec_perf_f128.c (test_gcc_addqpn_f128,
	test_gcc_subqpn_f128, test_vec_addqpo, test_vec_subqpo):
	New externs.
 	(test_lib_addqpo_f128, test_lib_subqpo_f128):
 	New inner perf test.
 	(timed_lib_addqpo_f128, timed_lib_subqpo_f128,
 	timed_gcc_addqpn_f128, timed_gcc_subqpn_f128):
 	New timed perf tests.

 	* src/testsuite/vec_perf_f128.h (timed_gcc_addqpn_f128,
 	timed_lib_addqpo_f128, timed_gcc_subqpn_f128,
 	timed_lib_subqpo_f128): New externs.

 	* src/testsuite/vec_pwr10_dummy.c
 	[(__GNUC__ == 11) && (__GNUC_MINOR__ > 2)]:
 	Waiting for fixes in GCC 11.3?

 	* src/testsuite/vec_pwr9_dummy.c (test_negqp_PWR9):
 	New compile test.
 	Various -Wall clean-ups.

Signed-off-by: Steven Munroe <[email protected]>
munroesj52 committed Dec 9, 2021
1 parent c96cd5d commit ba97003
Showing 8 changed files with 11,995 additions and 6,412 deletions.
1,459 changes: 1,356 additions & 103 deletions src/pveclib/vec_f128_ppc.h

Large diffs are not rendered by default.

15,551 changes: 9,262 additions & 6,289 deletions src/testsuite/arith128_test_f128.c

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions src/testsuite/pveclib_perf.c
@@ -850,6 +850,62 @@ test_time_f128 (void)
  printf ("\n%s mulqpn_lib tb delta = %lu, sec = %10.6g\n", __FUNCTION__,
          t_delta, delta_sec);

  printf ("\n%s addqpn_gcc start, ...\n", __FUNCTION__);
  t_start = __builtin_ppc_get_timebase ();
  for (i = 0; i < TIMING_ITERATIONS; i++)
    {
      rc += timed_gcc_addqpn_f128 ();
    }
  t_end = __builtin_ppc_get_timebase ();
  t_delta = t_end - t_start;
  delta_sec = TimeDeltaSec (t_delta);

  printf ("\n%s addqpn_gcc end", __FUNCTION__);
  printf ("\n%s addqpn_gcc tb delta = %lu, sec = %10.6g\n", __FUNCTION__,
          t_delta, delta_sec);

  printf ("\n%s addqpo_lib start, ...\n", __FUNCTION__);
  t_start = __builtin_ppc_get_timebase ();
  for (i = 0; i < TIMING_ITERATIONS; i++)
    {
      rc += timed_lib_addqpo_f128 ();
    }
  t_end = __builtin_ppc_get_timebase ();
  t_delta = t_end - t_start;
  delta_sec = TimeDeltaSec (t_delta);

  printf ("\n%s addqpo_lib end", __FUNCTION__);
  printf ("\n%s addqpo_lib tb delta = %lu, sec = %10.6g\n", __FUNCTION__,
          t_delta, delta_sec);

  printf ("\n%s subqpn_gcc start, ...\n", __FUNCTION__);
  t_start = __builtin_ppc_get_timebase ();
  for (i = 0; i < TIMING_ITERATIONS; i++)
    {
      rc += timed_gcc_subqpn_f128 ();
    }
  t_end = __builtin_ppc_get_timebase ();
  t_delta = t_end - t_start;
  delta_sec = TimeDeltaSec (t_delta);

  printf ("\n%s subqpn_gcc end", __FUNCTION__);
  printf ("\n%s subqpn_gcc tb delta = %lu, sec = %10.6g\n", __FUNCTION__,
          t_delta, delta_sec);

  printf ("\n%s subqpo_lib start, ...\n", __FUNCTION__);
  t_start = __builtin_ppc_get_timebase ();
  for (i = 0; i < TIMING_ITERATIONS; i++)
    {
      rc += timed_lib_subqpo_f128 ();
    }
  t_end = __builtin_ppc_get_timebase ();
  t_delta = t_end - t_start;
  delta_sec = TimeDeltaSec (t_delta);

  printf ("\n%s subqpo_lib end", __FUNCTION__);
  printf ("\n%s subqpo_lib tb delta = %lu, sec = %10.6g\n", __FUNCTION__,
          t_delta, delta_sec);

  return (rc);
}
#endif
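
The four timing blocks added in this hunk repeat one pattern: read the timebase, call a timed function in a loop, read the timebase again, and print the delta. The sketch below shows how that pattern could be factored into a helper that takes a function pointer. It is not part of the commit. Every name prefixed with `sketch_` is hypothetical, the `int (void)` signature is assumed from how the `timed_*` functions are called above, and the iteration count is a placeholder. The conversion to seconds is omitted because `TimeDeltaSec` is defined elsewhere in the test suite and does not appear in this diff.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Placeholder iteration count; the real TIMING_ITERATIONS is defined
   elsewhere in the test suite and is not shown in this diff.  */
#define SKETCH_ITERATIONS 1000

/* Signature assumed from how the timed_* functions are called in the
   diff: no arguments, integer return code.  */
typedef int (*sketch_timed_fn_t) (void);

/* Read a tick counter.  */
static uint64_t
sketch_read_ticks (void)
{
#if defined (__powerpc__) || defined (__powerpc64__)
  return __builtin_ppc_get_timebase ();
#else
  /* Portable stand-in (CPU time) so the sketch builds on other targets.  */
  return (uint64_t) clock ();
#endif
}

/* Call fn SKETCH_ITERATIONS times, print the tick delta, and return
   the sum of the return codes.  */
static int
sketch_time_one (const char *caller, const char *label, sketch_timed_fn_t fn)
{
  uint64_t t_start, t_end, t_delta;
  int rc = 0;
  int i;

  printf ("\n%s %s start, ...\n", caller, label);
  t_start = sketch_read_ticks ();
  for (i = 0; i < SKETCH_ITERATIONS; i++)
    {
      rc += fn ();
    }
  t_end = sketch_read_ticks ();
  t_delta = t_end - t_start;

  printf ("\n%s %s end", caller, label);
  printf ("\n%s %s tb delta = %llu\n", caller, label,
          (unsigned long long) t_delta);
  return rc;
}

/* Hypothetical stand-in for a timed_* test; always reports success.  */
static int
sketch_dummy_test (void)
{
  return 0;
}
```

With such a helper, each block in `test_time_f128` would reduce to one call, for example `rc += sketch_time_one (__FUNCTION__, "addqpo_lib", timed_lib_addqpo_f128);`. The commit itself keeps the blocks written out, which matches the existing style of the function.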