Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to change SVE vector length for current and children processes #101295

Merged
merged 18 commits into from
Jun 7, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 13 additions & 4 deletions src/coreclr/vm/codeman.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1257,6 +1257,9 @@ void EEJitManager::SetCpuInfo()

int cpuFeatures = minipal_getcpufeatures();

// Get the maximum bitwidth of Vector<T>, rounding down to the nearest multiple of 128-bits
uint32_t maxVectorTBitWidth = (CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_MaxVectorTBitWidth) / 128) * 128;
SwapnilGaikwad marked this conversation as resolved.
Show resolved Hide resolved

#if defined(TARGET_X86) || defined(TARGET_AMD64)

#if defined(TARGET_X86) && !defined(TARGET_WINDOWS)
Expand All @@ -1271,9 +1274,6 @@ void EEJitManager::SetCpuInfo()

CPUCompileFlags.Set(InstructionSet_VectorT128);

// Get the maximum bitwidth of Vector<T>, rounding down to the nearest multiple of 128-bits
uint32_t maxVectorTBitWidth = (CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_MaxVectorTBitWidth) / 128) * 128;

if (((cpuFeatures & XArchIntrinsicConstants_VectorT256) != 0) && ((maxVectorTBitWidth == 0) || (maxVectorTBitWidth >= 256)))
{
// We allow 256-bit Vector<T> by default
Expand Down Expand Up @@ -1525,7 +1525,16 @@ void EEJitManager::SetCpuInfo()

if (((cpuFeatures & ARM64IntrinsicConstants_Sve) != 0) && CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_EnableArm64Sve))
{
CPUCompileFlags.Set(InstructionSet_Sve);
int maxVectorLength = (maxVectorTBitWidth >> 3);
SwapnilGaikwad marked this conversation as resolved.
Show resolved Hide resolved
uint64_t systemVectorTLength = GetSystemVectorLength();
SwapnilGaikwad marked this conversation as resolved.
Show resolved Hide resolved

if (maxVectorLength >= systemVectorTLength)
SwapnilGaikwad marked this conversation as resolved.
Show resolved Hide resolved
{
// Enable SVE only when user specified vector length larger than or equal to the system
// vector length. When eabled, SVE would use full vector length available to the process.
SwapnilGaikwad marked this conversation as resolved.
Show resolved Hide resolved
// For a 256-bit machine, if user provides DOTNET_MaxVectorTBitWidth=128, disable SVE.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When DOTNET_MaxVectorTBitWidth is not specified, it is 0 and in that case, we should use the full vector length, the system offers (as the comment says). In that case, we should set CPUCompileFlags.Clear(InstructionSet_VectorT256); on 256-bit machine and CPUCompileFlags.Clear(InstructionSet_VectorT256); and CPUCompileFlags.Clear(InstructionSet_VectorT512); on a 512-bit machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wouldn't want to CPUCompileFlags.Clear, as that's implicit

Rather instead we would CPUCompileFlags.Set(InstructionSet_VectorT256); if the reported SVE length is 256-bits and CPUCompileFlags.Set(InstructionSet_VectorT512); if the reported SVE length is 512-bits.

They should be off unless otherwise set, but we double check that via a cleanup check anyways (which should be moved to be shared with Arm64):

// Clean up mutually exclusive ISAs
if (CPUCompileFlags.IsSet(InstructionSet_VectorT512))
{
CPUCompileFlags.Clear(InstructionSet_VectorT256);
CPUCompileFlags.Clear(InstructionSet_VectorT128);
}
else if (CPUCompileFlags.IsSet(InstructionSet_VectorT256))
{
CPUCompileFlags.Clear(InstructionSet_VectorT128);
}

-- I think we also need a TODO here explaining that we're artificially restricting the size to 128-bits for the time being, as the support for larger vector sizes hasn't been plumbed through for Arm64 yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wouldn't want to CPUCompileFlags.Clear, as that's implicit

Yes, that's what I meant...if they are already enabled, then we should disable it or vice-versa.

I think we also need a TODO here explaining that we're artificially restricting the size to 128-bits for the time being,

Yes, I think that's what we should do and leave the change of setting Vector256, etc. for later PR, when we address some of the things in #101477 , specifically getVectorTByteLength().

SwapnilGaikwad marked this conversation as resolved.
Show resolved Hide resolved
CPUCompileFlags.Set(InstructionSet_Sve);
}
}

// DCZID_EL0<4> (DZP) indicates whether use of DC ZVA instructions is permitted (0) or prohibited (1).
Expand Down
10 changes: 10 additions & 0 deletions src/coreclr/vm/codeman.h
Original file line number Diff line number Diff line change
Expand Up @@ -1912,6 +1912,16 @@ protected :
return m_CPUCompileFlags;
}

#ifdef __GNUC__
__attribute__((target("sve")))
#endif
inline UINT64 GetSystemVectorLength()
{
UINT64 size;
__asm__ __volatile__("cntb %0" : "=r"(size));
return size;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this to fail for MSVC, which doesn't allow inline assembly?

Can we just call svcntb() on Windows instead, or is there potentially an official OS API for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure this was available for MSVC yet. If so, then great, that's easier.

Is there a min version of MSVC needed to build coreclr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, it might not be available yet, at least its not available in the MSVC version CI is using. -- @kunalspathak might know better when that support is expected to land.

I expect we need some ifdef here regardless since __asm is explicitly unsupported on MSVC for anything but 32-bit x86 and for the general function to be configured to assert if called from Windows, as a safety measure. -- This function directly executing an SVE instruction unguarded is a bit dangerous and there isn't really a way to assert that it's being done safely. It's only safe in the current use-case since its only called if the relevant support was queried from the OS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least it's not available in the MSVC version CI is using

yes, I checked around and they haven't added that support yet, so it will not be anytime sooner

Can we just call svcntb() on Windows instead

They do not and the guidance was to use whatever is exposed by the compiler, so we won't have it at least for new few months.

So, I think in short-term, let us do it in asm guarded with if CPU-capability supports SVE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consideration is the msvc compiler does not support asm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return a hardcoded vector length of 16-bytes until we find a suitable mechanism to retrieve it.

Just noting that we won't be able to actually ship in November in such a state. We will require some form of actual query of the hardware size from the OS to avoid any issues if its run on hardware that has larger length.

It might be overall simpler to just get such a fallback setup now so we don't need to worry about it, risk forgetting it about it, or anything along those lines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be overall simpler to just get such a fallback setup now

Agreed, but AIUI, for windows today:

  • Can't use inline asm, including encoding in hex
  • no OS API available
  • no SVE ACLE available

I'm not aware of any other method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the solution is what @jkotas mentioned of emitting hex code or just write the assembly cntb in the .asm and .S file - https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/arm64/asmhelpers.asm.

Copy link
Member

@tannergooding tannergooding May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like MASM v14.4 supports things and we can just define a function that does:

rdvl    x0, #1
ret

Earlier versions may also support the functionality, but I don't have such earlier versions installed at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to test how the ci handles use of rdvl.


private:
bool m_storeRichDebugInfo;

Expand Down
Loading