Suggestion: Adding a CPU Database #60

davschneller · 2023-08-28T08:22:12Z

Hi everyone, this is more or less meant as a suggestion issue... So: we're getting to a point where we have many CPU architectures, so Yateto is getting cluttered with a lot of case distinctions. Therefore, the following suggestion: we move everything concerning a CPU architecture in Yateto and SeisSol into an extra file—at the caveat of having to parse it during configuration (or maybe generate a CMake file out of it before that). The following suggestion, as an example, we have Haswell here. The respective file would contain more archs in the end.

- name: Haswell
  identifier: [haswell, hsw]
  vendor: intel
  isa: x86-64
    - avx2
    - fma
  vectorsize: 32
  cachelinesize: 64
  flags:
    - gcc:
        - flags: -mavx2 -mfma
    - icc:
        - flags: -xCORE-AVX2 -fma
  gemmgen:
    - libxsmm_jit:
        - arch: hsw
    - libxsmm:
        - arch: hsw
    - pspamm:
        - arch: hsw
    - mkl:
    - openblas:
    - blis:
    - eigen:

Here, we have:

name denotes a text for pretty-printing (required)
identifier contains a list of unique strings which are used to select the arch (required)
vendor, as far as known, contains a vendor name (optional)
isa contains the basic ISA, and instruction set extensions as bullet points. (all optional)
vectorsize contains the size of the vector register (required? optional?)
cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)
flags contains compile flags, as currently used in SeisSol. (optional)
- If any compiler is explicitly given, the specified flags are used. TODO: are there other flags, e.g. LDFLAGS we want to use?
- We may implicitly or explicitly need to add an order to deal with unknown compilers: suggestion would be to default to clang if provided, and to gcc if clang is not given.
gemmgen: the list of GEMM generators that Yateto uses, in preferred order (optional)
- if no GEMM generator is given, Yateto will only generate some for loops
- if we just give a GEMM to Yateto, we think it will go through the list from top to bottom and select the first working GEMM generator for the problem. As it is specified right now in the DefaultGeneratorCollection already.
- for each generator, we may also provide some arguments: for libxsmm_jit, libxsmm and pspamm how the architecture is called internally. (maybe that's unnecessary for libxsmm_jit?) That is useful e.g. for using Zen 1,2,3 with libxsmm: here we would just set arch: hsw.
base: a list of archs to inherit the options from. Archs that are mentioned later in the list override those that are in it earlier. (optional, by default assumed to be an empty list) Not shown in this example.

Suggestions/comments? Did I forget something? Or does a database like that exist already and we only need one for the gemmgen options?

If you like this approach, we may also do sort of the same for GPUs in the end. That might lessen the work in the hw_descr.py as provided in gemmforge/chainforge.

The text was updated successfully, but these errors were encountered:

krenzland · 2023-08-28T08:35:26Z

The idea itself is good. Parsing in cmake might be super painful, but could be converted with a small python script.
BTW, if we'd use JSON, we could parse it directly in Cmake (3.19):
https://cmake.org/cmake/help/latest/command/string.html#json

cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)

Should be separate. E.g. on A64FX cachline size is 256 bytes vs 64 byte vector instruction length

redzone modifier (assuming it is required with newer libxsmm versions?)

davschneller · 2023-08-28T11:58:26Z

The idea itself is good. Parsing in cmake might be super painful, but could be converted with a small python script. BTW, if we'd use JSON, we could parse it directly in Cmake (3.19): https://cmake.org/cmake/help/latest/command/string.html#json

I may guess that a Python script may be the most convenient option—we already depend on Python during build due to Yateto itself. Otherwise, we could of course use JSON, or convert from YAML to JSON (if YAML is easier to read for this which I'd assume).

cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)

Should be separate. E.g. on A64FX cachline size is 256 bytes vs 64 byte vector instruction length

Ok, then it shall be so.

* redzone modifier (assuming it is required with newer libxsmm versions?)

Is it? If yes, where? (couldn't find it on main when grepping redzone, ZONE, or zone at least)

uphoffc · 2023-08-28T12:31:26Z

https://github.com/SeisSol/SeisSol/blob/e6ef66194c9b4aecc62bd8d5e808427f1376a8d3/CMakeLists.txt#L473

-mno-red-zone should not be required for libxsmm_jit but only for the libxsmm generator. The flag is necessary as the offline generator wraps inline assembly inside a function, but the compiler does not inspect the inline assembly. Now if the inline assembly modifies the stack (pushq / popq) then it overwrites the "red zone" that the compiler thinks it's safe to use.

krenzland · 2023-08-28T14:07:55Z

IMO Json would be easier, you don't modify these settings that often anyways. Assuming this makes the CMake stuff easier.

-mno-red-zone should not be required for libxsmm_jit but only for the libxsmm generator. The flag is necessary as the offline generator wraps inline assembly inside a function, but the compiler does not inspect the inline assembly. Now if the inline assembly modifies the stack (pushq / popq) then it overwrites the "red zone" that the compiler thinks it's safe to use.

Time to deprecate the old libxsmm generator? ;)

uphoffc · 2023-08-28T15:14:56Z

Time to deprecate the old libxsmm generator? ;)

Afaik the offline generator is deprecated for years...

krenzland · 2023-08-28T18:13:38Z

Yes, it is. I meant deprecating it in SeisSol as well and making the jit one standard

davschneller · 2023-09-06T09:10:28Z

Ok, found a CPU database which contains compiler flags and ISA info at least: https://github.com/archspec/archspec-json/blob/master/cpu/microarchitectures.json

But, as far as I could see, no info on the cacheline etc. ... We could include something like the CPUID database here still. That one should have info there.

With that, we would at least have the vector length to specify—i.e. maybe we'll have one template per vector instruction set in the end? As the gemmgen can usually be bound to a vector instruction set at least. And maybe we may add processors, if any of them have specialized gemmgen options.

(long-term, we may have also to consider different floating point/number data types, and so on)

To summarize, here are some more ideas for the format, now splitting between vector arch and CPU:

- identifier: sve512
  features: [sve]
  vectorsize: 64
  type: isa
  gemmgen-priority: [libxsmm_jit, pspamm]
  gemmgen:
    pspamm:
      arch: arm_sve512
    libxsmm_jit:
- identifier: a64fx # which features are implemented is taken from the database above... Or we specify explicitly
  type: cpu
  cachelinesize: 256

gemmgen and its priority have been split, to allow easier subtyping for CPUs. (e.g. if libxsmm has special options for a certain processor, then we can re-specify a gemmgen option there)

krenzland · 2023-09-06T09:19:38Z

The idea is nice, but aren't we overthinking this a bit at this point?
Compiler flags might be a good idea to include in this format. I like the archspec thing (at a first glance) but I haven't closely checked whether it's a good fit for us.

davschneller mentioned this issue Sep 5, 2023

Add Some More CPU Architectures #61

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggestion: Adding a CPU Database #60

Suggestion: Adding a CPU Database #60

davschneller commented Aug 28, 2023

krenzland commented Aug 28, 2023

davschneller commented Aug 28, 2023

uphoffc commented Aug 28, 2023

krenzland commented Aug 28, 2023

uphoffc commented Aug 28, 2023

krenzland commented Aug 28, 2023

davschneller commented Sep 6, 2023

krenzland commented Sep 6, 2023

Suggestion: Adding a CPU Database #60

Suggestion: Adding a CPU Database #60

Comments

davschneller commented Aug 28, 2023

krenzland commented Aug 28, 2023

davschneller commented Aug 28, 2023

uphoffc commented Aug 28, 2023

krenzland commented Aug 28, 2023

uphoffc commented Aug 28, 2023

krenzland commented Aug 28, 2023

davschneller commented Sep 6, 2023

krenzland commented Sep 6, 2023