Questions about non-pim logic #10

Drewmackintire · 2024-03-21T07:30:16Z

I'm writing code to execute some useful kernels for GPT-3 and other transformer models. Once the kernels are complete, I'd like to compare PIM and non-PIM execution latency using the PIMBENCHTESTCASES file. However, looking through the code, I can't find any kernel or execution path for the non-PIM case. Can you give me a tip for mocking or simulating non-PIM kernel execution? Thanks.

iamshcha · 2024-04-13T13:54:21Z

Many different cases can arise depending on the tiling method, the order in which the xPU reads the weight data, and how intermediate results are managed. These cases depend on the xPU system configuration: for example, the SPM size or the effect of on-chip caches can call for different optimization techniques. For the non-PIM system, we used an oracle approach that assumes sequential reads of the weight data (see genMemTraffic()), which hides the computational overhead. Note that this approach works for GEMV but cannot be used for operations with high data locality.
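
As a rough illustration of the oracle idea described above, here is a minimal Python sketch. It is not the simulator's genMemTraffic() implementation; the function names, the 32-byte burst size, and the 256 GB/s bandwidth figure are all assumptions made up for this example. It walks a weight buffer in order, one read per burst, and estimates a bandwidth-bound latency with compute assumed to be fully hidden behind the memory reads.

```python
from typing import Iterator


def sequential_weight_reads(base_addr: int, num_bytes: int,
                            burst_bytes: int = 32) -> Iterator[int]:
    """Yield one read address per burst, walking the weight buffer in order.

    burst_bytes is a hypothetical burst size, not taken from the simulator.
    """
    for offset in range(0, num_bytes, burst_bytes):
        yield base_addr + offset


def oracle_latency_ns(num_bytes: int, peak_bandwidth_gb_s: float) -> float:
    """Bandwidth-bound estimate: every weight byte is read exactly once and
    compute is assumed to be completely overlapped with the reads.

    1 GB/s equals 1 byte/ns, so bytes / (GB/s) gives nanoseconds.
    """
    return num_bytes / peak_bandwidth_gb_s


# Example: GEMV with a 4096 x 4096 fp16 weight matrix (2 bytes per element).
rows, cols, bytes_per_elem = 4096, 4096, 2
weight_bytes = rows * cols * bytes_per_elem

trace = list(sequential_weight_reads(base_addr=0, num_bytes=weight_bytes))
latency_ns = oracle_latency_ns(weight_bytes, peak_bandwidth_gb_s=256.0)

print(f"{len(trace)} sequential reads, about {latency_ns:.0f} ns at peak bandwidth")
```

This kind of estimate only makes sense when each weight is touched once, as in GEMV. For operations with high data locality, on-chip reuse changes the memory traffic, so a model like this would overstate it.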

iamshcha mentioned this issue May 21, 2024

Where are the compute cycles simulated? #15

Open
