How to import self_ref_v0.cu? #2

Open

Rogerspy opened this issue Jun 14, 2022 · 2 comments
@Rogerspy

I extracted `self_ref_v0.cu` and the `SRWMlayer` from the source code and incorporated them into my own model. I get:

    ImportError: cannot import name 'self_ref' from 'xx.layers'

My package structure is as follows:

    layers
        __init__.py
        self_ref_v0.cu
        self_ref_layer.py

* `self_ref_v0.cu` is the file extracted from this repo
* `self_ref_layer.py` is the `SRWMlayer` package:

      from . import self_ref_v0

      class SRWMlayer(...):
          ...
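For reference, a `.cu` file is CUDA source, not a Python module, so `from . import self_ref_v0` cannot work directly. Below is a minimal sketch of one common way to JIT-compile such a kernel at import time (it assumes the `.cu` file, or an extra binding file, exposes its functions via pybind11; this is not necessarily how this repo wires it up):

```
# self_ref_layer.py -- hedged sketch only: JIT-compile the CUDA source at
# import time rather than importing the .cu file as a Python module.
# The extension name and the assumption that self_ref_v0.cu contains
# pybind11 bindings are placeholders, not the repo's actual setup.
import os
from torch.utils.cpp_extension import load

_this_dir = os.path.dirname(os.path.abspath(__file__))

# Builds an extension module from self_ref_v0.cu on first import
# (the build result is cached afterwards).
self_ref = load(
    name="self_ref_v0",
    sources=[os.path.join(_this_dir, "self_ref_v0.cu")],
    verbose=True,
)
```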
@Rogerspy (Author) commented Jun 14, 2022


I solved this problem, but I get another error:

        ...
        out = out.reshape(bsz, slen, self.num_head * self.dim_head)
    RuntimeError: CUDA error: too many resources requested for launch
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

* `pytorch == 1.11.0`
* `bsz=20, slen=512, self.num_head * self.dim_head = 768`

I tried different parameters (slen, num_head, dim_head), but the same error always occurs.

@kazuki-irie (Member)

I'm not sure exactly which parameter is causing the error in your case.
You should first try a much smaller setting to see whether that works, and then increase the parameters to find the problematic one.
If that still does not pinpoint the issue, I guess you should follow the error message, i.e., run the code with `CUDA_LAUNCH_BLOCKING=1`...
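For example (a minimal sketch; you can equally set the variable in the shell when launching the script):

```
# Force synchronous kernel launches so the reported stack trace points at the
# failing call. Set this before torch is imported / any CUDA work happens.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the environment variable
```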

One thing to note is that, in the current implementation, the head dimension (`dim_head`) cannot be too big (due to the shared-memory limit).
To train large models, the number of heads (`num_head`) has to be increased while keeping `dim_head` small.
The rule of thumb I use (on 2080, P100, and V100 GPUs) is to keep `dim_head` < 64.
For example, to get a total hidden layer size of 768, I'd set `dim_head = 48` and `num_head = 16`.
This works for me on a 2080, while (`dim_head = 96`, `num_head = 8`) does not.
The exact limit depends on the type of GPU you are using.
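To illustrate the rule of thumb, here is a hypothetical helper (not part of this repo; the cap of 64 is an assumption that depends on your GPU's shared memory):

```
# Hypothetical helper: pick (num_head, dim_head) for a target hidden size,
# keeping dim_head below a cap (64 assumed here; the real limit is GPU-dependent).
def pick_heads(total_dim, max_dim_head=64):
    # Use the largest dim_head below the cap that divides total_dim evenly.
    for dim_head in range(max_dim_head - 1, 0, -1):
        if total_dim % dim_head == 0:
            return total_dim // dim_head, dim_head
    return total_dim, 1  # degenerate fallback


print(pick_heads(768))  # -> (16, 48), matching the example above
```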
