embedding test failure #90

Open
FindHao opened this issue Dec 2, 2024 · 0 comments

FindHao (Member) commented Dec 2, 2024

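torch._dynamo hits config.cache_size_limit (8) when we run all inputs for the embedding op in one process. For inputs 8-15, the inductor speedup regresses to about 1x, but running one of those inputs separately still gives a good speedup (input 11 alone reaches about 1.98x, versus about 1.00x in the full run). I suspect we need to clear the torch caches before each input; this needs to be double-checked with others.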
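One plausible reading of the warning in the log: every input gives the embedding weight a new shape (the guard fails on L['fn']._parameters['weight']), so Dynamo recompiles once per input, and once 8 entries exist the remaining inputs appear to run without compiled code, which would match the ~1x speedup. The toy model below sketches that hypothesis only. ToyFrameCache, run_all, and SHAPES are hypothetical names; the cache keys on the weight shape alone, which is a simplifying assumption and not how Dynamo's guards actually work. Its reset() plays the role of "clearing the torch caches" (the real PyTorch call for that is torch._dynamo.reset()), and raising limit mirrors raising torch._dynamo.config.cache_size_limit.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical guard key: the (V, D) shape of the embedding weight.
Shape = Tuple[int, int]


@dataclass
class ToyFrameCache:
    """Toy per-function compile cache; mimics, but is not, Dynamo's cache."""

    limit: int = 8  # mirrors "hit config.cache_size_limit (8)" in the log
    entries: Set[Shape] = field(default_factory=set)

    def reset(self) -> None:
        # Stand-in for clearing compile caches (cf. torch._dynamo.reset()).
        self.entries.clear()

    def call(self, weight_shape: Shape) -> str:
        """Return which path serves this call: 'compiled' or 'eager'."""
        if weight_shape in self.entries:
            return "compiled"  # guard passes: reuse the cached compiled entry
        if len(self.entries) >= self.limit:
            return "eager"  # limit reached: no new entry, run uncompiled
        self.entries.add(weight_shape)  # guard failure: recompile for this shape
        return "compiled"


def run_all(cache: ToyFrameCache, shapes: List[Shape],
            reset_between_inputs: bool, calls_per_input: int = 3) -> List[str]:
    """Benchmark-style loop: call the function several times per input."""
    modes = []
    for shape in shapes:
        if reset_between_inputs:
            cache.reset()
        paths = {cache.call(shape) for _ in range(calls_per_input)}
        modes.append("compiled" if paths == {"compiled"} else "eager")
    return modes


# The 16 inputs from the log as (V, D): D=768 then D=4096, V = 1024 .. 131072.
SHAPES = [(1024 * 2**k, d) for d in (768, 4096) for k in range(8)]
```

Without resets, run_all(ToyFrameCache(), SHAPES, reset_between_inputs=False) reports inputs 0-7 as "compiled" and inputs 8-15 as "eager", the same split as the regression. Resetting before each input, or using ToyFrameCache(limit=16), keeps all 16 compiled. Resetting costs a fresh compile per input, which a benchmark's warmup would normally absorb.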

Test Plan

% python run.py --op embedding  --mode fwd  --precision fp32 --metrics latency,speedup --cudagraph --csv
 50%|██████▌      | 8/16 [00:10<00:13,  1.63s/it]
W1202 14:26:40.900000 1555063 site-packages/torch/_dynamo/convert_frame.py:897] [0/8] torch._dynamo hit config.cache_size_limit (8)
W1202 14:26:40.900000 1555063 site-packages/torch/_dynamo/convert_frame.py:897] [0/8]    function: 'inner' (/home/yhao/.conda/envs/ptd/lib/python3.11/site-packages/torch/_dynamo/external_utils.py:29)
W1202 14:26:40.900000 1555063 site-packages/torch/_dynamo/convert_frame.py:897] [0/8]    last reason: 0/0: tensor 'L['fn']._parameters['weight']' size mismatch at index 1. expected 768, actual 4096. Guard failed on a parameter, consider using torch._dynamo.config.force_parameter_static_shapes = False to allow dynamism on parameters.
W1202 14:26:40.900000 1555063 site-packages/torch/_dynamo/convert_frame.py:897] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W1202 14:26:40.900000 1555063 site-packages/torch/_dynamo/convert_frame.py:897] [0/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.
100%|█████████████| 16/16 [00:35<00:00,  2.20s/it]
(B, T, D, V);torch_embedding-latency;liger_embedding-speedup;liger_embedding-latency;inductor_embedding-speedup;inductor_embedding-latency
(32, 512, 768, 1024);0.08198399841785431;2.7226354251327316;0.030112000182271004;2.8184818097216526;0.02908799983561039
(32, 512, 768, 2048);0.08393599838018417;2.6177645608040048;0.03206399828195572;2.5172743799599404;0.033344000577926636
(32, 512, 768, 4096);0.08681599795818329;2.5027674950805396;0.03468799963593483;2.3148464405368494;0.03750399872660637
(32, 512, 768, 8192);0.09107200056314468;2.328968933685183;0.039103999733924866;2.135033736426109;0.04265600070357323
(32, 512, 768, 16384);0.09516800194978714;2.1535119323594123;0.04419200122356415;2.008102695136567;0.04739199951291084
(32, 512, 768, 32768);0.09868799895048141;2.0532622564327214;0.04806400090456009;1.9845560163724596;0.04972799867391586
(32, 512, 768, 65536);0.10198400169610977;1.9869077760572729;0.05132799968123436;1.991875083446141;0.05119999870657921
(32, 512, 768, 131072);0.10425599664449692;1.9841655315227797;0.0525440014898777;1.9938800023055079;0.0522879995405674
(8, 2048, 4096, 1024);0.40057599544525146;2.7706950010260423;0.14457599818706512;0.9976091995791384;0.4015359878540039
(8, 2048, 4096, 2048);0.434143990278244;2.7984734609092823;0.15513600409030914;1.0023641899840199;0.4331200122833252
(8, 2048, 4096, 4096);0.45849600434303284;2.7162086148487603;0.1687999963760376;1.0007683069574616;0.45814400911331177
(8, 2048, 4096, 8192);0.47046399116516113;2.4564744034128263;0.19152000546455383;0.9993881924175364;0.4707520008087158
(8, 2048, 4096, 16384);0.4845440089702606;2.2389472405695843;0.21641600131988525;1.0004624922535932;0.4843200147151947
(8, 2048, 4096, 32768);0.48924800753593445;2.0756176760366825;0.23571200668811798;0.9996077423930674;0.489439994096756
(8, 2048, 4096, 65536);0.492576003074646;1.9957215670248287;0.2468159943819046;0.9996103915740071;0.49276798963546753
(8, 2048, 4096, 131072);0.4926399886608124;1.9616461956873879;0.25113600492477417;0.9981845157581496;0.49353599548339844

% python run.py --op embedding  --mode fwd --num-inputs 1 --input-id 11 --precision fp32 --metrics latency,speedup --cudagraph --csv
100%|█████████████| 1/1 [00:03<00:00,  3.13s/it]
(B, T, D, V);torch_embedding-latency;liger_embedding-latency;liger_embedding-speedup;inductor_embedding-latency;inductor_embedding-speedup
(8, 2048, 4096, 8192);0.47254401445388794;0.18905599415302277;2.499492367702495;0.23865599930286407;1.980021519820294
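The speedup columns appear to be torch_embedding-latency divided by the provider's latency (e.g. input 0 for inductor: 0.08198 / 0.02909 ≈ 2.82). The sketch below parses both outputs by column name, since the two runs print their columns in different orders, recomputes the inductor speedup, and flags inputs below a threshold. parse, inductor_speedup, regressed_inputs, and the 1.1 threshold are hypothetical choices for illustration, not part of run.py.

```python
import csv
import io
from typing import Dict, List

# ';'-separated output of the full run (all 16 inputs), copied from the log.
FULL_RUN = """\
(B, T, D, V);torch_embedding-latency;liger_embedding-speedup;liger_embedding-latency;inductor_embedding-speedup;inductor_embedding-latency
(32, 512, 768, 1024);0.08198399841785431;2.7226354251327316;0.030112000182271004;2.8184818097216526;0.02908799983561039
(32, 512, 768, 2048);0.08393599838018417;2.6177645608040048;0.03206399828195572;2.5172743799599404;0.033344000577926636
(32, 512, 768, 4096);0.08681599795818329;2.5027674950805396;0.03468799963593483;2.3148464405368494;0.03750399872660637
(32, 512, 768, 8192);0.09107200056314468;2.328968933685183;0.039103999733924866;2.135033736426109;0.04265600070357323
(32, 512, 768, 16384);0.09516800194978714;2.1535119323594123;0.04419200122356415;2.008102695136567;0.04739199951291084
(32, 512, 768, 32768);0.09868799895048141;2.0532622564327214;0.04806400090456009;1.9845560163724596;0.04972799867391586
(32, 512, 768, 65536);0.10198400169610977;1.9869077760572729;0.05132799968123436;1.991875083446141;0.05119999870657921
(32, 512, 768, 131072);0.10425599664449692;1.9841655315227797;0.0525440014898777;1.9938800023055079;0.0522879995405674
(8, 2048, 4096, 1024);0.40057599544525146;2.7706950010260423;0.14457599818706512;0.9976091995791384;0.4015359878540039
(8, 2048, 4096, 2048);0.434143990278244;2.7984734609092823;0.15513600409030914;1.0023641899840199;0.4331200122833252
(8, 2048, 4096, 4096);0.45849600434303284;2.7162086148487603;0.1687999963760376;1.0007683069574616;0.45814400911331177
(8, 2048, 4096, 8192);0.47046399116516113;2.4564744034128263;0.19152000546455383;0.9993881924175364;0.4707520008087158
(8, 2048, 4096, 16384);0.4845440089702606;2.2389472405695843;0.21641600131988525;1.0004624922535932;0.4843200147151947
(8, 2048, 4096, 32768);0.48924800753593445;2.0756176760366825;0.23571200668811798;0.9996077423930674;0.489439994096756
(8, 2048, 4096, 65536);0.492576003074646;1.9957215670248287;0.2468159943819046;0.9996103915740071;0.49276798963546753
(8, 2048, 4096, 131072);0.4926399886608124;1.9616461956873879;0.25113600492477417;0.9981845157581496;0.49353599548339844
"""

# Output of the single run with --input-id 11; note the different column order.
SINGLE_RUN = """\
(B, T, D, V);torch_embedding-latency;liger_embedding-latency;liger_embedding-speedup;inductor_embedding-latency;inductor_embedding-speedup
(8, 2048, 4096, 8192);0.47254401445388794;0.18905599415302277;2.499492367702495;0.23865599930286407;1.980021519820294
"""


def parse(text: str) -> List[Dict[str, str]]:
    """Parse ';'-separated --csv output into one dict per row, keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def inductor_speedup(row: Dict[str, str]) -> float:
    """Recompute speedup as eager latency / inductor latency (assumed definition)."""
    return float(row["torch_embedding-latency"]) / float(row["inductor_embedding-latency"])


def regressed_inputs(rows: List[Dict[str, str]], threshold: float = 1.1) -> List[int]:
    """Row indices (input ids) whose recomputed inductor speedup is below threshold."""
    return [i for i, row in enumerate(rows) if inductor_speedup(row) < threshold]
```

On the full run, regressed_inputs(parse(FULL_RUN)) flags inputs 8-15, while the same shape as input 11 recomputes to about 1.98x when run alone, which is the gap described in this issue.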