Triton flash attention error #57
Comments
I have also encountered this.
Have you found any solutions? |
No. |
Can you try |
I had the same issue and it worked for me. Can you please specify the versions of all the packages you used? (The requirements file only specifies the version of transformers.) |
(dna) atrix@Atrix:/mnt/c/Users/adity/OneDrive/Desktop/dnabert2/DNABERT_2/finetune$ sh scripts/run_dnabert2_prom.sh /mnt/c/Users/adity/OneDrive/Desktop/dnabert2/data/balanced_data_prom_vaish/
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.bias', 'classifier.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
Num examples = 15,077
Num Epochs = 4
Instantaneous batch size per device = 32
Total train batch size (w. parallel, distributed & accumulation) = 64
Gradient Accumulation steps = 2
Total optimization steps = 944
Number of trainable parameters = 117,070,082
0%| | 0/944 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/tmp/tmpk4bib9l3/main.c:2:10: fatal error: cuda.h: No such file or directory
2 | #include "cuda.h"
| ^~~~~~~~
compilation terminated.
Traceback (most recent call last):
File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0--7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-42648570729a4835b21c1c18cebedbfe-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, torch.float32, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (False, False), (False, False)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 303, in <module>
train()
File "train.py", line 285, in train
trainer.train()
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/bert_layers.py", line 862, in forward
outputs = self.bert(
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/bert_layers.py", line 608, in forward
encoder_outputs = self.encoder(
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/bert_layers.py", line 446, in forward
hidden_states = layer_module(hidden_states,
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/bert_layers.py", line 327, in forward
attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/bert_layers.py", line 240, in forward
self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/bert_layers.py", line 185, in forward
attention = flash_attn_qkvpacked_func(qkv, bias)
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/flash_attn_triton.py", line 1021, in forward
o, lse, ctx.softmax_scale = _flash_attn_forward(
File "/home/atrix/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/1d020b803b871a976f5f3d5565f0eac8f2c7bb81/flash_attn_triton.py", line 826, in _flash_attn_forward
_fwd_kernel[grid]( # type: ignore
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 86, in run
return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "<string>", line 41, in _fwd_kernel
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1239, in compile
so = _build(fn.__name__, src_path, tmpdir)
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1169, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/atrix/miniconda3/envs/dna/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpk4bib9l3/main.c', '-O3', '-I/usr/local/cuda/include', '-I/home/atrix/miniconda3/envs/dna/include/python3.8', '-I/tmp/tmpk4bib9l3', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpk4bib9l3/_fwd_kernel.cpython-38-x86_64-linux-gnu.so', '-L/usr/lib/wsl/lib']' returned non-zero exit status 1.
0%| | 0/944 [00:01<?, ?it/s] This error showed up after using the above-mentioned Triton version. |
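For anyone hitting the same wall: the root failure in this log is gcc not finding `cuda.h` while Triton JIT-compiles its kernel launcher, which usually means the CUDA toolkit headers are missing (on WSL the driver alone does not ship them). Below is a minimal sanity-check sketch, assuming the paths from the failing gcc command above; the CPATH path is only an example, not a verified fix for this setup.

```bash
# Check whether the header exists where Triton's gcc invocation looks
# (-I/usr/local/cuda/include in the command that failed above).
ls -l /usr/local/cuda/include/cuda.h

# Check whether a full CUDA toolkit (which ships cuda.h) is installed at all.
which nvcc && nvcc --version

# If cuda.h lives elsewhere, gcc also honours CPATH; the path below is only
# an example -- point it at whichever directory actually contains cuda.h.
export CPATH=/usr/local/cuda/targets/x86_64-linux/include:$CPATH

# Unrelated to the crash: the repeated tokenizers warning in the log can be
# silenced exactly as the warning itself suggests.
export TOKENIZERS_PARALLELISM=false
```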
(dna) atrix@Atrix:/mnt/c/Users/adity/OneDrive/Desktop/dnabert2/DNABERT_2/finetune$ conda list
# packages in environment at /home/atrix/miniconda3/envs/dna:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
accelerate 0.24.1 pypi_0 pypi
aiohttp 3.9.0 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
anyio 3.5.0 py38h06a4308_0
argon2-cffi 21.3.0 pyhd3eb1b0_0
argon2-cffi-bindings 21.2.0 py38h7f8727e_0
asttokens 2.0.5 pyhd3eb1b0_0
async-lru 2.0.4 py38h06a4308_0
async-timeout 4.0.3 pypi_0 pypi
attrs 23.1.0 py38h06a4308_0
babel 2.11.0 py38h06a4308_0
backcall 0.2.0 pyhd3eb1b0_0
beautifulsoup4 4.12.2 py38h06a4308_0
bertviz 1.4.0 pypi_0 pypi
biopython 1.78 py38h7f8727e_0
blas 1.0 mkl
bleach 4.1.0 pyhd3eb1b0_0
boto3 1.33.7 pypi_0 pypi
botocore 1.33.7 pypi_0 pypi
brotli-python 1.0.9 py38h6a678d5_7
ca-certificates 2023.08.22 h06a4308_0
certifi 2023.11.17 py38h06a4308_0
cffi 1.16.0 py38h5eee18b_0
chardet 4.0.0 py38h06a4308_1003
charset-normalizer 3.3.2 pypi_0 pypi
cmake 3.27.9 pypi_0 pypi
comm 0.1.2 py38h06a4308_0
cryptography 41.0.3 py38hdda0065_0
cyrus-sasl 2.1.28 h52b45da_1
datasets 2.15.0 pypi_0 pypi
dbus 1.13.18 hb2f20db_0
debugpy 1.6.7 py38h6a678d5_0
decorator 5.1.1 pyhd3eb1b0_0
defusedxml 0.7.1 pyhd3eb1b0_0
dill 0.3.7 pypi_0 pypi
einops 0.7.0 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
executing 0.8.3 pyhd3eb1b0_0
expat 2.5.0 h6a678d5_0
filelock 3.13.1 pypi_0 pypi
fontconfig 2.14.1 h4c34cd2_2
freetype 2.12.1 h4a9f257_0
frozenlist 1.4.0 pypi_0 pypi
fsspec 2023.10.0 pypi_0 pypi
glib 2.69.1 he621ea3_2
gst-plugins-base 1.14.1 h6a678d5_1
gstreamer 1.14.1 h5eee18b_1
huggingface-hub 0.19.4 pypi_0 pypi
icu 73.1 h6a678d5_0
idna 3.4 py38h06a4308_0
importlib-metadata 6.0.0 py38h06a4308_0
importlib_metadata 6.0.0 hd3eb1b0_0
importlib_resources 6.1.0 py38h06a4308_0
intel-openmp 2023.1.0 hdb19cb5_46306
ipykernel 6.25.0 py38h2f386ee_0
ipython 8.12.2 py38h06a4308_0
ipywidgets 8.0.4 py38h06a4308_0
jedi 0.18.1 py38h06a4308_1
jinja2 3.1.2 py38h06a4308_0
jmespath 1.0.1 pypi_0 pypi
joblib 1.2.0 py38h06a4308_0
jpeg 9e h5eee18b_1
json5 0.9.6 pyhd3eb1b0_0
jsonschema 4.19.2 py38h06a4308_0
jsonschema-specifications 2023.7.1 py38h06a4308_0
jupyter 1.0.0 py38h06a4308_8
jupyter-lsp 2.2.0 py38h06a4308_0
jupyter_client 8.6.0 py38h06a4308_0
jupyter_console 6.6.3 py38h06a4308_0
jupyter_core 5.5.0 py38h06a4308_0
jupyter_events 0.8.0 py38h06a4308_0
jupyter_server 2.10.0 py38h06a4308_0
jupyter_server_terminals 0.4.4 py38h06a4308_1
jupyterlab 4.0.8 py38h06a4308_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 2.25.1 py38h06a4308_0
jupyterlab_widgets 3.0.9 py38h06a4308_0
krb5 1.20.1 h143b758_1
ld_impl_linux-64 2.38 h1181459_1
libclang 14.0.6 default_hc6dbbc7_1
libclang13 14.0.6 default_he11475f_1
libcups 2.4.2 h2d74bed_1
libedit 3.1.20221030 h5eee18b_0
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libllvm14 14.0.6 hdb19cb5_3
libpng 1.6.39 h5eee18b_0
libpq 12.15 hdbd6064_1
libsodium 1.0.18 h7b6447c_0
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
libxcb 1.15 h7f8727e_0
libxkbcommon 1.0.1 h5eee18b_1
libxml2 2.10.4 hf1b16e4_1
lz4-c 1.9.4 h6a678d5_0
markupsafe 2.1.3 pypi_0 pypi
matplotlib-inline 0.1.6 py38h06a4308_0
mistune 2.0.4 py38h06a4308_0
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py38h5eee18b_1
mkl_fft 1.3.8 py38h5eee18b_0
mkl_random 1.2.4 py38hdb19cb5_0
mpmath 1.3.0 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
multiprocess 0.70.15 pypi_0 pypi
mysql 5.7.24 h721c034_2
nbclient 0.8.0 py38h06a4308_0
nbconvert 7.10.0 py38h06a4308_0
nbformat 5.9.2 py38h06a4308_0
ncurses 6.4 h6a678d5_0
nest-asyncio 1.5.6 py38h06a4308_0
networkx 3.1 pypi_0 pypi
notebook 7.0.6 py38h06a4308_0
notebook-shim 0.2.3 py38h06a4308_0
numpy 1.24.4 pypi_0 pypi
numpy-base 1.24.3 py38h060ed82_1
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.18.1 pypi_0 pypi
nvidia-nvjitlink-cu12 12.3.101 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
omegaconf 2.3.0 pypi_0 pypi
openssl 3.0.12 h7f8727e_0
overrides 7.4.0 py38h06a4308_0
packaging 23.2 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pandocfilters 1.5.0 pyhd3eb1b0_0
parso 0.8.3 pyhd3eb1b0_0
pcre 8.45 h295c915_0
peft 0.6.2 pypi_0 pypi
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pip 23.3 py38h06a4308_0
pkgutil-resolve-name 1.3.10 py38h06a4308_0
platformdirs 3.10.0 py38h06a4308_0
ply 3.11 py38_0
pooch 1.7.0 py38h06a4308_0
prometheus_client 0.14.1 py38h06a4308_0
prompt-toolkit 3.0.36 py38h06a4308_0
prompt_toolkit 3.0.36 hd3eb1b0_0
psutil 5.9.6 pypi_0 pypi
ptyprocess 0.7.0 pyhd3eb1b0_2
pure_eval 0.2.2 pyhd3eb1b0_0
pyarrow 14.0.1 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pygments 2.15.1 py38h06a4308_1
pyopenssl 23.2.0 py38h06a4308_0
pyqt 5.15.10 py38h6a678d5_0
pyqt5-sip 12.13.0 py38h5eee18b_0
pysocks 1.7.1 py38h06a4308_0
python 3.8.18 h955ad1f_0
python-dateutil 2.8.2 pyhd3eb1b0_0
python-fastjsonschema 2.16.2 py38h06a4308_0
python-json-logger 2.0.7 py38h06a4308_0
pytz 2023.3.post1 py38h06a4308_0
pyyaml 6.0.1 py38h5eee18b_0
pyzmq 25.1.0 py38h6a678d5_0
qt-main 5.15.2 h53bd1ea_10
qtconsole 5.5.0 py38h06a4308_0
qtpy 2.4.1 py38h06a4308_0
readline 8.2 h5eee18b_0
referencing 0.30.2 py38h06a4308_0
regex 2023.10.3 pypi_0 pypi
requests 2.31.0 py38h06a4308_0
responses 0.18.0 pypi_0 pypi
rfc3339-validator 0.1.4 py38h06a4308_0
rfc3986-validator 0.1.1 py38h06a4308_0
rpds-py 0.10.6 py38hb02cf49_0
s3transfer 0.8.2 pypi_0 pypi
safetensors 0.4.0 pypi_0 pypi
scikit-learn 1.3.0 py38h1128e8f_0
scipy 1.10.1 py38hf6e8229_1
send2trash 1.8.2 py38h06a4308_0
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.0.0 py38h06a4308_0
sip 6.7.12 py38h6a678d5_0
six 1.16.0 pyhd3eb1b0_1
sniffio 1.2.0 py38h06a4308_1
soupsieve 2.5 py38h06a4308_0
sqlite 3.41.2 h5eee18b_0
stack_data 0.2.0 pyhd3eb1b0_0
sympy 1.12 pypi_0 pypi
tbb 2021.8.0 hdb19cb5_0
terminado 0.17.1 py38h06a4308_0
threadpoolctl 2.2.0 pyh0d69192_0
tinycss2 1.2.1 py38h06a4308_0
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.3 pypi_0 pypi
tomli 2.0.1 py38h06a4308_0
torch 1.13.1 pypi_0 pypi
tornado 6.3.3 py38h5eee18b_0
tqdm 4.66.1 pypi_0 pypi
traitlets 5.7.1 py38h06a4308_0
transformers 4.29.2 pypi_0 pypi
triton 2.0.0.dev20221103 pypi_0 pypi
typing-extensions 4.8.0 pypi_0 pypi
typing_extensions 4.7.1 py38h06a4308_0
tzdata 2023.3 pypi_0 pypi
urllib3 2.1.0 pypi_0 pypi
wcwidth 0.2.5 pyhd3eb1b0_0
webencodings 0.5.1 py38_1
websocket-client 0.58.0 py38h06a4308_4
wheel 0.41.2 py38h06a4308_0
widgetsnbextension 4.0.5 py38h06a4308_0
xxhash 3.4.1 pypi_0 pypi
xz 5.4.2 h5eee18b_0
yaml 0.2.5 h7b6447c_0
yarl 1.9.3 pypi_0 pypi
zeromq 4.3.4 h2531618_0
zipp 3.11.0 py38h06a4308_0
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0
(dna) atrix@Atrix:/mnt/c/Users/adity/OneDrive/Desktop/dnabert2/DNABERT_2/finetune$ pip list
Package Version
------------------------- -----------------
accelerate 0.24.1
aiohttp 3.9.0
aiosignal 1.3.1
antlr4-python3-runtime 4.9.3
anyio 3.5.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.1.0
Babel 2.11.0
backcall 0.2.0
beautifulsoup4 4.12.2
bertviz 1.4.0
biopython 1.78
bleach 4.1.0
boto3 1.33.7
botocore 1.33.7
Brotli 1.0.9
certifi 2023.11.17
cffi 1.16.0
chardet 4.0.0
charset-normalizer 2.0.4
cmake 3.27.9
comm 0.1.2
cryptography 41.0.3
datasets 2.15.0
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
einops 0.7.0
evaluate 0.4.1
executing 0.8.3
fastjsonschema 2.16.2
filelock 3.13.1
frozenlist 1.4.0
fsspec 2023.10.0
huggingface-hub 0.19.4
idna 3.4
importlib-metadata 6.0.0
importlib-resources 6.1.0
ipykernel 6.25.0
ipython 8.12.2
ipywidgets 8.0.4
jedi 0.18.1
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.2.0
json5 0.9.6
jsonschema 4.19.2
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.6.0
jupyter-console 6.6.3
jupyter_core 5.5.0
jupyter-events 0.8.0
jupyter-lsp 2.2.0
jupyter_server 2.10.0
jupyter_server_terminals 0.4.4
jupyterlab 4.0.8
jupyterlab-pygments 0.1.2
jupyterlab_server 2.25.1
jupyterlab-widgets 3.0.9
MarkupSafe 2.1.1
matplotlib-inline 0.1.6
mistune 2.0.4
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
nbclient 0.8.0
nbconvert 7.10.0
nbformat 5.9.2
nest-asyncio 1.5.6
networkx 3.1
notebook 7.0.6
notebook_shim 0.2.3
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu11 8.5.0.96
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
omegaconf 2.3.0
overrides 7.4.0
packaging 23.1
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
peft 0.6.2
pexpect 4.8.0
pickleshare 0.7.5
pip 23.3
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
ply 3.11
pooch 1.7.0
prometheus-client 0.14.1
prompt-toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 14.0.1
pyarrow-hotfix 0.6
pycparser 2.21
Pygments 2.15.1
pyOpenSSL 23.2.0
PyQt5 5.15.10
PyQt5-sip 12.13.0
PySocks 1.7.1
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.0
qtconsole 5.5.0
QtPy 2.4.1
referencing 0.30.2
regex 2023.10.3
requests 2.31.0
responses 0.18.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rpds-py 0.10.6
s3transfer 0.8.2
safetensors 0.4.0
scikit-learn 1.3.0
scipy 1.10.1
Send2Trash 1.8.2
sentencepiece 0.1.99
setuptools 68.0.0
sip 6.7.12
six 1.16.0
sniffio 1.2.0
soupsieve 2.5
stack-data 0.2.0
sympy 1.12
terminado 0.17.1
threadpoolctl 2.2.0
tinycss2 1.2.1
tokenizers 0.13.3
tomli 2.0.1
torch 1.13.1
tornado 6.3.3
tqdm 4.66.1
traitlets 5.7.1
transformers 4.29.2
triton 2.0.0.dev20221103
typing_extensions 4.7.1
tzdata 2023.3
urllib3 1.26.18
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
wheel 0.41.2
widgetsnbextension 4.0.5
xxhash 3.4.1
yarl 1.9.3
zipp 3.11.0 |
I have never seen this error before, but it seems to result from the GCC version. Maybe you need to checkout the |
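In case it helps narrow this down, here is a quick way to inspect the compiler Triton's JIT build uses (the failing command above calls /usr/bin/gcc directly). The version below is only an example of switching the system default with update-alternatives on Ubuntu/WSL, not a confirmed fix:

```bash
# Show the gcc that Triton's build step will invoke (/usr/bin/gcc in the log).
gcc --version

# Example only: install another gcc and make it the default via the standard
# Debian/Ubuntu update-alternatives mechanism; gcc-11 is an arbitrary choice.
sudo apt-get install gcc-11 g++-11
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100
```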
I am stuck here too. I tried to |
I followed the instructions given in the README file, except for building Triton from source; Triton gets installed as a dependency of the requirements file. I then ran `pip uninstall triton`, and after that it started working.
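For clarity, that workaround amounts to the following shell sketch. The fallback behaviour is an assumption based on this thread: with triton absent, the DNABERT-2 modelling code presumably skips the Triton flash-attention path, so the JIT compilation that needs cuda.h never runs.

```bash
# Remove the triton wheel that requirements.txt pulled in as a dependency,
# then re-run the fine-tuning script as before (data path is a placeholder).
pip uninstall -y triton
sh scripts/run_dnabert2_prom.sh /path/to/your/data
```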
I'll check the |
OK, so I changed my fine-tuning process to a Kubernetes pod where I have access to A100 GPUs, and doing the |
I also checked the Triton repository and its issues, but nothing worked for me. Let me know if you find any working solutions. |
What GCC version did it work on for you? |