[Bug fix] Register Fusion Pass fuse policy assigns wrong output edges #514
1. Add support for int16_t load (bloom fp16 model).
2. Fix the Register Fusion Pass (Welder): for a fused node with multiple outputs, the current code assigns output edges incorrectly, which causes errors in some cases.
3. Rewrite the CUDA_ARCH string in the CUDA codegen CMakeLists.txt in a friendlier way.
With the current approach, if we want to use a feature that requires sm_86, we have to comment out the lower CUDA arch gencode flags; otherwise compilation fails.
With the new CUDA_ARCH set approach, this is no longer a concern.
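To illustrate the Register Fusion Pass output-edge fix, here is a minimal sketch of remapping output edges after several nodes are fused into one node with multiple outputs. The `Edge` type and the `remap_output_edges` function are hypothetical and do not reflect NNFusion's or Welder's actual graph API; the sketch only shows one way to keep the assignment correct, by matching each outgoing edge to a fused output through its original (source node, output index) pair.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical edge: output `src_output` of node `src` feeds node `dst`.
struct Edge {
    int src;
    std::size_t src_output;
    int dst;
};

// Identifies one output of an original (pre-fusion) node.
using OutputKey = std::pair<int, std::size_t>;

// `fused_outputs[k]` is the original (node, output) exposed as output k of the
// fused node `fused_id`. Edges whose source matches an exposed output are
// redirected to the fused node; all other edges are returned unchanged.
std::vector<Edge> remap_output_edges(const std::vector<Edge>& edges,
                                     const std::vector<OutputKey>& fused_outputs,
                                     int fused_id) {
    // Build the lookup from original output to fused output index.
    std::map<OutputKey, std::size_t> index_of;
    for (std::size_t k = 0; k < fused_outputs.size(); ++k) {
        index_of[fused_outputs[k]] = k;
    }
    assert(index_of.size() == fused_outputs.size() && "fused outputs must be distinct");

    std::vector<Edge> result;
    result.reserve(edges.size());
    for (const Edge& e : edges) {
        auto it = index_of.find({e.src, e.src_output});
        if (it != index_of.end()) {
            // Keep the consumer, but read from the matching fused output.
            result.push_back({fused_id, it->second, e.dst});
        } else {
            result.push_back(e);
        }
    }
    return result;
}
```

Keying on the original (node, output) pair keeps every consumer attached to the right fused output no matter how many consumers an output has or in which order the edges are listed.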
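For the CUDA_ARCH change: nvcc generates device code once per gencode target, so code that needs an sm_86-only feature fails to build as soon as a lower target is also listed, which is why the old string had to be edited by hand. Below is a minimal sketch, assuming the new CMake logic expands a chosen set of architectures into gencode flags. The helper `build_gencode_flags` is hypothetical and written in C++ only for illustration; it is not the PR's CMake code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: expands a set of compute capabilities (major*10 + minor,
// e.g. 86 for sm_86) into nvcc "-gencode arch=compute_XX,code=sm_XX" flags.
// std::set keeps the entries unique and in ascending order.
std::string build_gencode_flags(const std::set<int>& archs) {
    std::string flags;
    for (int arch : archs) {
        assert(arch >= 10 && "arch must be major*10 + minor");
        const std::string a = std::to_string(arch);
        if (!flags.empty()) {
            flags += ' ';
        }
        flags += "-gencode arch=compute_" + a + ",code=sm_" + a;
    }
    return flags;
}
```

With the set {86}, only the sm_86 target is emitted, so no lower-arch flag has to be commented out; adding 70 to the set brings the sm_70 target back when a broader build is wanted.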
Should be:
grid[2].get_to(m_gridDim.z);
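The suggested line reads element 2 of `grid` into `m_gridDim.z`; `get_to` appears to be nlohmann::json's `basic_json::get_to`. As a library-free sketch (the `Dim3` struct and `dim3_from_array` function are hypothetical, not NNFusion code), the index-to-component mapping the review asks for can be driven from a single table so that z always comes from index 2:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3, so the sketch builds without the CUDA toolkit.
struct Dim3 {
    unsigned x = 1;
    unsigned y = 1;
    unsigned z = 1;
};

// Fills a Dim3 from a [x, y, z] array. One index-to-field table fixes
// 0 -> x, 1 -> y, 2 -> z, so a component cannot silently read another's slot.
Dim3 dim3_from_array(const std::vector<unsigned>& grid) {
    assert(grid.size() == 3 && "expected [x, y, z]");
    static constexpr unsigned Dim3::*kFields[3] = {&Dim3::x, &Dim3::y, &Dim3::z};
    Dim3 d;
    for (std::size_t i = 0; i < 3; ++i) {
        d.*kFields[i] = grid[i];
    }
    return d;
}
```

Writing the three assignments out by hand, as in the reviewed code, works just as well; the table only removes the chance of a copy-paste index mismatch.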