I regularly follow the developments on this project, and I must say that I am very interested in and pleased with the direction curated-transformers is taking. The code is very understandable and of high quality; it's a pleasure to work with, congratulations!
This is perhaps already in your plans, but just to mention it here: I think a very nice addition to the project would be at least one reference implementation of an encoder-decoder-style Transformer, such as the T5 architecture. T5 models are very popular for some tasks, especially in the < 1B parameter range, which is still very relevant nowadays. Currently we have reference implementations for decoder-style and encoder-style models, but we are missing at least one reference implementation of an encoder-decoder-style architecture, perhaps with a reusable cross-attention block.
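To illustrate what I mean by a reusable cross-attention block, here is a rough sketch (purely illustrative, ignoring T5-specific details such as relative position biases, and not based on curated-transformers' actual internals or API): the decoder hidden states provide the queries, while the encoder output provides the keys and values.

```python
from typing import Optional

import torch
from torch import Tensor, nn


class CrossAttention(nn.Module):
    """Hypothetical sketch of a reusable cross-attention block.

    Names, arguments, and shapes are illustrative only.
    """

    def __init__(self, hidden_width: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_width = hidden_width // n_heads
        # Queries come from the decoder, keys/values from the encoder.
        self.query = nn.Linear(hidden_width, hidden_width)
        self.key = nn.Linear(hidden_width, hidden_width)
        self.value = nn.Linear(hidden_width, hidden_width)
        self.output = nn.Linear(hidden_width, hidden_width)

    def _split_heads(self, x: Tensor) -> Tensor:
        # (batch, seq, hidden) -> (batch, heads, seq, head_width)
        batch, seq, _ = x.shape
        return x.view(batch, seq, self.n_heads, self.head_width).transpose(1, 2)

    def forward(
        self,
        decoder_hidden: Tensor,  # (batch, tgt_len, hidden_width)
        encoder_hidden: Tensor,  # (batch, src_len, hidden_width)
        encoder_mask: Optional[Tensor] = None,  # bool, (batch, 1, 1, src_len)
    ) -> Tensor:
        q = self._split_heads(self.query(decoder_hidden))
        k = self._split_heads(self.key(encoder_hidden))
        v = self._split_heads(self.value(encoder_hidden))
        # Scaled dot-product attention of decoder queries over encoder states.
        attn = torch.nn.functional.scaled_dot_product_attention(
            q, k, v, attn_mask=encoder_mask
        )
        # (batch, heads, tgt_len, head_width) -> (batch, tgt_len, hidden_width)
        batch, _, tgt_len, _ = attn.shape
        attn = attn.transpose(1, 2).reshape(batch, tgt_len, -1)
        return self.output(attn)
```

Such a block could then be slotted into a decoder layer between the self-attention and feed-forward sublayers, which is essentially what distinguishes an encoder-decoder decoder layer from a decoder-only one.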
Good question. Support for encoder-decoder architectures is definitely planned. The reason we don't have them yet is that we first focused on encoder-only models to cover the standard spaCy pipelines, and then on decoder-only models for common LLMs, but encoder-decoder support is something we want.