Some research in NLP.
OST Collection: an AI-powered suite of text-generative models that predict the next word with remarkable accuracy. OST Collection is built on a novel approach to work as a full, intelligent NLP model.
What is LLLM Assistance?
It stands for Large Local Language Model Assistance. What does it have to offer?
Let's first look at the pros and cons of the current LLMs available from big companies like OpenAI and Google.
Pros:
- Advanced Natural Language Understanding: These LLMs have the ability to understand and generate human-like text, making them useful for a wide range of natural language processing tasks.
- Broad Applications: LLMs can be applied to various tasks such as language translation, text summarization, question answering, and more, making them versatile tools for developers and researchers.
- Continuous Improvement: Both OpenAI and Google are actively working on improving their LLMs, which means that users can benefit from ongoing updates and enhancements.
Cons:
- Ethical Concerns: Large language models have raised ethical concerns related to misinformation, bias, and potential misuse, prompting the need for responsible deployment and usage.
- Computational Resources: Training and using LLMs require significant computational resources, which can be a barrier for smaller organizations or individuals with limited access to high-performance computing.
- Environmental Impact: The energy consumption associated with training and running large language models has raised concerns about their environmental impact, particularly in terms of carbon emissions.
- Data Safety: When you use these companies' AIs, your data is not private; the providers have full visibility into your messages.
- Acting Limitations: You cannot tell the AI exactly how and when to act or talk.
But with Large Local Language Model Assistance, these things are going to be supported. Making claims without showing any proof of progress isn't cool, so just wait until 20 Nov :)
What is EasyDeL?
EasyDeL is an open-source library that makes your training faster and more optimized, with handy options for training and serving in JAX/Flax. It supports the following models:
- Llama (supports FSDP, MP, DP; gradient checkpointing)
- GPT-J (supports FSDP, MP, DP; gradient checkpointing)
- LT (supports FSDP, MP, DP; gradient checkpointing)
- MosaicMPT (supports FSDP, MP, DP; gradient checkpointing)
- GPTNeoX (supports FSDP, MP, DP; gradient checkpointing)
- Falcon (supports FSDP, MP, DP; gradient checkpointing)
- Palm (supports FSDP, MP, DP; gradient checkpointing)
- T5 (supports FSDP, MP, DP; gradient checkpointing)
- OPT (supports FSDP, MP, DP; gradient checkpointing)
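To make the FSDP/MP/DP options above more concrete, here is a minimal JAX sketch of a device mesh with those three named axes and a weight sharded across it. The axis layout and array shapes are illustrative assumptions, not EasyDeL's actual internals.

```python
# Minimal sketch (not EasyDeL's internals): a JAX device mesh with
# "dp" (data parallel), "fsdp" (fully sharded data parallel), and
# "mp" (model/tensor parallel) axes, used to shard one weight matrix.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec, NamedSharding

# Assumed layout: all devices on the "fsdp" axis, single dp/mp groups.
devices = np.array(jax.devices()).reshape(1, -1, 1)
mesh = Mesh(devices, axis_names=("dp", "fsdp", "mp"))

# Shard a dummy weight: rows over the "fsdp" axis, columns over "mp".
weight = jnp.zeros((1024, 1024))
weight = jax.device_put(weight, NamedSharding(mesh, PartitionSpec("fsdp", "mp")))
print(weight.sharding)  # shows how the array is split across devices
```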
The available models are trained with EasyDeL on Cloud TPUs.
Check out the available pretrained models in the EasyDeL-OST Collection, such as:
- Base-Falcon-7B-easydel
- Base-MPT-1B-easydel
- Base-MPT-7B-easydel
- ITDF-Falcon-easydel-v0
- ITDF-Llama-easydel-v2
- ITDF-Llama2-easydel-v0
- ITDF-OpenLlama-easydel-v0
- ITDF-OpenLlama-easydel-v1
- ITDF-OpenLlama-easydel-v2
- Llama-Chat-easydel
- Llama-easydel

and many more...
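If one of these checkpoints is published on the Hugging Face Hub in standard transformers format, trying it out could look like the sketch below. The repository id is a hypothetical placeholder; substitute the real id from the hub.

```python
# Hedged sketch: load one of the pretrained checkpoints, assuming it is hosted
# on the Hugging Face Hub in standard transformers format. The repo id below is
# a placeholder for illustration, not a guaranteed model id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "erfanzar/Base-MPT-1B-easydel"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```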
Mpt-7B-Assistant (Dragon) Colab

| Model Link | Max Sentence Length | Parameters |
|---|---|---|
| Mpt-7B-Assistant (Dragon) | 5144 | 7B |
| LGeM-13B-MT | 2048 | 13B |
| chatLGeM | 3300 | 7B |
| LGeM-7B-C | 2048 | 7B |
| GT-J-6B | 2048 | 6B |
| LGeM-3.5B | 2048 | 3.5B |
| LGeM-1B | 1024 | 1B |
| LGeM-7B | 2048 | 7B |
| PGT-1B | 1280 | 1B |
You have several options for which script to use to train the models, but we recommend train.py, since it lets you use FSDP and DeepSpeed.
DeepSpeed Example
```shell
deepspeed --no_python --master_port=4008 --num_gpus=<number_of_your_gpus_here> train.py \
  --use_deepspeed \
  --dataset <your dataset> \
  --dataset_field <field in the dataset that the tokenizer should tokenize> \
  --max_length=<your_max_length> \
  --auto_batch \
  --save_safetensors \
  --model_id='trainer' \
  --no_resume_from_checkpoint \
  --cls_to_wrap=<YourModelBlock> \
  --logging_step=10 \
  --report_to='wandb' \
  --save_total_limit=2 \
  --no_do_eval \
  --lr_scheduler_type='cosine'
```
FSDP Example
```shell
torchrun --nproc-per-node=<number_of_your_gpus_here> --master-port=4008 --standalone train.py \
  --use_fsdp \
  --dataset <your dataset> \
  --dataset_field <field in the dataset that the tokenizer should tokenize> \
  --max_length=<your_max_length> \
  --auto_batch \
  --save_safetensors \
  --model_id='trainer' \
  --no_resume_from_checkpoint \
  --cls_to_wrap=<YourModelBlock> \
  --logging_step=10 \
  --report_to='wandb' \
  --save_total_limit=2 \
  --no_do_eval \
  --lr_scheduler_type='cosine'
```
Upcoming soon:
- LLM
- Uses ALiBi as positional embeddings, which significantly outperforms other embeddings for zero-shot generalization (see the sketch after this list).
- Flash attention
- 1B, 3B, 7B, 12B, 50B
- Context length of 9K
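As a quick illustration of what ALiBi does, the sketch below builds the head-specific linear biases from Press et al. ("Train Short, Test Long"); the head count and sequence length are arbitrary example values, and this is not this project's own implementation.

```python
# Minimal ALiBi sketch (PyTorch): instead of positional embeddings, each head
# adds a linearly decaying bias -m * distance to its attention scores.
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: a geometric sequence 2^(-8/num_heads), 2^(-16/num_heads), ...
    start = 2.0 ** (-8.0 / num_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(num_heads)])
    # Relative distance between query position i and key position j (non-positive for j <= i).
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)
    # Shape (num_heads, seq_len, seq_len); add this to the attention logits.
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(num_heads=8, seq_len=16)
print(bias.shape)  # torch.Size([8, 16, 16])
# usage: scores = (q @ k.transpose(-2, -1)) / head_dim**0.5 + bias
```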
- What is LGeM? LGeM is a causal LM that was trained on self-instruct data (Alpaca data); to initialize the first training run of the main model (weights are available) I used pretrained weights from Alpaca LoRA (open source).
- It's decoder-only.
- Built in PyTorch.
- You can simply import the model like:

```python
from modules import LGeMForCausalLM
```

- Training code is available at LGeM-Train.py (check the source).
- Training parameters:
  - Learning rate: 1e-4
  - AdamW (weight decay 1e-2)
  - Batch size: 2
  - 4 x A100 80GB used for training

```shell
python3 LGeM-train.py
```
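For reference, the training parameters above translate to an optimizer setup roughly like the following sketch; the stand-in model and random data are placeholders so the snippet runs on its own, and the real training loop lives in LGeM-Train.py.

```python
# Rough sketch of the stated setup (AdamW, lr 1e-4, weight decay 1e-2, batch 2).
# A tiny stand-in module replaces LGeMForCausalLM so the snippet is self-contained.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(128, 128)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

data = TensorDataset(torch.randn(8, 128), torch.randn(8, 128))  # dummy data
loader = DataLoader(data, batch_size=2)  # batch size 2, as listed above

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
```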
Available on Hugging Face
- The first model is LLama (LLama is the same model as Meta's (formerly Facebook) model, but with some modifications).
- It's decoder-only.
- Built in PyTorch.
- You can simply import the model like:

```python
from modules import LLamaModel
```

- Training code is available at LLama-Train.py (check the source).

```shell
python3 LLama-train.py
```
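Since the model is decoder-only, text generation boils down to repeatedly predicting the next token and appending it. The sketch below shows that greedy loop with a tiny stand-in network; with the real model you would use modules.LLamaModel instead (its exact forward signature is not shown here and may differ).

```python
# Illustrative greedy decoding loop for a decoder-only LM, using a tiny stand-in
# network so the snippet runs on its own.
import torch
from torch import nn

class TinyLM(nn.Module):
    """Stand-in decoder-only LM: embeds tokens and predicts next-token logits."""
    def __init__(self, vocab_size: int = 256, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(ids))  # (batch, seq, vocab) logits

model = TinyLM()
ids = torch.tensor([[1, 2, 3]])  # prompt token ids
for _ in range(8):  # greedy decoding: always take the most likely next token
    logits = model(ids)[:, -1, :]
    next_id = logits.argmax(dim=-1, keepdim=True)
    ids = torch.cat([ids, next_id], dim=-1)
print(ids)
```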
- LLMoU is an NLP model that is fast and good enough to play around with.
- It's decoder-only.
- Configs range from LLMoU-S to LLMoU-LLX.
- Built in PyTorch.
- You can simply import the model like:

```python
from modules import LLMoUModel
```

- Training code is available at LLMoU-Train.py (check the source).

```shell
python3 LLMoU-train.py
```
- LLmP is one of the best current models in this project; it uses ALiBi, and it's arguably the best model in the series.
- It's decoder-only.
- Configs range from LLmP-S to LLmP-LLX.
- Built in PyTorch.
- You can simply import the model like:

```python
from modules import LLmP
```

- Training code is available at LLmP-Train.py (check the source).

```shell
python3 LLmP-train.py
```
- LLmPU is an encoder-decoder (Transformer) model, and it works perfectly fine.
- It's encoder-decoder.
- Configs range from LLmPU-S to LLmPU-LLX.
- Built in PyTorch, using transformers from Hugging Face.
- You can simply import the model as shown below.
- Weights are available for PyTorch.

```python
# for simple training
from modules import LLmPUModel
# for use and generation [interface]
from modules import LLmPUForConditionalGeneration
```

- Training code is available at LLmPU-Train.py (check the source).

```shell
python3 LLmPU-train.py
```
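Assuming LLmPUForConditionalGeneration follows the Hugging Face conditional-generation interface mentioned above, usage would look roughly like this sketch; the tokenizer name and weights path are hypothetical placeholders.

```python
# Hypothetical usage sketch for LLmPUForConditionalGeneration, assuming an
# HF-style from_pretrained()/generate() interface. The tokenizer id and the
# weights path below are placeholders, not published artifacts.
import torch
from transformers import AutoTokenizer
from modules import LLmPUForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")  # placeholder tokenizer
model = LLmPUForConditionalGeneration.from_pretrained("path/to/llmpu-weights")

inputs = tokenizer("summarize: LLmPU is an encoder-decoder model.", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```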
- PGT (Poetry Generated Transformers [funny name :)]) is actually a nice model that performs very well on multitask commands; I recommend training it on specific tasks. The weights (3.9B) will be available soon.
- It's decoder-only.
- Configs range from PGT-S to PGT-LLX.
- Built in PyTorch.
- You can simply import the model like:

```python
from modules import PGT
```

- Training code is available at PGT-Train.py (check the source).

```shell
python3 PGT-train.py
```
| Model | Hidden Size | Number of Layers | Number of Heads | Max Sentence Length | Parameters |
|---|---|---|---|---|---|
| PGT-S | 768 | 10 | 12 | 256 | 148.62 M |
| PGT-M | 1024 | 18 | 12 | 512 | > 15 B |
| PGT-X | 1536 | 28 | 16 | 512 | 947.30 M |
| PGT-LX | 2048 | 34 | 32 | 768 | 1,917.49 B |
| PGT-LXX | 4096 | 64 | 32 | 2000 | 13,297.54 B |
| LLama | 4096 | 18 | 16 | 256 | 5,243.83 B |
| LLmP-S | 768 | 10 | 8 | ALiBi | 148.82 M |
| LLmP-ML | 1024 | 18 | 16 | ALiBi | > 15 B |
| LLmP | 1536 | 24 | 16 | ALiBi | 834.00 M |
| LLmP-X | 1792 | 36 | 16 | ALiBi | 1,567.58 B |
| LLmP-L | 2048 | 32 | 32 | ALiBi | 1,816.68 B |
| LLmP-LX | 4096 | 48 | 32 | ALiBi | > 15 B |
| LLMoU-S | 768 | 10 | 8 | 512 | 148.14 M |
| LLMoU-ML | 1024 | 18 | 16 | 512 | 329.71 M |
| LLMoU | 1536 | 26 | 16 | 256 | 891.03 M |
| LLMoU-X | 2048 | 34 | 32 | 256 | 1,918.02 B |
| LLMoU-L | 2048 | 48 | 32 | 1024 | 2,622.98 B |
| LLMoU-LX | 2048 | 52 | 32 | 2048 | > 15 B |
| LLmPU-base | 1792 | 8 | 12 | 512 | 598.64 M |
| LLmPU-S | 1024 | 6 | 12 | 256 | 225.68 M |
| LLmPU-L | 1792 | 10 | 12 | 768 | 758.30 M |
| LLmPU-LX | 2048 | 14 | 12 | 768 | 1,791.52 B |
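As a rough sanity check on how hidden size and depth relate to model size, the function below applies the common ~12 · layers · hidden² rule of thumb plus an assumed embedding table; the real counts in the table also depend on vocabulary size, FFN width, and weight tying, so they will not match exactly.

```python
# Rule-of-thumb parameter estimate for a decoder-only transformer:
# ~12 * layers * hidden^2 for the blocks, plus vocab * hidden for embeddings.
# The vocab size is an assumed example value, so results only approximate the table.
def approx_params_millions(hidden: int, layers: int, vocab: int = 50_000) -> float:
    block_params = 12 * layers * hidden ** 2
    embedding_params = vocab * hidden
    return (block_params + embedding_params) / 1e6

print(f"hidden=768,  layers=10 -> ~{approx_params_millions(768, 10):.0f} M params")
print(f"hidden=1536, layers=26 -> ~{approx_params_millions(1536, 26):.0f} M params")
```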
Hi there!
I like to train deep neural nets on large datasets. Among other things in this world :)
Contributions are always welcome!
Email: [email protected]
This project is used by the following companies:
- You can be the first one here :)

Hello, I am @erfanzar.
ALiBi: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
RoFormer: Enhanced Transformer with Rotary Position Embedding