
OST

Some research in NLP.

OST Collection: an AI-powered suite of text-generative models that predict the next word with remarkable accuracy. OST Collection is based on a novel approach to working as a full, intelligent NLP model.

LLLM-Assistance

What is LLLM Assistance?

It stands for Large Local Language Model Assistance. So what does it do?

Let's first look at the pros and cons of the current LLMs available from big companies like OpenAI and Google.

Pros:

  1. Advanced Natural Language Understanding: These LLMs have the ability to understand and generate human-like text, making them useful for a wide range of natural language processing tasks.

  2. Broad Applications: LLMs can be applied to various tasks such as language translation, text summarization, question answering, and more, making them versatile tools for developers and researchers.

  3. Continuous Improvement: Both OpenAI and Google are actively working on improving their LLMs, which means that users can benefit from ongoing updates and enhancements.

Cons:

  1. Ethical Concerns: Large language models have raised ethical concerns related to misinformation, bias, and potential misuse, prompting the need for responsible deployment and usage.

  2. Computational Resources: Training and using LLMs require significant computational resources, which can be a barrier for smaller organizations or individuals with limited access to high-performance computing.

  3. Environmental Impact: The energy consumption associated with training and running large language models has raised concerns about their environmental impact, particularly in terms of carbon emissions.

  4. Data Safety: when you use these companies' AIs, your data is not private; the provider has full visibility into your messages.

  5. Acting Limitations: you cannot tell the AI exactly how and when to act or talk.

With Large Local Language Model Assistance these points will be addressed. I don't think making claims without showing proof of progress is worthwhile, so just wait until 20 Nov :)

EasyDel

What is EasyDeL?

EasyDeL is an open-source library that makes your training faster and more optimized, with useful options for training and serving in JAX/Flax. It supports the following models:

  • Llama (supports FSDP, MP, DP and gradient checkpointing)
  • GPT-J (supports FSDP, MP, DP and gradient checkpointing)
  • LT (supports FSDP, MP, DP and gradient checkpointing)
  • MosaicMPT (supports FSDP, MP, DP and gradient checkpointing)
  • GPTNeoX (supports FSDP, MP, DP and gradient checkpointing)
  • Falcon (supports FSDP, MP, DP and gradient checkpointing)
  • Palm (supports FSDP, MP, DP and gradient checkpointing)
  • T5 (supports FSDP, MP, DP and gradient checkpointing)
  • OPT (supports FSDP, MP, DP and gradient checkpointing)

The available models were trained with EasyDeL on cloud TPUs.

Check the available pretrained models in the EasyDeL-OST collection, such as the ones listed below (a loading sketch follows the list):

  1. Base-Falcon-7B-easydel

  2. Base-MPT-1B-easydel

  3. Base-MPT-7B-easydel

  4. ITDF-Falcon-easydel-v0

  5. ITDF-Llama-easydel-v2

  6. ITDF-Llama2-easydel-v0

  7. ITDF-OpenLlama-easydel-v0

  8. ITDF-OpenLlama-easydel-v1

  9. ITDF-OpenLlama-easydel-v2

  10. Llama-Chat-easydel

  11. Llama-easydel

and Many More...
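To quickly try one of these checkpoints, a minimal loading sketch with Hugging Face transformers is shown below. It assumes the checkpoint is published on the Hugging Face Hub in a transformers-compatible format; the repo ID is a placeholder, not a confirmed path.

# Minimal sketch: load one of the pretrained checkpoints with transformers.
# The repo ID below is a placeholder; replace it with the actual Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "erfanzar/ITDF-Llama2-easydel-v0"  # placeholder Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))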

Available Trained Models

| EasyUse Model | Link |
| --- | --- |
| Mpt-7B-Assistant(Dragon) | Colab 🚀 |
| chatLGeM | Colab 🚀 |
| LGeM-7B-C | Colab 🚀 |

| Model | Link | Max Sentence Length | Parameters |
| --- | --- | --- | --- |
| Mpt-7B-Assistant(Dragon) | 🚀 | 5144 | 7B |
| LGeM-13B-MT | 🚀 | 2048 | 13B |
| chatLGeM | 🚀 | 3300 | 7B |
| LGeM-7B-C | 🚀 | 2048 | 7B |
| GT-J-6B | 🚀 | 2048 | 6B |
| LGeM-3.5B | 🚀 | 2048 | 3.5B |
| LGeM-1B | 🚀 | 1024 | 1B |
| LGeM-7B | 🚀 | 2048 | 7B |
| PGT-1B | 🚀 | 1280 | 1B |

Train or Finetune

You have several options for training the models, but we recommend using train.py, which lets you use FSDP and DeepSpeed.

DeepSpeed Example

deepspeed --master_port=4008 --num_gpus=<number_of_your_gpus_here> train.py \
--use_deepspeed \
--dataset <your dataset> \
--dataset_field <field in dataset to tokenize> \
--max_length=<your_max_length> \
--auto_batch \
--save_safetensors \
--model_id='trainer' \
--no_resume_from_checkpoint \
--cls_to_wrap=<YourModelBlock> \
--logging_step=10 \
--report_to='wandb' \
--save_total_limit=2 \
--no_do_eval \
--lr_scheduler_type='cosine'

FSDP Example

torchrun --nproc-per-node=<number_of_your_gpus_here> --master-port=4008 --standalone train.py \
--use_fsdp \
--dataset <your dataset> \
--dataset_field <field in dataset to tokenize> \
--max_length=<your_max_length> \
--auto_batch \
--save_safetensors \
--model_id='trainer' \
--no_resume_from_checkpoint \
--cls_to_wrap=<YourModelBlock> \
--logging_step=10 \
--report_to='wandb' \
--save_total_limit=2 \
--no_do_eval \
--lr_scheduler_type='cosine'

LT (LucidTransformers)-Models

  • coming soon
  • LLM
  • uses ALiBi as positional embedding, which significantly outperforms other embeddings for zero-shot generalization (a minimal sketch follows this list)
  • flash attention
  • 1B, 3B, 7B, 12B, 50B
  • context length 9K
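A minimal sketch of the ALiBi idea (from the ALiBi paper, not this repo's exact implementation): each attention head adds a linearly decreasing penalty proportional to the query-key distance, which is what lets the model extrapolate beyond the training context length.

import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence (exact when num_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])
    pos = torch.arange(seq_len)
    # rel[i, j] = j - i: zero on the diagonal, negative for past key positions.
    rel = pos[None, :] - pos[:, None]
    # Per-head bias slope * (j - i); future positions are removed by the causal mask anyway.
    return slopes[:, None, None] * rel[None, :, :]

# Usage: add to the attention logits before softmax, e.g.
# scores = q @ k.transpose(-2, -1) / d_head ** 0.5 + alibi_bias(num_heads, seq_len)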

LGeM 🚀

  • What is LGeM? LGeM is a causal LM trained on self-instruct data (Alpaca data); to initialize the first training run of the main model (weights are available), I used pretrained weights from Alpaca LoRA (open source).

  • it's decoder-only

  • built in PyTorch

  • you can simply import the model like

from modules import LGeMForCausalLM
  • and the training code is available in LGeM-train.py (check source)
  • training parameters (see the optimizer sketch at the end of this section):
    • learning rate 1e-4
    • AdamW (weight decay 1e-2)
    • batch size 2
    • trained on 4 x A100 80GB
python3 LGeM-train.py

Available on Hugging Face
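For reference, here is a minimal optimizer setup matching the hyperparameters listed above (AdamW, learning rate 1e-4, weight decay 1e-2, batch size 2). The tiny stand-in module and random data exist only to keep the snippet self-contained; they are not part of the repo.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in module; in the repo this would be LGeMForCausalLM from `modules`.
model = nn.Linear(128, 128)

# Optimizer exactly as listed above: AdamW, lr 1e-4, weight decay 1e-2.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

# Dummy data; batch size 2 as in the training notes.
loader = DataLoader(TensorDataset(torch.randn(8, 128), torch.randn(8, 128)), batch_size=2)

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()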

LLama 🚀

  • The first model is LLama (LLama is the same model as Meta's (formerly Facebook) model, but with some modifications)

  • it's decoder-only

  • built in PyTorch

  • you can simply import the model like

from modules import LLamaModel
  • and the training code is available in LLama-train.py (check source)
python3 LLama-train.py

LLMoU 🚀

  • LLMoU is an NLP model that is fast and good enough to play around with

  • it's decoder-only

  • configs range from LLMoU-S to LLMoU-LLX

  • built in PyTorch

  • you can simply import the model like

from modules import LLMoUModel
  • and the training code is available in LLMoU-train.py (check source)
python3 LLMoU-train.py

LLmP 🚀

  • LLmP is one of the best current models in this project; it uses ALiBi and is arguably the best model in the series

  • it's decoder-only

  • configs range from LLmP-S to LLmP-LLX

  • built in PyTorch

  • you can simply import the model like

from modules import LLmP
  • and the training code is available in LLmP-train.py (check source)
python3 LLmP-train.py

LLmPU 🚀

  • LLmPU is an encoder-decoder (Transformer) model and it's working perfectly fine

  • it's encoder-decoder

  • configs range from LLmPU-S to LLmPU-LLX

  • built in PyTorch, using transformers from Hugging Face (a usage sketch follows the training command below)

  • you can simply import the model like

  • weights are available for PyTorch

# for simple training
from modules import LLmPUModel
# for use and generate [interface]
from modules import LLmPUForConditionalGeneration
  • and the training code is available in LLmPU-train.py (check source)
python3 LLmPU-train.py
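Since LLmPU is built on top of Hugging Face transformers, inference should look roughly like the sketch below; the from_pretrained path, the tokenizer choice, and the generate() interface are assumptions here rather than confirmed details of this repo.

import torch
from transformers import AutoTokenizer
from modules import LLmPUForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")  # placeholder tokenizer choice
model = LLmPUForConditionalGeneration.from_pretrained("path/to/llmpu-checkpoint")  # placeholder path

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))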

PGT 🚀

  • PGT (Poetry Generated Transformers [funny name :) ]) is actually a nice model that performs very well on multitask prompts; I recommend training it on specific tasks, and the weights will be available soon (3.9 B)

  • it's decoder-only

  • configs range from PGT-S to PGT-LLX

  • built in PyTorch

  • you can simply import the model like

from modules import PGT
  • and the training code is available in PGT-train.py (check source)
python3 PGT-train.py

Charts 📊

| Model | Hidden size | Number of Layers | Number of Heads | Max Sentence Length | Parameters |
| --- | --- | --- | --- | --- | --- |
| PGT-S | 768 | 10 | 12 | 256 | 148.62 M |
| PGT-M | 1024 | 18 | 12 | 512 | > 15 B |
| PGT-X | 1536 | 28 | 16 | 512 | 947.30 M |
| PGT-LX | 2048 | 34 | 32 | 768 | 1,917.49 B |
| PGT-LXX | 4096 | 64 | 32 | 2000 | 13,297.54 B |
| LLama | 4096 | 18 | 16 | 256 | 5,243.83 B |
| LLmP-S | 768 | 10 | 8 | ALiBi | 148.82 M |
| LLmP-ML | 1024 | 18 | 16 | ALiBi | > 15 B |
| LLmP | 1536 | 24 | 16 | ALiBi | 834.00 M |
| LLmP-X | 1792 | 36 | 16 | ALiBi | 1,567.58 B |
| LLmP-L | 2048 | 32 | 32 | ALiBi | 1,816.68 B |
| LLmP-LX | 4096 | 48 | 32 | ALiBi | > 15 B |
| LLMoU-S | 768 | 10 | 8 | 512 | 148.14 M |
| LLMoU-ML | 1024 | 18 | 16 | 512 | 329.71 M |
| LLMoU | 1536 | 26 | 16 | 256 | 891.03 M |
| LLMoU-X | 2048 | 34 | 32 | 256 | 1,918.02 B |
| LLMoU-L | 2048 | 48 | 32 | 1024 | 2,622.98 B |
| LLMoU-LX | 2048 | 52 | 32 | 2048 | > 15 B |
| LLmPU-base | 1792 | 8 | 12 | 512 | 598.64 M |
| LLmPU-S | 1024 | 6 | 12 | 256 | 225.68 M |
| LLmPU-L | 1792 | 10 | 12 | 768 | 758.30 M |
| LLmPU-LX | 2048 | 14 | 12 | 768 | 1,791.52 B |
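As a rough sanity check on the table above, the helper below estimates the parameter count of a standard decoder-only transformer (about 4*d^2 for attention plus 8*d^2 for a 4x-wide MLP per layer, plus token embeddings). It is only a ballpark and will not reproduce the listed numbers exactly, since the real configs differ in vocabulary size, FFN width, and other details.

def approx_decoder_params(hidden_size: int, num_layers: int, vocab_size: int = 50_000) -> float:
    # ~12 * d^2 per layer (4*d^2 attention + 8*d^2 MLP) plus the token embedding matrix.
    per_layer = 12 * hidden_size ** 2
    embeddings = vocab_size * hidden_size
    return (num_layers * per_layer + embeddings) / 1e6  # millions of parameters

# Example: PGT-S (hidden size 768, 10 layers) comes out around 109 M,
# the same order of magnitude as the 148.62 M listed above.
print(f"{approx_decoder_params(768, 10):.1f} M")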

🚀 About Me

Hi there 👋

I like to train deep neural nets on large datasets 🧠. Among other things in this world :)

Contributing

Contributions are always welcome!

email at [email protected]

Used By

This project is used by the following companies:

  • You can be the first one here :)

Author

References & Papers Used

Hello, It's GPT-2 -- How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems

Attention Is All You Need

ALiBi (Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation)

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

RoFormer: Enhanced Transformer with Rotary Position Embedding
