SFT issue #4

Closed
ErikZ719 opened this issue Nov 7, 2024 · 4 comments

ErikZ719 commented Nov 7, 2024

Hi, I followed the default settings (pyproject.toml) to run fine-tuning experiments on 3× RTX 3090, and it reports out of memory. Is this normal? Pre-training works fine.
    Package                   Version     Editable project location
    ------------------------- ----------- -------------------------
    absl-py                   2.1.0
    accelerate                0.26.1
    aiofiles                  23.2.1
    altair                    5.4.1
    annotated-types           0.7.0
    anyio                     4.6.2.post1
    attrs                     24.2.0
    bitsandbytes              0.44.1
    certifi                   2022.12.7
    charset-normalizer        2.1.1
    click                     8.1.7
    contourpy                 1.3.0
    cycler                    0.12.1
    deepspeed                 0.13.1
    docker-pycreds            0.4.0
    einops                    0.6.1
    einops-exts               0.0.4
    exceptiongroup            1.2.2
    fastapi                   0.115.4
    ffmpy                     0.4.0
    filelock                  3.13.1
    flash-attn                2.5.8
    fonttools                 4.54.1
    fsspec                    2024.2.0
    gitdb                     4.0.11
    GitPython                 3.1.43
    gradio                    4.16.0
    gradio_client             0.8.1
    grpcio                    1.67.1
    h11                       0.14.0
    hjson                     3.1.0
    httpcore                  0.17.3
    httpx                     0.24.0
    huggingface-hub           0.26.2
    idna                      3.4
    importlib_resources       6.4.5
    Jinja2                    3.1.3
    joblib                    1.4.2
    jsonschema                4.23.0
    jsonschema-specifications 2024.10.1
    kiwisolver                1.4.7
    latex2mathml              3.77.0
    llava                     1.2.2.post1 /root/zqy/cca-llava
    Markdown                  3.7
    markdown-it-py            3.0.0
    markdown2                 2.5.1
    MarkupSafe                2.1.5
    matplotlib                3.9.2
    mdurl                     0.1.2
    mpmath                    1.3.0
    narwhals                  1.12.1
    networkx                  3.2.1
    ninja                     1.11.1.1
    numpy                     1.26.3
    orjson                    3.10.11
    packaging                 24.1
    pandas                    2.2.3
    peft                      0.13.2
    pillow                    10.2.0
    pip                       24.3.1
    platformdirs              4.3.6
    protobuf                  5.28.3
    psutil                    6.1.0
    py-cpuinfo                9.0.0
    pydantic                  2.9.2
    pydantic_core             2.23.4
    pydub                     0.25.1
    Pygments                  2.18.0
    pynvml                    11.5.0
    pyparsing                 3.2.0
    python-dateutil           2.9.0.post0
    python-multipart          0.0.17
    pytz                      2024.2
    PyYAML                    6.0.2
    referencing               0.35.1
    regex                     2024.9.11
    requests                  2.28.1
    rich                      13.9.4
    rpds-py                   0.20.1
    ruff                      0.7.2
    safetensors               0.4.5
    scikit-learn              1.2.2
    scipy                     1.14.1
    semantic-version          2.10.0
    sentencepiece             0.1.99
    sentry-sdk                2.17.0
    setproctitle              1.3.3
    setuptools                75.1.0
    shellingham               1.5.4
    shortuuid                 1.0.13
    six                       1.16.0
    smmap                     5.0.1
    sniffio                   1.3.1
    starlette                 0.41.2
    svgwrite                  1.4.3
    sympy                     1.13.1
    tensorboard               2.18.0
    tensorboard-data-server   0.7.2
    threadpoolctl             3.5.0
    timm                      0.6.13
    tokenizers                0.15.2
    tomlkit                   0.12.0
    torch                     2.1.1+cu121
    torchaudio                2.1.1+cu121
    torchvision               0.16.1+cu121
    tqdm                      4.66.6
    transformers              4.37.2
    triton                    2.1.0
    typer                     0.12.5
    typing_extensions         4.12.2
    tzdata                    2024.2
    urllib3                   1.26.13
    uvicorn                   0.32.0
    wandb                     0.18.5
    wavedrom                  2.0.3.post3
    websockets                11.0.3
    Werkzeug                  3.1.1
    wheel                     0.44.0
    xformers                  0.0.23

Owner

xing0047 commented Nov 7, 2024

Hi @ErikZ719

Thanks for your feedback.

As the official LLaVA repo states, if you do not have enough GPU memory for LLaVA training, consider one of the following:

  1. Use LoRA: finetune_lora.sh. According to LLaVA, 7B training fits on 8× RTX 3090 (I ran LoRA training on a 4090 and it seemed to work well). Make sure per_device_train_batch_size*gradient_accumulation_steps matches the provided script for best reproducibility.

  2. Replace zero3.json with zero3_offload.json, which offloads some parameters to CPU RAM. This slows down training.
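
A rough sketch of both suggestions, not taken from the repository. The first part shows how a smaller per-device batch can be paired with more gradient accumulation so that per_device_train_batch_size*gradient_accumulation_steps stays constant. The value `REFERENCE_PRODUCT = 16` and the helper name `accumulation_steps` are assumptions for illustration; read the real values from finetune_lora.sh. The second part lists the standard DeepSpeed ZeRO stage 3 CPU-offload keys; the repository's actual zero3_offload.json may contain more fields or different values.

```python
import json


def accumulation_steps(reference_product: int, per_device_batch: int) -> int:
    """Return the gradient_accumulation_steps that keeps
    per_device_train_batch_size * gradient_accumulation_steps equal
    to reference_product."""
    if per_device_batch <= 0 or reference_product % per_device_batch != 0:
        raise ValueError(
            f"per-device batch {per_device_batch} does not divide {reference_product}"
        )
    return reference_product // per_device_batch


# Hypothetical product; take the real one from the provided script.
REFERENCE_PRODUCT = 16

# Smaller per-device batches lower activation memory per step,
# at the cost of more accumulation steps per optimizer update.
for per_device_batch in (16, 8, 4, 2, 1):
    steps = accumulation_steps(REFERENCE_PRODUCT, per_device_batch)
    print(f"--per_device_train_batch_size {per_device_batch} "
          f"--gradient_accumulation_steps {steps}")

# Minimal ZeRO-3 CPU offload section (standard DeepSpeed config keys).
# This is an assumption about what an offload config looks like,
# not a copy of the repository's zero3_offload.json.
zero3_offload_sketch = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    }
}
print(json.dumps(zero3_offload_sketch, indent=2))
```

Lowering the per-device batch is usually the first thing to try for an out-of-memory error, since it leaves the optimizer update unchanged as long as the product is preserved; offloading trades GPU memory for CPU RAM and slower steps.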

Author

ErikZ719 commented Nov 7, 2024

Thank you very much for your reply. Would it be possible to share your CUDA version and “pip list” output from the 4090 machine? I'm getting a version conflict error with my LoRA training.

Owner

xing0047 commented Nov 7, 2024

Hi @ErikZ719

Please see the details below for reference.

  • pip list

    Package                       Version     Editable project location
    ----------------------------- ----------- ---------------------------------------
    accelerate                    0.26.1
    aiofiles                      23.2.1
    aiohappyeyeballs              2.4.3
    aiohttp                       3.10.8
    aiosignal                     1.3.1
    altair                        5.2.0
    annotated-types               0.6.0
    antlr4-python3-runtime        4.9.3
    anyio                         4.6.0
    asttokens                     2.4.1
    async-timeout                 4.0.3
    attrs                         24.2.0
    av                            13.0.0
    bitsandbytes                  0.44.1
    black                         24.1.0
    bleach                        6.1.0
    blis                          0.7.11
    braceexpand                   0.1.7
    Brotli                        1.0.9
    cachetools                    5.3.3
    catalogue                     2.0.10
    certifi                       2024.8.30
    cffi                          1.16.0
    cfgv                          3.4.0
    chardet                       5.2.0
    charset-normalizer            3.3.2
    click                         8.1.7
    cloudpathlib                  0.16.0
    colorama                      0.4.6
    confection                    0.1.4
    contexttimer                  0.3.3
    contourpy                     1.3.0
    cycler                        0.12.1
    cymem                         2.0.8
    DataProperty                  1.0.1
    datasets                      2.16.1
    decorator                     4.4.2
    decord                        0.6.0
    deepspeed                     0.13.1
    diffusers                     0.16.0
    dill                          0.3.7
    distlib                       0.3.8
    distro                        1.9.0
    docker-pycreds                0.4.0
    easydict                      1.9
    einops                        0.6.1
    einops-exts                   0.0.4
    et-xmlfile                    1.1.0
    evaluate                      0.4.3
    exceptiongroup                1.2.0
    executing                     2.0.1
    fairscale                     0.4.4
    fastapi                       0.115.0
    ffmpy                         0.4.0
    filelock                      3.13.1
    fonttools                     4.54.1
    frozenlist                    1.4.1
    fsspec                        2023.10.0
    ftfy                          6.1.3
    gitdb                         4.0.11
    GitPython                     3.1.43
    gmpy2                         2.1.2
    gradio                        4.16.0
    gradio_client                 0.8.1
    h11                           0.14.0
    h5py                          3.10.0
    hf_transfer                   0.1.8
    hjson                         3.1.0
    httpcore                      0.16.3
    httpx                         0.23.3
    huggingface-hub               0.25.1
    identify                      2.5.35
    idna                          3.7
    imageio-ffmpeg                0.4.9
    importlib_resources           6.4.5
    iopath                        0.1.10
    ipython                       8.22.1
    isort                         5.13.2
    jedi                          0.19.1
    Jinja2                        3.1.4
    jiter                         0.5.0
    joblib                        1.3.2
    jsonlines                     4.0.0
    jsonschema                    4.23.0
    jsonschema-specifications     2023.12.1
    kaggle                        1.6.6
    kiwisolver                    1.4.7
    langcodes                     3.3.0
    latex2mathml                  3.77.0
    lazy_loader                   0.3
    llava                         1.2.2.post1 /home/xingyun/xingy/cca-llava
    lmms_eval                     0.2.4       /home/xingyun/xingy/cca-llava/lmms-eval
    loguru                        0.7.2
    lxml                          5.3.0
    markdown-it-py                3.0.0
    markdown2                     2.5.0
    MarkupSafe                    2.1.3
    matplotlib                    3.9.2
    matplotlib-inline             0.1.6
    mbstrdecoder                  1.1.3
    mdurl                         0.1.2
    mkl_fft                       1.3.10
    mkl_random                    1.2.7
    mkl-service                   2.4.0
    moviepy                       1.0.3
    mpmath                        1.3.0
    multidict                     6.1.0
    multiprocess                  0.70.15
    murmurhash                    1.0.10
    mutagen                       1.47.0
    mypy-extensions               1.0.0
    networkx                      3.2.1
    ninja                         1.11.1.1
    nltk                          3.8.1
    nodeenv                       1.8.0
    numexpr                       2.10.1
    numpy                         1.26.4
    nvidia-cublas-cu12            12.1.3.1
    nvidia-cuda-cupti-cu12        12.1.105
    nvidia-cuda-nvrtc-cu12        12.1.105
    nvidia-cuda-runtime-cu12      12.1.105
    nvidia-cufft-cu12             11.0.2.54
    nvidia-curand-cu12            10.3.2.106
    nvidia-cusolver-cu12          11.4.5.107
    nvidia-cusparse-cu12          12.1.0.106
    nvidia-nvjitlink-cu12         12.3.101
    nvidia-nvtx-cu12              12.1.105
    omegaconf                     2.3.0
    openai                        1.51.0
    opencv-python-headless        4.10.0.84
    opendatasets                  0.1.22
    openpyxl                      3.1.5
    orjson                        3.10.7
    packaging                     24.1
    pandas                        2.2.3
    parso                         0.8.3
    pathspec                      0.12.1
    pathvalidate                  3.2.1
    peft                          0.13.0
    pillow                        10.4.0
    pip                           24.2
    platformdirs                  4.2.0
    portalocker                   2.8.2
    pre-commit                    3.6.2
    preshed                       3.0.9
    proglog                       0.1.10
    prompt-toolkit                3.0.43
    protobuf                      3.20.0
    psutil                        6.0.0
    pure-eval                     0.2.2
    py-cpuinfo                    9.0.0
    pyarrow                       15.0.0
    pyarrow-hotfix                0.6
    pybind11                      2.13.6
    pycocoevalcap                 1.2
    pycocotools                   2.0.8
    pycparser                     2.21
    pycryptodomex                 3.21.0
    pydantic                      2.9.2
    pydantic_core                 2.23.4
    pydeck                        0.8.1b0
    pydub                         0.25.1
    Pygments                      2.17.2
    pynvml                        11.5.0
    pyparsing                     3.1.4
    PySocks                       1.7.1
    pytablewriter                 1.2.0
    python-dateutil               2.9.0.post0
    python-magic                  0.4.27
    python-multipart              0.0.12
    python-slugify                8.0.4
    pytz                          2024.2
    PyYAML                        6.0.1
    pyyaml_env_tag                0.1
    referencing                   0.35.1
    regex                         2024.9.11
    requests                      2.32.3
    rfc3986                       1.5.0
    rich                          13.9.1
    rpds-py                       0.20.0
    ruff                          0.6.8
    sacrebleu                     2.4.3
    safetensors                   0.4.5
    scikit-image                  0.22.0
    scikit-learn                  1.2.2
    scipy                         1.14.1
    seaborn                       0.13.2
    semantic-version              2.10.0
    sentencepiece                 0.1.99
    sentry-sdk                    2.14.0
    setproctitle                  1.3.3
    setuptools                    75.1.0
    shellingham                   1.5.4
    shortuuid                     1.0.13
    six                           1.16.0
    smart-open                    6.4.0
    smmap                         5.0.1
    sniffio                       1.3.1
    soundfile                     0.12.1
    spacy-legacy                  3.0.12
    spacy-loggers                 1.0.5
    sqlitedict                    2.1.0
    srsly                         2.4.8
    stack-data                    0.6.3
    starlette                     0.38.6
    streamlit                     1.31.1
    svgwrite                      1.4.3
    sympy                         1.12
    tabledata                     1.3.3
    tabulate                      0.9.0
    tcolorpy                      0.1.6
    tenacity                      8.3.0
    tensorboardX                  2.6.2.2
    text-unidecode                1.3
    threadpoolctl                 3.5.0
    tifffile                      2024.2.12
    tiktoken                      0.7.0
    timm                          0.6.13
    tokenizers                    0.15.2
    toml                          0.10.2
    tomli                         2.0.2
    tomlkit                       0.12.0
    toolz                         0.12.1
    torch                         2.1.1
    torchvision                   0.16.1
    tornado                       6.4
    tqdm                          4.66.5
    tqdm-multiprocess             0.0.11
    traitlets                     5.14.1
    transformers                  4.37.2
    transformers-stream-generator 0.0.5
    triton                        2.1.0
    typepy                        1.3.2
    typer                         0.12.5
    typing_extensions             4.11.0
    tzdata                        2024.2
    tzlocal                       5.2
    urllib3                       2.2.3
    uvicorn                       0.31.0
    validators                    0.22.0
    virtualenv                    20.25.1
    wandb                         0.18.2
    wasabi                        1.1.2
    watchdog                      4.0.0
    wavedrom                      2.0.3.post3
    wcwidth                       0.2.13
    weasel                        0.3.4
    webencodings                  0.5.1
    websockets                    13.1
    wheel                         0.44.0
    xformers                      0.0.23
    xxhash                        3.5.0
    yarl                          1.13.1
    yt-dlp                        2024.9.27
    zss                           1.2.0
    zstandard                     0.23.0
    
  • cuda

    import torch
    print(torch.version.cuda)  # 12.1
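
To track down a version conflict, the two `pip list` outputs in this thread can be compared programmatically. The following is a minimal sketch under stated assumptions: `parse_pip_list` and `diff_versions` are hypothetical helper names, and the sample rows are a handful copied from the two lists above. It only reports packages whose versions differ; it does not say which mismatch causes the conflict.

```python
def parse_pip_list(text: str) -> dict:
    """Parse `pip list` output into {lowercased package name: version}."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0].lower()] = parts[1]
    return versions


def diff_versions(mine: dict, reference: dict) -> dict:
    """Return {package: (my_version, reference_version)} for packages
    present in both environments with different versions."""
    return {
        name: (mine[name], reference[name])
        for name in sorted(mine.keys() & reference.keys())
        if mine[name] != reference[name]
    }


# A few rows copied from the 3090 environment in this thread.
mine = parse_pip_list("""
Package      Version
------------ -----------
peft         0.13.2
protobuf     5.28.3
torch        2.1.1+cu121
transformers 4.37.2
urllib3      1.26.13
""")

# The same packages from the 4090 environment.
reference = parse_pip_list("""
Package      Version
------------ -----------
peft         0.13.0
protobuf     3.20.0
torch        2.1.1
transformers 4.37.2
urllib3      2.2.3
""")

mismatches = diff_versions(mine, reference)
for name, (my_version, ref_version) in mismatches.items():
    print(f"{name}: {my_version} (mine) vs {ref_version} (reference)")
```

The comparison is a plain string match, so `torch 2.1.1+cu121` and `torch 2.1.1` are reported as different even though only the local build tag differs. In practice the full outputs could be saved to files and passed through the same two functions.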

Author

ErikZ719 commented Nov 7, 2024

What can I say! Man, thank you very much. :)

xing0047 added the SFT label on Nov 9, 2024