SFT issue #4

ErikZ719 opened this issue Nov 7, 2024

ErikZ719 commented Nov 7, 2024

Hi, I followed the default settings (pyproject.toml) to run fine-tuning experiments on 3× RTX 3090 and got an out-of-memory error. Is this normal? Pre-training works fine.
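A back-of-the-envelope estimate helps explain why full fine-tuning can run out of memory on 3× RTX 3090 (24 GB each) while pre-training does not. The sketch below assumes a 7B-parameter LLM trained with mixed-precision Adam under DeepSpeed ZeRO-3. It uses the commonly cited figure of 16 bytes of model state per trainable parameter (2 for bf16/fp16 weights, 2 for gradients, 12 for fp32 master weights and Adam moments) and ignores activations. The function name is hypothetical. If this repo follows LLaVA, pre-training updates only the projector, so the LLM adds no gradients or optimizer state, which would explain why that stage fits.

```python
def zero3_model_state_gb(num_params: float, num_gpus: int, bytes_per_param: int = 16) -> float:
    """Per-GPU model-state memory in GB (1e9 bytes) under ZeRO stage 3.

    Assumption: mixed-precision Adam, i.e. 2 B weights + 2 B gradients
    + 12 B fp32 optimizer state = 16 B per trainable parameter, fully
    partitioned across GPUs. Activations and buffers are NOT included.
    """
    return num_params * bytes_per_param / num_gpus / 1e9


# Hypothetical example: full fine-tuning of a 7B LLM on 3 GPUs.
per_gpu = zero3_model_state_gb(7e9, 3)
print(f"{per_gpu:.1f} GB per GPU vs. 24 GB on an RTX 3090")
```

Even before activations, the estimate (about 37 GB per GPU) exceeds 24 GB, which is consistent with the OOM. LoRA avoids most of the gradient and optimizer cost because only the small adapter matrices are trained.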
xing0047 commented Nov 7, 2024

Hi @ErikZ719

Thanks for your feedback.

As the official LLaVA repo states, if you do not have enough GPU memory for LLaVA training, please consider:

  1. Use LoRA: As LLaVA indicates, 7B training fits on 8× RTX 3090 (I ran LoRA training on a 4090 and it seems to work well). Make sure the global batch size, per_device_train_batch_size * gradient_accumulation_steps * number of GPUs, matches the provided script for best reproducibility.

  2. Replace zero3.json with zero3_offload.json, which offloads some parameters to CPU RAM. Note that this slows down training.
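Because the reporter has 3 GPUs rather than the 8 the scripts assume, keeping the global batch size constant means rescaling gradient_accumulation_steps. A minimal sketch of that arithmetic follows. The function name and the reference values (16 per device, 1 accumulation step, 8 GPUs) are hypothetical placeholders, so substitute the numbers from the script you are reproducing.

```python
def grad_accum_for_same_global_batch(
    ref_per_device: int,
    ref_grad_accum: int,
    ref_num_gpus: int,
    per_device: int,
    num_gpus: int,
) -> int:
    """Return the gradient_accumulation_steps that keeps the global batch
    (per_device * grad_accum * num_gpus) equal to the reference run.

    Raises ValueError when the reference global batch cannot be matched exactly.
    """
    global_batch = ref_per_device * ref_grad_accum * ref_num_gpus
    denom = per_device * num_gpus
    if global_batch % denom != 0:
        raise ValueError(
            f"global batch {global_batch} is not divisible by "
            f"per_device * num_gpus = {denom}"
        )
    return global_batch // denom


# Hypothetical reference run: 16 per device x 1 step x 8 GPUs = 128.
print(grad_accum_for_same_global_batch(16, 1, 8, per_device=4, num_gpus=4))  # 8
try:
    grad_accum_for_same_global_batch(16, 1, 8, per_device=4, num_gpus=3)
except ValueError as err:
    print(err)  # 128 is not divisible by 12
```

With 3 GPUs a global batch of 128 cannot be matched exactly, so it can only be approximated, which may affect reproducibility.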

ErikZ719 commented Nov 7, 2024

Thank you very much for your reply. Could you share your CUDA version and “pip list” output from the 4090 machine? I'm getting a version conflict error with my LoRA training.

xing0047 commented Nov 7, 2024

Hi @ErikZ719

Please check below for your reference.

  • pip list

    Package                       Version     Editable project location
    ----------------------------- ----------- ---------------------------------------
    accelerate                    0.26.1
    aiofiles                      23.2.1
    aiohappyeyeballs              2.4.3
    aiohttp                       3.10.8
    aiosignal                     1.3.1
    altair                        5.2.0
    annotated-types               0.6.0
    antlr4-python3-runtime        4.9.3
    anyio                         4.6.0
    asttokens                     2.4.1
    async-timeout                 4.0.3
    attrs                         24.2.0
    av                            13.0.0
    bitsandbytes                  0.44.1
    black                         24.1.0
    bleach                        6.1.0
    blis                          0.7.11
    braceexpand                   0.1.7
    Brotli                        1.0.9
    cachetools                    5.3.3
    catalogue                     2.0.10
    certifi                       2024.8.30
    cffi                          1.16.0
    cfgv                          3.4.0
    chardet                       5.2.0
    charset-normalizer            3.3.2
    click                         8.1.7
    cloudpathlib                  0.16.0
    colorama                      0.4.6
    confection                    0.1.4
    contexttimer                  0.3.3
    contourpy                     1.3.0
    cycler                        0.12.1
    cymem                         2.0.8
    DataProperty                  1.0.1
    datasets                      2.16.1
    decorator                     4.4.2
    decord                        0.6.0
    deepspeed                     0.13.1
    diffusers                     0.16.0
    dill                          0.3.7
    distlib                       0.3.8
    distro                        1.9.0
    docker-pycreds                0.4.0
    easydict                      1.9
    einops                        0.6.1
    einops-exts                   0.0.4
    et-xmlfile                    1.1.0
    evaluate                      0.4.3
    exceptiongroup                1.2.0
    executing                     2.0.1
    fairscale                     0.4.4
    fastapi                       0.115.0
    ffmpy                         0.4.0
    filelock                      3.13.1
    fonttools                     4.54.1
    frozenlist                    1.4.1
    fsspec                        2023.10.0
    ftfy                          6.1.3
    gitdb                         4.0.11
    GitPython                     3.1.43
    gmpy2                         2.1.2
    gradio                        4.16.0
    gradio_client                 0.8.1
    h11                           0.14.0
    h5py                          3.10.0
    hf_transfer                   0.1.8
    hjson                         3.1.0
    httpcore                      0.16.3
    httpx                         0.23.3
    huggingface-hub               0.25.1
    identify                      2.5.35
    idna                          3.7
    imageio-ffmpeg                0.4.9
    importlib_resources           6.4.5
    iopath                        0.1.10
    ipython                       8.22.1
    isort                         5.13.2
    jedi                          0.19.1
    Jinja2                        3.1.4
    jiter                         0.5.0
    joblib                        1.3.2
    jsonlines                     4.0.0
    jsonschema                    4.23.0
    jsonschema-specifications     2023.12.1
    kaggle                        1.6.6
    kiwisolver                    1.4.7
    langcodes                     3.3.0
    latex2mathml                  3.77.0
    lazy_loader                   0.3
    llava                         1.2.2.post1 /home/xingyun/xingy/cca-llava
    lmms_eval                     0.2.4       /home/xingyun/xingy/cca-llava/lmms-eval
    loguru                        0.7.2
    lxml                          5.3.0
    markdown-it-py                3.0.0
    markdown2                     2.5.0
    MarkupSafe                    2.1.3
    matplotlib                    3.9.2
    matplotlib-inline             0.1.6
    mbstrdecoder                  1.1.3
    mdurl                         0.1.2
    mkl_fft                       1.3.10
    mkl_random                    1.2.7
    mkl-service                   2.4.0
    moviepy                       1.0.3
    mpmath                        1.3.0
    multidict                     6.1.0
    multiprocess                  0.70.15
    murmurhash                    1.0.10
    mutagen                       1.47.0
    mypy-extensions               1.0.0
    networkx                      3.2.1
    nltk                          3.8.1
    nodeenv                       1.8.0
    numexpr                       2.10.1
    numpy                         1.26.4
    nvidia-cuda-cupti-cu12        12.1.105
    nvidia-cuda-nvrtc-cu12        12.1.105
    nvidia-cuda-runtime-cu12      12.1.105
    nvidia-nvjitlink-cu12         12.3.101
    nvidia-nvtx-cu12              12.1.105
    omegaconf                     2.3.0
    openai                        1.51.0
    opendatasets                  0.1.22
    openpyxl                      3.1.5
    orjson                        3.10.7
    packaging                     24.1
    pandas                        2.2.3
    parso                         0.8.3
    pathspec                      0.12.1
    pathvalidate                  3.2.1
    peft                          0.13.0
    pillow                        10.4.0
    pip                           24.2
    platformdirs                  4.2.0
    portalocker                   2.8.2
    pre-commit                    3.6.2
    preshed                       3.0.9
    proglog                       0.1.10
    prompt-toolkit                3.0.43
    protobuf                      3.20.0
    psutil                        6.0.0
    pure-eval                     0.2.2
    py-cpuinfo                    9.0.0
    pyarrow                       15.0.0
    pyarrow-hotfix                0.6
    pybind11                      2.13.6
    pycocoevalcap                 1.2
    pycocotools                   2.0.8
    pycparser                     2.21
    pycryptodomex                 3.21.0
    pydantic                      2.9.2
    pydantic_core                 2.23.4
    pydeck                        0.8.1b0
    pydub                         0.25.1
    Pygments                      2.17.2
    pynvml                        11.5.0
    pyparsing                     3.1.4
    PySocks                       1.7.1
    pytablewriter                 1.2.0
    python-dateutil               2.9.0.post0
    python-magic                  0.4.27
    python-multipart              0.0.12
    python-slugify                8.0.4
    pytz                          2024.2
    PyYAML                        6.0.1
    pyyaml_env_tag                0.1
    referencing                   0.35.1
    regex                         2024.9.11
    requests                      2.32.3
    rfc3986                       1.5.0
    rich                          13.9.1
    rpds-py                       0.20.0
    ruff                          0.6.8
    sacrebleu                     2.4.3
    safetensors                   0.4.5
    scikit-image                  0.22.0
    scikit-learn                  1.2.2
    scipy                         1.14.1
    seaborn                       0.13.2
    semantic-version              2.10.0
    sentencepiece                 0.1.99
    sentry-sdk                    2.14.0
    setproctitle                  1.3.3
    setuptools                    75.1.0
    shellingham                   1.5.4
    shortuuid                     1.0.13
    six                           1.16.0
    smart-open                    6.4.0
    smmap                         5.0.1
    sniffio                       1.3.1
    soundfile                     0.12.1
    spacy-legacy                  3.0.12
    spacy-loggers                 1.0.5
    sqlitedict                    2.1.0
    srsly                         2.4.8
    stack-data                    0.6.3
    starlette                     0.38.6
    streamlit                     1.31.1
    svgwrite                      1.4.3
    sympy                         1.12
    tabledata                     1.3.3
    tabulate                      0.9.0
    tcolorpy                      0.1.6
    tenacity                      8.3.0
    text-unidecode                1.3
    threadpoolctl                 3.5.0
    tifffile                      2024.2.12
    tiktoken                      0.7.0
    timm                          0.6.13
    tokenizers                    0.15.2
    toml                          0.10.2
    tomli                         2.0.2
    tomlkit                       0.12.0
    toolz                         0.12.1
    torch                         2.1.1
    torchvision                   0.16.1
    tornado                       6.4
    tqdm                          4.66.5
    tqdm-multiprocess             0.0.11
    traitlets                     5.14.1
    transformers                  4.37.2
    transformers-stream-generator 0.0.5
    triton                        2.1.0
    typepy                        1.3.2
    typer                         0.12.5
    typing_extensions             4.11.0
    tzdata                        2024.2
    tzlocal                       5.2
    urllib3                       2.2.3
    uvicorn                       0.31.0
    validators                    0.22.0
    virtualenv                    20.25.1
    wandb                         0.18.2
    wasabi                        1.1.2
    watchdog                      4.0.0
    wavedrom                      2.0.3.post3
    wcwidth                       0.2.13
    weasel                        0.3.4
    webencodings                  0.5.1
    websockets                    13.1
    wheel                         0.44.0
    xformers                      0.0.23
    xxhash                        3.5.0
    yarl                          1.13.1
    yt-dlp                        2024.9.27
    zss                           1.2.0
    zstandard                     0.23.0
  • cuda

    import torch
    print(torch.version.cuda)  # 12.1
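When chasing a version conflict, it can be quicker to compare only the packages most likely to clash than to diff the full list. The sketch below checks a handful of versions copied from the pip list above using the standard-library importlib.metadata. The choice of packages and the helper name are assumptions, and local build tags (for example a `+cu121` suffix) are ignored when comparing.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip list above (reference 4090 environment).
REFERENCE = {
    "torch": "2.1.1",
    "transformers": "4.37.2",
    "tokenizers": "0.15.2",
    "peft": "0.13.0",
    "deepspeed": "0.13.1",
    "accelerate": "0.26.1",
    "bitsandbytes": "0.44.1",
    "xformers": "0.0.23",
}


def compare_versions(reference: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (expected, installed)} for every package whose installed
    version differs from the reference; installed is None if it is missing.
    Any local build tag after '+' is ignored in the comparison."""
    mismatches = {}
    for name, expected in reference.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        base = installed.split("+", 1)[0] if installed else None
        if base != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in compare_versions(REFERENCE).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```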

ErikZ719 commented Nov 7, 2024

What can I say! Man, thank you very much. :)

