Error during quantization step in VideoQuery example on Jetson Orin NX #53

saarCogni opened this issue Nov 14, 2024 · 0 comments

I'm encountering an issue while running the VideoQuery example on my Jetson Orin NX (8GB RAM, 500GB SSD). The process fails during the quantization step. I'm using the Docker container auto-selected by jetson-containers (`autotag`).

Setup:

  • Jetson Orin NX 8GB with 500GB SSD
  • JetPack 6 and L4T 36.4.0
  • Running in a Docker container using jetson-containers
  • Command executed:
    jetson-containers run $(autotag nano_llm)
    python3 -m nano_llm.agents.video_query --api=mlc \
        --model Efficient-Large-Model/VILA1.5-3b \
        --max-context-len 256 \
        --max-new-tokens 32 \
        --video-input /dev/video0 \
        --video-output webrtc://@:8554/output \
        --nanodb /data/nanodb/coco/2017
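
For anyone reproducing this: on an 8GB board it's worth watching memory headroom while the quantization step runs. Standard tools on the host are enough (a sketch, run in a second terminal; not part of the failing command):

  # On the Jetson host, while the quantization runs:
  free -h            # overall RAM/swap usage
  sudo tegrastats    # Jetson-specific RAM/GPU utilization, printed once per second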

Output:

  python3 -m nano_llm.agents.video_query --api=mlc \
      --model Efficient-Large-Model/VILA1.5-3b \
      --max-context-len 256 \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output webrtc://@:8554/output \
      --nanodb /data/nanodb/coco/2017
  /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(
  /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
    warnings.warn(
  Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 35429.47it/s]
  Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 110035.75it/s]
  16:01:44 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/42d1dda6807cc521ef27674ca2ae157539d17026 with MLC
  16:01:48 | INFO | NumExpr defaulting to 6 threads.
  16:01:48 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
  ['/data/models/mlc/dist/VILA1.5-3b/ctx256/VILA1.5-3b-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/VILA1.5-3b/ctx256/VILA1.5-3b-q4f16_ft/params/mlc-chat-config.json']
  16:01:50 | INFO | running MLC quantization:
  
  python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b/ctx256 --use-safetensors 
  
  
  Using path "/data/models/mlc/dist/models/VILA1.5-3b" for model "VILA1.5-3b"
  Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
  Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
  Start computing and quantizing weights... This may take a while.
  Get old param:   1%|          | 2/197 [00:02<03:22,  1.04s/tensors]
                   0%|          | 1/327 [00:02<13:23,  2.47s/tensors]
  Traceback (most recent call last):
    File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 357, in <module>
      agent = VideoQuery(**vars(args)).run() 
    File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 44, in __init__
      self.llm = ChatQuery(model=model, drop_inputs=True, vision_scaling=vision_scaling, warmup=True, **kwargs) #ProcessProxy('ChatQuery', model=model, drop_inputs=True, vision_scaling=vision_scaling, warmup=True, **kwargs)
    File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 78, in __init__
      self.model = NanoLLM.from_pretrained(model, **kwargs)
    File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
      model = MLCModel(model_path, **kwargs)
    File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
      quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
    File "/opt/NanoLLM/nano_llm/models/mlc.py", line 276, in quantize
      subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)  
    File "/usr/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b/ctx256 --use-safetensors ' died with <Signals.SIGKILL: 9>.
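
Note that the build did not crash with a Python error of its own: `Signals.SIGKILL: 9` means the quantization subprocess was killed externally, which on a board like this usually points to the kernel OOM killer, since q4f16_ft weight conversion is memory-hungry and the Orin NX's 8GB is shared between CPU and GPU. A rough way to confirm, plus a possible workaround via swap (the 16G size is just a guess on my part, sized against the 500GB SSD):

  # Confirm the OOM killer fired (on the Jetson host, after the failure)
  sudo dmesg | grep -iE "out of memory|killed process"

  # Workaround: add swap so the one-time quantization pass can complete
  sudo fallocate -l 16G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile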

Any help or suggestions would be appreciated!
