Error during quantization step in VideoQuery example on Jetson Orin NX #53

saarCogni opened this issue Nov 14, 2024 · 0 comments

I'm encountering an issue while running the VideoQuery example on my Jetson Orin NX (8GB RAM, 500GB SSD). The process fails during the quantization step. I'm using the Docker container auto-selected by jetson-containers (`autotag`).

Setup:

  • Jetson Orin NX 8GB with 500GB SSD
  • JetPack 6 and L4T 36.4.0
  • Running in a Docker container using jetson-containers
  • Command executed:
    jetson-containers run $(autotag nano_llm)
    python3 -m nano_llm.agents.video_query --api=mlc \
        --model Efficient-Large-Model/VILA1.5-3b \
        --max-context-len 256 \
        --max-new-tokens 32 \
        --video-input /dev/video0 \
        --video-output webrtc://@:8554/output \
        --nanodb /data/nanodb/coco/2017
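
For anyone reproducing this: on an 8GB board it's worth watching memory headroom while the quantization step runs. Standard tools on the host are enough (a sketch, run in a second terminal; not part of the failing command):

  # On the Jetson host, while the quantization runs:
  free -h            # overall RAM/swap usage
  sudo tegrastats    # Jetson-specific RAM/GPU utilization, printed once per second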

Output:

  python3 -m nano_llm.agents.video_query --api=mlc \
      --model Efficient-Large-Model/VILA1.5-3b \
      --max-context-len 256 \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output webrtc://@:8554/output \
      --nanodb /data/nanodb/coco/2017
  /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(
  /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
    warnings.warn(
  Fetching 13 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 35429.47it/s]
  Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 110035.75it/s]
  16:01:44 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/42d1dda6807cc521ef27674ca2ae157539d17026 with MLC
  16:01:48 | INFO | NumExpr defaulting to 6 threads.
  16:01:48 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
  ['/data/models/mlc/dist/VILA1.5-3b/ctx256/VILA1.5-3b-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/VILA1.5-3b/ctx256/VILA1.5-3b-q4f16_ft/params/mlc-chat-config.json']
  16:01:50 | INFO | running MLC quantization:
  
  python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b/ctx256 --use-safetensors 
  
  
  Using path "/data/models/mlc/dist/models/VILA1.5-3b" for model "VILA1.5-3b"
  Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
  Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
  Start computing and quantizing weights... This may take a while.
  Get old param:   1%|          | 2/197 [00:02<03:22,  1.04s/tensors]
                   0%|          | 1/327 [00:02<13:23,  2.47s/tensors]
  Traceback (most recent call last):
    File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 357, in <module>
      agent = VideoQuery(**vars(args)).run() 
    File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 44, in __init__
      self.llm = ChatQuery(model=model, drop_inputs=True, vision_scaling=vision_scaling, warmup=True, **kwargs) #ProcessProxy('ChatQuery', model=model, drop_inputs=True, vision_scaling=vision_scaling, warmup=True, **kwargs)
    File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 78, in __init__
      self.model = NanoLLM.from_pretrained(model, **kwargs)
    File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
      model = MLCModel(model_path, **kwargs)
    File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
      quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
    File "/opt/NanoLLM/nano_llm/models/mlc.py", line 276, in quantize
      subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)  
    File "/usr/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b/ctx256 --use-safetensors ' died with <Signals.SIGKILL: 9>.
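
Note that the build did not crash with a Python error of its own: `Signals.SIGKILL: 9` means the quantization subprocess was killed externally, which on a board like this usually points to the kernel OOM killer, since q4f16_ft weight conversion is memory-hungry and the Orin NX's 8GB is shared between CPU and GPU. A rough way to confirm, plus a possible workaround via swap (the 16G size is just a guess on my part, sized against the 500GB SSD):

  # Confirm the OOM killer fired (on the Jetson host, after the failure)
  sudo dmesg | grep -iE "out of memory|killed process"

  # Workaround: add swap so the one-time quantization pass can complete
  sudo fallocate -l 16G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile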

Any help or suggestions would be appreciated!
