
Hello author, I am glad that you have made contributions to Jetson users. I have a question I would like to ask #1

Open
cplasfwst opened this issue Sep 16, 2024 · 78 comments

Comments

@cplasfwst

Hello author, I am glad that you have made contributions to Jetson users. I have a question I would like to ask

Some errors occurred when I used the command to build the docker image. Why?

image

@cplasfwst
Author

My L4T version is 36.3.0

@remy415
Owner

remy415 commented Sep 16, 2024

@cplasfwst The container llama_cpp:gguf-{L4T_VERSION} with L4T_VERSION=36.3.0 is not available from that developer. Try L4T_VERSION=36.2.0 instead. See dusty-nv llama_cpp for more details.

image
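For reference, you can confirm the base image for that version exists with a plain pull, and (assuming the Dockerfile exposes L4T_VERSION as a build argument, which I have not verified here) pass the version through to the build:

docker pull dustynv/llama_cpp:gguf-r36.2.0
docker build --build-arg L4T_VERSION=36.2.0 -t ollama-jetson .

The ollama-jetson tag is just a placeholder name.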

@cplasfwst
Author

image
Do you know what causes this?

@remy415
Owner

remy415 commented Sep 16, 2024

@cplasfwst
Yes, it would seem like it's having an issue with libcudart.so. First, on your Jetson run sudo find /usr/local/cuda -name '*libcudart*' 2>/dev/null

Also for posterity you should run docker run -itu0 --rm dustynv/llama_cpp:gguf-r36.2.0 find /usr/local/cuda -name '*libcudart*' 2>/dev/null and paste the output from those please.

@cplasfwst
Author

image
Why does it look like this when I run it?

@cplasfwst
Author

image
Dear author, my device environment is like this. What should I do to run ollama normally? Please help me. Thank you very much!

@remy415
Owner

remy415 commented Sep 16, 2024

@cplasfwst Yes, it would seem like it's having an issue with libcudart.so. First, on your Jetson run sudo find /usr/local/cuda -name '*libcudart*' 2>/dev/null

Also for posterity you should run docker run -itu0 --rm dustynv/llama_cpp:gguf-r36.2.0 find /usr/local/cuda -name '*libcudart*' 2>/dev/null and paste the output from those please.

When you are building Ollama, the sub-task to build an external library called llama_cpp is unable to properly import libcudart.so from your installed CUDA libraries. Please run the two commands I referenced in the quoted text so we can see if the libcudart.so file is present or not.
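As a general sketch (not necessarily the root cause here), builds usually pick up libcudart when the CUDA paths are exported first, assuming the default JetPack install location of /usr/local/cuda:

export PATH=/usr/local/cuda/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}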

@cplasfwst
Author

sudo find /usr/local/cuda -name '*libcudart*' 2>/dev/null

image
I ran the command you mentioned and there was no output.
In addition, I am not very familiar with the deeper details; I am a beginner and I really want to use my Jetson AGX Orin to run Ollama. Do you have time to remotely troubleshoot the problem for me? I would be grateful and I can pay a fee.

@remy415
Owner

remy415 commented Sep 16, 2024

I am not at my computer; I am typing to you from my cell phone, sorry.

Please try this command, it is similar but checks a different directory: sudo find /usr -name '*cudart*' 2>/dev/null and docker run -itu0 --rm dustynv/llama_cpp:gguf-r36.2.0 find /usr -name '*cudart*' 2>/dev/null

@cplasfwst
Author

libcudart

Thank you very much for your patience in answering my questions, thank you very much!! I couldn't find libcudart using the command, but I did find the directory of this .so file manually. Please see the picture below:
(screenshot)
After confirming that this .so file exists, what should I do?

@remy415
Owner

remy415 commented Sep 16, 2024

@cplasfwst
Okay, if there is no output then that might be a problem. libcudart.so is a standard library released with the CUDA toolkit, which is required for almost all AI programs on Jetson; the CUDA Runtime (cudart) is its user-friendly API. It should be installed by default with JetPack 6.

This command will take longer to run but will scan your entire file system for the library in case I tried searching the wrong one previously.

sudo find / -name '*cudart*' 2>/dev/null

If that returns nothing, try this (edit: search for CUDA instead of cudart in the apt list):
sudo apt list | grep -i install | grep -i cuda

If that still finds nothing, please check that you installed JetPack correctly on your AGX Orin.
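For reference, if the apt listing shows no CUDA packages at all, reinstalling the JetPack components usually brings them back (assuming the NVIDIA apt sources that ship with JetPack 6 are still configured):

sudo apt update
sudo apt install nvidia-jetpack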

@remy415
Owner

remy415 commented Sep 16, 2024

I checked the documentation and NVIDIA forums to make sure that the r36.2 containers work with r36.3; the main container engineer at NVIDIA confirmed that it does work:

image

So the problem is that the libcudart file is missing or has been moved; I hope your find commands will discover it.

@cplasfwst
Author

image
I have encountered many problems now, which prevent me from using Ollama. I don't know how to deal with them; I have been researching for a whole day. I really hope you can help me.

@cplasfwst
Author

image

@remy415
Owner

remy415 commented Sep 18, 2024

I've updated the Dockerfile; it looks like Dusty-NV updated the llama_cpp container tag to just be r36.2.0.

Can you also run ls -latr /usr/local/cuda/lib/libcud*?

@cplasfwst
Author

image
ls -latr /usr/local/cuda/lib/libcud*
This is the result of running it:

In addition, I ran the build again on the Jetson AGX Orin device and got this prompt:
(screenshots of the build output)

Thanks for your help again. I hope to run Ollama on the Jetson AGX Orin. This problem has troubled me for many days.

@cplasfwst
Author

image

But I tested ls -latr /usr/local/cuda/lib64/libcud* and those files do appear.

@remy415
Owner

remy415 commented Sep 18, 2024

Thank you for running the search. I was able to find some suggestions on fixing the problem. It seems like the compiler can’t find libcuda.so to link the CUDA functions after compiling.

Please run docker run -itu0 --rm dustynv/llama_cpp:r36.2.0 find /usr/lib -name '*libcuda.so*' 2>/dev/null
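As a rough illustration of the kind of fix this usually needs (paths assumed, not the exact change in this repo), the linker and the runtime loader both need the CUDA directories on their search paths:

export LIBRARY_PATH=/usr/local/cuda/lib64:${LIBRARY_PATH}                                   # searched at link time
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/aarch64-linux-gnu:${LD_LIBRARY_PATH}  # searched at run time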

@remy415
Owner

remy415 commented Sep 18, 2024

I have added the path /usr/lib/aarch64-linux-gnu to the LD path in the Dockerfile. Try that.
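The change is along these lines (an illustrative Dockerfile snippet, not the exact diff):

ENV LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}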

@cplasfwst
Author

(screenshot)
Hello, this is what I got when I tried to install it again. My wife gave birth a few days ago, so I was busy for a few days; I came back today to continue. Please help me, the installation went wrong.

@remy415
Owner

remy415 commented Sep 20, 2024

The tar file isn’t unzipping correctly, and your image cuts off the error message where the real issue is. I need more of the log.

@remy415
Owner

remy415 commented Sep 20, 2024

Also congratulations on your new child, you should be with your family :)

@cplasfwst
Author

(screenshots of the build log)

Can you see the correct error message in these pictures?

@remy415
Owner

remy415 commented Sep 20, 2024

image
This is the problem here. The tar isn't downloading properly. Can you run the curl command again manually to see if you can download it outside of your container build:

CMAKE_VERSION=3.22.1
curl -s -L -o cmake-${CMAKE_VERSION}-linux-$(uname -m).tar.gz https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}-linux-$(uname -m).tar.gz
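If the download succeeds, a quick sanity check on the archive (a minimal sketch) is:

tar -tzf cmake-${CMAKE_VERSION}-linux-$(uname -m).tar.gz > /dev/null && echo "archive OK"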

@cplasfwst
Author

(screenshot)
The command I entered did not return anything. What is the reason?

@cplasfwst
Author

image
It seems that my network cannot reach github.com. I will try changing networks.

@cplasfwst
Author

The curl command doesn't work for me. Where should I change it to wget? I can get the file using wget.
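For reference, a wget equivalent of that curl line would be roughly:

CMAKE_VERSION=3.22.1
wget -O cmake-${CMAKE_VERSION}-linux-$(uname -m).tar.gz https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}-linux-$(uname -m).tar.gz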

@remy415
Owner

remy415 commented Sep 20, 2024

Since your network is working now, try the docker build again.

@cplasfwst
Author

(screenshots of the new error)
I changed the shell script to use wget and the download now seems to work, but this is a new error. What is this error?

@cplasfwst
Author

I modified the command like this; is there anything wrong with it?
image

@cplasfwst
Author

Have you tried docker image pull dustynv/ollama:r36.3.0 or the related run commands? I assumed you had already tried this, but I thought it was worth asking.

https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/ollama

Sorry, I just went out to eat and just came back. I haven't tried this yet.

@cplasfwst
Author

Have you tried docker image pull dustynv/ollama:r36.3.0 or the related run commands? I assumed you had already tried this, but I thought it was worth asking.

https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/ollama

I haven't tried this yet; I'm a newbie with the Jetson and don't really understand what this is T.T

@remy415
Owner

remy415 commented Sep 21, 2024

A Jetson is just a computer with Linux on it. Its software is sometimes a little different but it’s mostly the same as a regular Linux machine.

Here is some recommended setup for Jetsons created by dusty-nv. He is an NVIDIA engineer who maintains a container repository for Jetson devices.

https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md
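The setup there boils down to roughly this (see the linked guide for the authoritative steps):

git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh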

@cplasfwst
Author

Thank you for your guidance, looking forward to your good news

@remy415
Owner

remy415 commented Sep 22, 2024

@cplasfwst I originally made this repository to get Ollama working on my Jetson, but when I completed my work I pushed this work to the dusty-nv jetson-containers repository and he integrated it into his project. The Jetson-Containers repository is the most up-to-date repository for Jetson containers and the best place to get updates to running AI on Jetson.

Can you explain what your purpose for building the Ollama container is? Do you just want to run Ollama on your Jetson for fun or are you trying to build a custom Ollama binary?

I was able to follow the guide from dusty-nv here and run Ollama on my Jetson r36.3.0 with jetson-containers run ollama $(autotag ollama). Make sure you follow all the steps from the jetson-containers setup guide, then run your desired command.

@remy415
Owner

remy415 commented Sep 22, 2024

After reviewing dusty-nv's container build process, in r36.x.x he employs a multi-stage build process that installs updates from an nvidia repository. I am not currently using this process and it would take quite some time to adopt it for this repo, when he already has the work finished and is able to provide a completed container.

Please use dusty-nv's jetson-containers repository here

@cplasfwst
Author

@cplasfwst I originally made this repository to get Ollama working on my Jetson, but when I completed my work I pushed it to the dusty-nv jetson-containers repository and he integrated it into his project. The jetson-containers repository is the most up-to-date repository for Jetson containers and the best place to get updates for running AI on Jetson.

Can you explain what your purpose for building the Ollama container is? Do you just want to run Ollama on your Jetson for fun, or are you trying to build a custom Ollama binary?

I was able to follow the guide from dusty-nv and run Ollama on my Jetson r36.3.0 with jetson-containers run ollama $(autotag ollama). Make sure you follow all the steps from the jetson-containers setup guide, then run your desired command.

I want Ollama to run on my Jetson machine so that I can use it there. Originally Ollama ran on my laptop, but it is not convenient to keep the laptop on 24 hours a day, so I plan to use the Jetson machine to run it 24 hours a day.

@cplasfwst
Author

After reviewing dusty-nv's container build process: in r36.x.x he employs a multi-stage build process that installs updates from an NVIDIA repository. I am not currently using this process, and it would take quite some time to adopt it for this repo when he already has the work finished and can provide a completed container.

Please use dusty-nv's jetson-containers repository here.

I don't quite understand the meaning. Does it mean that if I use this, I can run Ollama normally on the Jetson machine?

@remy415
Owner

remy415 commented Sep 23, 2024

Ollama is designed to be both the backend API and a front-end client. For normal operation, someone would run Ollama as a server on the system with the GPU, then you can use Ollama on another terminal to connect to the first process.

If you use the jetson-containers command, it will run Ollama in a container as a background process and return control to a bash shell running in the container.
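Roughly, the flow looks like this (the model name is just an example):

ollama serve          # terminal 1: the server process that owns the GPU
ollama run llama3     # terminal 2: a client that connects to the server and chats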

@remy415
Owner

remy415 commented Sep 23, 2024

In other words, you run Ollama twice.

@remy415
Owner

remy415 commented Sep 23, 2024

After reviewing dusty-nv's container build process: in r36.x.x he employs a multi-stage build process that installs updates from an NVIDIA repository. I am not currently using this process, and it would take quite some time to adopt it for this repo when he already has the work finished and can provide a completed container.
Please use dusty-nv's jetson-containers repository here.

I don't quite understand the meaning. Does it mean that if I use this, I can run Ollama normally on the Jetson machine?

Yes, you can run Ollama on the Jetson with it. It will run in a Docker container and will open a listening port that you can connect to with another Ollama instance as a client, or with a web client like the Openllm Webui.
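For example, from another machine on your network you could hit the listening port directly (Ollama's default port is 11434; the IP below is a placeholder):

curl http://<jetson-ip>:11434/api/generate -d '{"model": "llama3", "prompt": "hello"}'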

@cplasfwst
Author

image

Thank you very much for your guidance these days; Ollama is now running normally. I would also like to ask: can you tell me why jtop on the Jetson AGX Orin 64GB shows only 8 cores running? Do I need to turn the others on?

@cplasfwst
Author

There are 4 cores that are not running

@cplasfwst
Author

@cplasfwst I originally made this repository to get Ollama working on my Jetson, but when I completed my work I pushed it to the dusty-nv jetson-containers repository and he integrated it into his project. The jetson-containers repository is the most up-to-date repository for Jetson containers and the best place to get updates for running AI on Jetson.

Can you explain what your purpose for building the Ollama container is? Do you just want to run Ollama on your Jetson for fun, or are you trying to build a custom Ollama binary?

I was able to follow the guide from dusty-nv and run Ollama on my Jetson r36.3.0 with jetson-containers run ollama $(autotag ollama). Make sure you follow all the steps from the jetson-containers setup guide, then run your desired command.

I use Ollama so that One API can call it; that way I can use Ollama's interface to return information on another machine.

@remy415
Owner

remy415 commented Sep 23, 2024

image

Thank you very much for your guidance these days; Ollama is now running normally. I would also like to ask: can you tell me why jtop on the Jetson AGX Orin 64GB shows only 8 cores running? Do I need to turn the others on?

That is normal; it is a mechanism put in place for x64 CPUs since they tend to use SMT/Hyper-Threading. You can technically change the inner workings to adjust that, but it will not be worth the time, because most of your processing on the Orin is done by the GPU, so your performance gain would be minimal.
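If you want to see what is actually online, something like this works (run it on the Jetson):

cat /sys/devices/system/cpu/online   # which CPU cores are currently online
sudo nvpmodel -q                     # the active power model, which decides how many cores are up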

@cplasfwst
Author

It took me a long time to finally get my Jetson AGX Orin 64GB to run Ollama successfully! Thank you for your patience and help. I will continue to put other Docker containers on the Jetson AGX Orin 64GB device; I plan to move all of the Docker images from my original Raspberry Pi onto it.

@remy415
Owner

remy415 commented Sep 23, 2024

Stable-Diffusion is really interesting too!

@cplasfwst
Author

I can run the Docker images that normally run on a Linux server on the Jetson device, right? Since the memory is relatively large, I plan to use the Jetson as the main device, put all the Docker images I need on it, and have the Jetson take over from the Raspberry Pi 5 to keep running all my services. Is that okay, running Ollama and my other Docker images at the same time?

@remy415
Owner

remy415 commented Sep 23, 2024

Yes, but the problem is that AI programs take a lot of memory. Some AI models are huge and require big chunks of memory, and you should only run one model at a time. I know you have 64GB, but llama 40b will use almost all of that. You can try it and it might work.

If you’re trying to use multiple machines like that it can be really complex. It seems like you need clustering with Docker Swarm or Kubernetes, but that’s beyond what I can advise.
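If memory is the concern, Ollama has environment variables to keep it in check (variable names are from the Ollama docs; check the version you are running):

OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_NUM_PARALLEL=1 ollama serve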

@cplasfwst
Author

image
If I limit the memory usage like this, there will be no problem running it 24 hours a day, right?

@cplasfwst
Author

Keeping the video memory usage at around 32GB should not have much impact on the machine, right?

@remy415
Owner

remy415 commented Sep 24, 2024

If it seems okay to you, it should be okay. Have fun!

@cplasfwst
Author

I'm grateful!

If it seems okay to you, it should be okay. Have fun!

@cplasfwst
Author

Long time no see, dear author. I would like to ask why, after I run Ollama, the speed is very slow for the roughly 30 other devices I allocated API access to so they can call Ollama's API. I checked jtop and found that the power budget does not seem to be fully turned up, and the GPU is only at 612 MHz. Is this normal?
image

@remy415
Owner

remy415 commented Oct 19, 2024

@cplasfwst Hello, I am not very familiar with the power profile of the AGX Orin. However, based on the NVIDIA specifications and your screenshot, it would appear your current power mode is 30 W, while the specification says it supports a higher power mode of 40 W and an unlocked mode.

Please examine this guide and change the power mode to MaxN (Power mode 0) using the given commands. I hope that helps.

Please also consider that this will increase the heat generated by the device. You might want to check the fan mode and ensure it is set to the highest mode; I think it is currently set to cool. The guide also has a Fan Control section to check the setting and update it if needed.
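The usual commands are roughly (run on the Jetson itself):

sudo nvpmodel -m 0        # switch to MAXN (power mode 0)
sudo jetson_clocks        # optionally pin the clocks at their maximums
sudo jetson_clocks --fan  # force the fan to maximum (jtop's fan control works too)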

@cplasfwst
Author

Thank you very much. I tried turning on the max mode and it really brings out the full performance of my device! Very good. You solved so many problems for me, thank you very much! You are a very good person.

@remy415
Owner

remy415 commented Oct 20, 2024

You’re welcome, I hope you have fun with developing on your Jetson!

@cplasfwst
Author

After I turned on the max mode, the overall speed was about 5 times faster, but I saw that the CPU temperature reached around 70°C. If I run it 24 hours a day, will the device be damaged?
image

@remy415
Owner

remy415 commented Oct 20, 2024

Did you change the fan profile? I suggested it to you previously but in your screenshot it doesn’t show your fans at full power.

No, 70°C does not typically damage processors.
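To keep an eye on it, tegrastats prints the thermal sensors and utilization continuously, e.g.:

tegrastats --interval 2000   # update every 2 seconds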

@cplasfwst
Author

image
I see that the option here is already set to cool mode. When I was at 30 W before, the option was also set to cool.
