Replies: 1 comment
Hi! There are also a few problems that will make upstreaming challenging, but I'll deal with those in due time.
Documentation is available in the repo. I just wrote it, so it's a quite up-to-date view: https://github.com/marty1885/llama.cpp/blob/metalium-support/docs/backend/Metalium.md There's a list of op support requests in TTNN, which I hope they can get to soon: https://github.com/tenstorrent/tt-metal/issues?q=is%3Aissue+state%3Aopen+label%3AGGML
Feel free to try my codebase. But again, it's quite slow as of now, until the operator support is added in TTNN. Contributions are totally welcome, though IMO most of what can be done is already done; lately I spend more time fixing problems in TTNN itself. With that said, one note:
I'll be speaking at FOSDEM 2025 about this effort. Please come if you happen to be in Brussels. I'd love to hear what people think!
I was going to post a feature request, but the issue template said to post here first. Hopefully I'm not the only one interested in this!
Background
Tenstorrent makes some AI accelerator cards (Grayskull and Wormhole) that connect to a host system over PCIe and are purchasable for reasonable amounts of money (IMO the price would be a lot more reasonable if it was reduced by 50%, but I digress). Probably the most interesting feature of these cards is that the Wormhole variant can be linked to other Wormhole cards over multiple 400G Ethernet links to make one big accelerator with lots of GDDR6 memory--not unlike how Nvidia GPUs can be linked together to share memory over NVLink. Tenstorrent sells some prebuilt systems with this configuration (TT-LoudBox and TT-QuietBox), each including four Wormhole n300 cards with a combined total of 96 GB of device DRAM--enough to run an LLM with 80 billion parameters.
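To sanity-check the "96 GB is enough for an 80B-parameter LLM" claim: that only works with quantized weights, since the weights alone at F16 would already exceed the DRAM. Here's a rough back-of-the-envelope sketch; the bytes-per-weight figures are my own approximations for common GGUF quantization formats, not exact llama.cpp numbers, and KV cache and activations are ignored.

```python
# Approximate bytes per weight for common GGUF quant formats (assumptions).
BYTES_PER_PARAM = {
    "F16": 2.0,       # full half-precision
    "Q8_0": 1.0625,   # ~8.5 bits/weight
    "Q4_K_M": 0.5625, # ~4.5 bits/weight
}

def model_size_gb(n_params: float, quant: str) -> float:
    """Approximate weight size in GB; ignores KV cache and activations."""
    return n_params * BYTES_PER_PARAM[quant] / 1e9

dram_gb = 96  # four Wormhole n300 cards, 24 GB each
for quant in BYTES_PER_PARAM:
    size = model_size_gb(80e9, quant)
    print(f"80B @ {quant}: {size:.0f} GB -> fits in {dram_gb} GB: {size < dram_gb}")
# F16 needs ~160 GB (doesn't fit); Q8_0 ~85 GB and Q4_K_M ~45 GB do fit.
```

So an 80B model fits comfortably at 4-bit quantization and just barely at 8-bit, which matches the "enough to run an LLM with 80 billion parameters" framing.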
Request
I have a Wormhole n300 card, and I'd like to get it working with ollama, which means it needs to work with llama.cpp first. I'm hoping to be able to benchmark it and compare the performance to my 4090.
Existing efforts
@marty1885 appears to be working on a backend for this already here, though I'm not sure what the current status of it is.
Other information
Tenstorrent offers a cloud service with instances that have multiple Grayskull and Wormhole cards. I can't afford to drop $15k on a TT-QuietBox just to find out whether or not the performance is good, but depending on how much the cloud access costs it might be possible to do development and performance testing without breaking the bank.