diff --git a/README.md b/README.md
index 0a33e4f..0ab4625 100644
--- a/README.md
+++ b/README.md
@@ -6,10 +6,8 @@
Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage. This project proves that it's possible to split the workload of LLMs across multiple devices and achieve a significant speedup. Distributed Llama allows you to run huge LLMs in-house. The project uses TCP sockets to synchronize the state. You can easily configure your AI cluster by using a home router.
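The core idea is that the heavy per-layer matrix multiplications can be sharded across nodes, so each device holds and computes only a fraction of the weights. Below is a minimal NumPy sketch of that splitting step, assuming a simple row-wise split of one matrix-vector product; it is illustrative only, not the project's C++ implementation, and in the real cluster each slice lives on a separate device with partial results exchanged over TCP.

```python
# Minimal sketch (NumPy), assuming a row-wise split of y = W @ x.
# Not the project's C++ code: every "node" is simulated locally here, while
# Distributed Llama keeps each slice on a separate device and synchronizes
# the partial results over TCP sockets.
import numpy as np

def parallel_matvec(W: np.ndarray, x: np.ndarray, n_nodes: int) -> np.ndarray:
    # Each node stores only its block of W's output rows -> RAM is divided.
    slices = np.array_split(W, n_nodes, axis=0)
    partials = [W_slice @ x for W_slice in slices]   # computed in parallel on the nodes
    return np.concatenate(partials)                  # root node gathers the pieces

rng = np.random.default_rng(0)
W, x = rng.standard_normal((8, 4)), rng.standard_normal(4)
assert np.allclose(parallel_matvec(W, x, n_nodes=4), W @ x)
```

Row-wise splitting is just one possible layout; the point is only that each node needs roughly 1/n of the weights in memory and that only small activation vectors have to cross the network.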
-
-Distributed Llama running Llama 2 70B on 8 Raspberry Pi 4B devices
-
+> [!TIP]
+> Check out the new article: [🌳 How to Run Llama 3.1 405B on Home Devices? Build AI Cluster!](https://medium.com/@b4rtaz/how-to-run-llama-3-405b-on-home-devices-build-ai-cluster-ad0d5ad3473b)
### 🔥 Set Up the Root Node with a Single Command
@@ -105,13 +103,18 @@ I - inference time of the root node, T - network transfer time of the root node.
**Raspberry Pi 4B 8 GB**
-Weights = Q40, Buffer = Q80, nSamples = 16, switch = TP-Link LS1008G, tested on 0.1.0 version
-
-
+
+8 x Raspberry Pi 4B 8GB
+
+Distributed Llama running Llama 2 70B Q40 on 8 Raspberry Pi 4B devices
+
+Weights = Q40, Buffer = Q80, nSamples = 16, switch = TP-Link LS1008G, tested on version 0.1.0
+
| Model | 1 x RasPi 4B 8 GB | 2 x RasPi 4B 8 GB | 4 x RasPi 4B 8 GB | 8 x RasPi 4B 8 GB |
|-------------|---------------------------------------------------------------------|-----------------------------------------------------------------------|--------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| Llama 2 7B | **1312.50 ms**<br>I: 1307.94 ms, T: 1.81 ms | **793.69 ms**<br>I: 739.00 ms, T: 52.50 ms | **494.00 ms** 🔥<br>I: 458.81 ms, T: 34.06 ms | **588.19 ms**<br>I: 296.69 ms, T: 289.75 ms |
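To make the numbers easier to read: the legend in this section defines I as the root node's inference time and T as its network transfer time, and in each cell the bold figure works out to roughly I + T. The snippet below uses values copied from the Llama 2 7B row (the I + T relationship is an observation from the numbers, not a documented formula) to show the resulting speedups:

```python
# How to read the Llama 2 7B row above (values copied from the table).
# Assumption for illustration: the bold figure is roughly I + T, where
# I = root-node inference time and T = its network transfer time.
timings = {  # devices -> (total ms, I ms, T ms)
    1: (1312.50, 1307.94, 1.81),
    2: (793.69, 739.00, 52.50),
    4: (494.00, 458.81, 34.06),
    8: (588.19, 296.69, 289.75),
}

base_total = timings[1][0]
for devices, (total, inference, transfer) in timings.items():
    print(f"{devices} x RasPi: {total:.2f} ms "
          f"(I + T = {inference + transfer:.2f} ms), "
          f"speedup vs 1 device: {base_total / total:.2f}x")
# Note how 8 devices is slower than 4 here: inference time keeps shrinking,
# but transfer time (~290 ms) starts to dominate on the gigabit home switch.
```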