My initial experimentation with various models under Toolio was rather scattershot. For example, Hermes-2-Theta-Llama-3-8B was one of the first I tried, because I've always rated Nous Research fine-tunes highly for all sorts of purposes. Then yesterday I came across the BigCodeBench Leaderboard, and, no surprise, Theta ranks well there, though in its 70B incarnation. Anyway, I thought I'd start this ongoing discussion of LLMs that 1) work with MLX and 2) are good for tool-calling, based on various ratings & experiences.
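For anyone who wants to try one of these models under Toolio, here's a minimal sketch of the kind of tool-calling request involved, assuming a Toolio server is running locally with an OpenAI-style endpoint. The base URL, port, model id, and the `get_weather` tool are placeholders of mine rather than anything from Toolio's docs, so adjust to your setup:

```python
# Hypothetical sketch, not Toolio's documented API: the base URL, port,
# model id, and the get_weather tool below are placeholder assumptions.
from openai import OpenAI

# Point the standard OpenAI client at a locally running Toolio server,
# assuming it exposes an OpenAI-style /v1 endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

# A toy tool definition in the OpenAI function-calling format
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mlx-community/Hermes-2-Theta-Llama-3-8B-4bit",  # assumed MLX build of the model
    messages=[{"role": "user", "content": "What's the weather in Boston right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the call shows up on the returned message
print(resp.choices[0].message)
```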
From the HF release post:
We just released the 🌸 BigCodeBench: testing LLMs on more realistic and harder coding tasks involving tool usage. 🛠
While benchmarks like HumanEval are saturating, even GPT-4o or DeepSeekCoder-v2 land at just around 50%, while humans get 97%!
A few highlights 🚀:
🛠 tasks utilize diverse function calls from 139 popular Python libraries.
🤓 complex, user-oriented instructions for each task
📊 includes verified examples and high test coverage
🙋‍♂️ comes in a standard function-completion form as well as an instruction version
Note: DeepSeekCoder-v2 is 236B params.
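If you want to poke at the benchmark tasks themselves rather than just the leaderboard, something along these lines should pull them down from the Hugging Face Hub. The dataset id and field names are my guesses from the release post, so double-check against the BigCodeBench repo:

```python
# Hypothetical sketch: the dataset id and field names are assumptions
# based on the release post; check the BigCodeBench repo for the canonical ones.
from datasets import load_dataset

ds = load_dataset("bigcode/bigcodebench")  # assumed Hub id
print(ds)  # lists the available splits and their columns

# Peek at one task from whichever split comes first
first_split = next(iter(ds.values()))
example = first_split[0]
print(sorted(example.keys()))
# The instruction-style prompt field name is an assumption:
print(str(example.get("instruct_prompt", ""))[:500])
```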