For my experiments with CrewAI, I decided to try 3 different projects, progressing from the easiest to the most complex. The aim of the experiments was to have a team of AI agents do the following work for me:
- Examine my startup idea
- Build an AI newsletter with Google SERP
- Build an AI newsletter with a Reddit scraper
- Email classifier [WIP]
- OpenAI -> GPT-4
- Gemini Pro
- Mistral 7B
  - Nice, coherent results
  - Didn't understand that it should use the scraping tool to produce the output
  - Result is a bunch of generic text from training data
- Mistral 7B instruct
  - Nice, coherent results, a lot of emojis
  - Didn't use any scraping tool for the output
  - Result is a bunch of generic text from training data
- Open Chat 3.5 7B
  - The best and most "newsletter-y" results
  - But again, didn't use any tool, so generic content
- Nous Hermes 7B
  - OK results
  - Didn't use any tool
  - Generic content
- Open Hermes 2.5 7B
  - The tone and style of writing are great
  - But generic content
  - Didn't understand that it needs to use tools
- Starling 7B
  - OK results
  - Didn't use any tool
  - Generic content
- Llama 2 13B
  - The only model that "understood" what the task is
  - But the text wasn't coherent enough and didn't sound like a newsletter
- Llama 2 13B chat
  - Didn't understand the task or produce any output
- Llama 2 13B text
  - Didn't understand the task or produce any output
- Llama 2 7B
  - Not coherent
  - Didn't use any tool
  - No output
- Llama 2 7B text
  - No actual output
  - Didn't use any tool
  - Generic content
- Llama 2 7B chat
  - Didn't use any tool
  - Generic content
- Phi-2
  - The smallest model ran into the biggest problems
  - Lost track of what it was supposed to do, no output
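
Since "didn't use any tool" comes up for almost every local model above, it can help to check this from the run log instead of judging by the final text alone. The snippet below is a minimal sketch, not part of the original experiments. It assumes the agent's verbose log uses ReAct-style lines of the form `Action: <tool name>`, and the tool names in `KNOWN_TOOLS` are hypothetical placeholders for whatever tools were actually registered with the agents.

```python
import re

# Hypothetical tool names, used only for illustration.
# Replace them with the names of the tools given to your agents.
KNOWN_TOOLS = {"Scrape reddit content", "Search the internet"}

# Assumption: the log marks a tool call with a line like "Action: <tool name>".
# "Action Input: ..." lines do not match, because "Action" must be followed by ":".
ACTION_RE = re.compile(r"^[ \t]*Action:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def tools_called(transcript, known_tools=KNOWN_TOOLS):
    """Return the known tools named on `Action:` lines, in order of appearance."""
    return [name for name in ACTION_RE.findall(transcript) if name in known_tools]


def used_any_tool(transcript, known_tools=KNOWN_TOOLS):
    """True if the transcript contains at least one call to a known tool."""
    return bool(tools_called(transcript, known_tools))


if __name__ == "__main__":
    sample = (
        "Thought: I need fresh posts\n"
        "Action: Scrape reddit content\n"
        "Action Input: LocalLLaMA\n"
        "Observation: ...\n"
        "Final Answer: This week's newsletter"
    )
    print(tools_called(sample))
    print(used_any_tool("Final Answer: A generic newsletter about AI"))
```

A run that goes straight to a final answer without any matching `Action:` line is a likely candidate for "generic text from training data". A model that names a tool that doesn't exist would also show up as no tool use here, which is a deliberate simplification.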