FastAPI for efficient, AI-driven web scraping using Scrapegraph-ai
Note
This project is a fork of scrapegraph-ai-fastapi, fixed and adapted to support multi-page scraping.
Copy `.env.example` to `.env` and configure your API keys. Due to the special nature of the Gemini model, it is configured separately; other models are configured via `API_KEY` and `API_BASE_URL`.
```
GOOGLE_API_KEY=
GOOGLE_API_ENDPOINT=
API_KEY=
API_BASE_URL=
```
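If you want to check that your `.env` is picked up, here is a minimal sketch using `python-dotenv`; this is an assumption for illustration, as the project may load its settings differently (e.g. via pydantic-settings).

```python
# Minimal sketch: load .env and inspect the keys mentioned above.
# Assumes python-dotenv is installed; the project's own settings
# loading may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into os.environ

for name in ("GOOGLE_API_KEY", "GOOGLE_API_ENDPOINT", "API_KEY", "API_BASE_URL"):
    value = os.getenv(name)
    print(f"{name} is {'set' if value else 'not set'}")
```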
Ensure you have a Docker instance running. For macOS, I recommend using OrbStack.
Available commands:

- `npm run docker:build` - Build the Docker image
- `npm run docker:dev` - Run the container in development mode
- `npm run dev` - Build and run in one command
- `npm run docker:stop` - Stop running containers
- `npm run docker:clean` - Clean up Docker resources
The API supports multiple model providers and models, using LangChain's `init_chat_model` (a minimal sketch follows the provider list below).
- **Google Gemini**
  - Provider: `google_genai`
  - Model: `google_genai/gemini-1.5-flash-latest` (or another model)
  - Requires: `GOOGLE_API_KEY` or `GOOGLE_API_ENDPOINT` in `.env`
- **OpenAI**
  - Provider: `openai`
  - Model: `gpt-4o-mini` (or another model)
  - Requires: `API_KEY` or `API_BASE_URL` in `.env`
- **Ollama**
  - Provider: `ollama`
  - Model: `ollama/llama3.1` (or another model)
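For reference, here is a minimal sketch of how a provider/model pair maps onto `init_chat_model` when called directly. The model names mirror the examples above, but the exact wiring inside this project may differ; each provider also needs its LangChain integration package installed (`langchain-google-genai`, `langchain-openai`, `langchain-ollama`) and the corresponding keys from `.env`.

```python
# Minimal sketch, not the project's actual implementation.
import os

from langchain.chat_models import init_chat_model

# Google Gemini: expects GOOGLE_API_KEY in the environment.
gemini = init_chat_model("gemini-1.5-flash-latest", model_provider="google_genai")

# OpenAI-compatible endpoint: api_key / base_url correspond to the
# API_KEY and API_BASE_URL values from .env (passed explicitly here).
gpt = init_chat_model(
    "gpt-4o-mini",
    model_provider="openai",
    api_key=os.getenv("API_KEY"),
    base_url=os.getenv("API_BASE_URL") or None,
)

# Ollama: talks to a locally running Ollama server.
llama = init_chat_model("llama3.1", model_provider="ollama")

print(gpt.invoke("Reply with a single word: ready?").content)
```

Passing `model_provider` explicitly avoids relying on LangChain's provider inference from the model name.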
You can find more supported models in the LangChain documentation for `init_chat_model`.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you're interested in contributing to this project, please read the contribution guide.