A VS Code extension that generates markdown documentation from web pages and GitHub repositories.
If you find Docs Miner useful, please consider leaving a star ⭐ on the GitHub repository or buying me a coffee ☕ to keep me motivated to work on this project.
- Generate markdown documentation from any web URL or GitHub repository
- Two scraping methods:
  - API Method (Faster but may fail on some sites)
  - Browser Method (Slower but more reliable)
- Smart crawling that follows:
  - Subdirectory structure from the initial URL for websites
  - Repository file structure for GitHub repositories
- Configurable crawling depth with precise level control
- Real-time progress tracking
- Stop crawling at any time
- Automatically saves the markdown file in your current workspace
- Opens the generated file for immediate viewing
- Open the Docs Miner sidebar (look for the Docs Miner icon in the Activity Bar)
- Enter the URL you want to generate documentation from:
  - For websites: any web URL (e.g., https://example.com)
  - For GitHub: repository URL (e.g., https://github.com/username/repo) or specific directory (e.g., https://github.com/username/repo/tree/main/docs)
- Adjust the crawling depth using the slider:
  - Website depth levels:
    - Depth 1: Only the entered page
    - Depth 2: The entered page and links at the same directory level
    - Depth 3: The entered page and links up to two directory levels
    - Depth 4: The entered page and links up to three directory levels
    - Depth 5: The entered page and links up to four directory levels
  - GitHub repository depth levels:
    - Depth 1: Root files only
    - Depth 2: Root + one directory level
    - Depth 3: Root + two directory levels
    - Depth 4: Root + three directory levels
    - Depth 5: Root + four directory levels
- Specify the file name for the generated documentation. If not specified, the URL will be used instead.
- Specify the output folder for the generated documentation. If not specified, the current workspace folder will be used.
- Alternatively, use the "Add to File" button to choose an existing markdown file to append the crawled content to.
- Click "Start Crawling" to begin
- Monitor the progress in real-time
- Use the "Stop Crawling" button if you want to end the process early
The markdown file will be automatically created in your specified output folder and opened for viewing.
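The depth levels listed above can be read as a simple path-based rule. The sketch below is one possible interpretation, not the extension's actual code: the function names `directoryLevels`, `isRepoFileWithinDepth`, and `isLinkWithinDepth` are hypothetical, and the website rule assumes that "same directory level" means the directory containing the entered page.

```typescript
// Hypothetical helpers illustrating the depth rules; not taken from the extension.

/** Number of directory levels in a relative path ("a/b/c.md" has 2). */
function directoryLevels(filePath: string): number {
  const segments = filePath.split("/").filter((s) => s.length > 0);
  return Math.max(segments.length - 1, 0);
}

/** GitHub rule: depth 1 = root files only, depth N = root plus N-1 directory levels. */
function isRepoFileWithinDepth(filePath: string, depth: number): boolean {
  return directoryLevels(filePath) <= depth - 1;
}

/**
 * Website rule (assumed reading): depth 1 = the entered page only;
 * depth N = links up to N-1 levels, where level 1 is the entered page's own directory.
 */
function isLinkWithinDepth(startUrl: string, linkUrl: string, depth: number): boolean {
  const start = new URL(startUrl);
  const link = new URL(linkUrl, start);
  if (link.href === start.href) return true; // the entered page is always included
  if (link.origin !== start.origin) return false; // stay on the same site
  // Directory that contains the entered page, e.g. "/docs/" for "/docs/intro".
  const baseDir = start.pathname.slice(0, start.pathname.lastIndexOf("/") + 1);
  if (!link.pathname.startsWith(baseDir)) return false; // outside the subdirectory
  const relative = link.pathname.slice(baseDir.length);
  const level = directoryLevels(relative) + 1;
  return level <= depth - 1;
}

const start = "https://example.com/docs/intro";
console.log(isLinkWithinDepth(start, "https://example.com/docs/setup", 2)); // true
console.log(isLinkWithinDepth(start, "https://example.com/docs/api/auth", 2)); // false
console.log(isRepoFileWithinDepth("docs/guide.md", 2)); // true
```

Under this reading, raising the slider by one step admits one more directory level below the starting point, for both websites and repositories.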
- VS Code 1.80.0 or higher
- Active internet connection
Choose one of the following installation methods:
- From the VS Code Marketplace:
  - Open VS Code
  - Go to the Extensions view (Ctrl+Shift+X)
  - Search for "Docs Miner"
  - Click Install
- From a VSIX release:
  - Go to the latest release
  - Download the latest `docs-miner-x.x.x.vsix` file
  - In VS Code:
    - Go to the Extensions view (Ctrl+Shift+X)
    - Click the '...' menu (top-right)
    - Select 'Install from VSIX...'
    - Choose the downloaded file
- From source:
  - Clone the repository: `git clone https://github.com/3choff/docs-miner`
  - Run `npm install` in the terminal
  - Run `npm run compile` to build the extension
  - To create a VSIX package:
    - Install vsce: `npm install -g @vscode/vsce`
    - Run `vsce package`
    - The .vsix file will be created in the root directory
  - To install the VSIX:
    - Go to the VS Code Extensions view
    - Click the '...' menu (top-right)
    - Select 'Install from VSIX...'
    - Choose the generated .vsix file
- The extension offers two methods for content extraction:
  - Jina AI Reader API: Fast but may fail on some websites
  - Browser-based scraping: More reliable but slower, handles JavaScript-heavy sites
- Crawling is restricted to subdirectories of the initial URL to keep the documentation focused
- Rate limiting: a 0.5-second delay between requests to avoid overloading the target site
- Crawling may be affected by a website's robots.txt and rate-limiting policies
- Non-documentation links (images, executables, etc.) are skipped
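The request delay and the link skipping described in the notes above could look roughly like the following. This is only an illustrative sketch: `SKIPPED_EXTENSIONS`, `isSkippedLink`, `sleep`, and `crawlSequentially` are hypothetical names, the extension list is an assumption (the notes only say "images, executables, etc."), and `fetchPage` stands in for whichever scraping method is selected.

```typescript
// Assumed set of file extensions to skip; the extension's real list may differ.
const SKIPPED_EXTENSIONS = new Set([
  ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", // images
  ".exe", ".msi", ".dmg",                          // executables and installers
]);

// 0.5 seconds between requests, as stated in the notes.
const REQUEST_DELAY_MS = 500;

/** True when the URL's last path segment ends in a skipped extension. */
function isSkippedLink(url: string): boolean {
  const pathname = new URL(url).pathname.toLowerCase();
  const dot = pathname.lastIndexOf(".");
  const slash = pathname.lastIndexOf("/");
  if (dot <= slash) return false; // no extension in the last segment
  return SKIPPED_EXTENSIONS.has(pathname.slice(dot));
}

/** Resolves after the given number of milliseconds. */
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/** Fetches pages one at a time, pausing after each request. */
async function crawlSequentially(
  urls: string[],
  fetchPage: (url: string) => Promise<string>, // hypothetical scraper callback
): Promise<string[]> {
  const pages: string[] = [];
  for (const url of urls) {
    if (isSkippedLink(url)) continue; // skip images, executables, etc.
    pages.push(await fetchPage(url));
    await sleep(REQUEST_DELAY_MS);
  }
  return pages;
}
```

Fetching sequentially with a fixed pause is the simplest way to honor a per-request delay; a real crawler might also stop early when the user cancels.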
Feedback and contributions are welcome. If you encounter any issues or have suggestions for improvements, please create a new issue on the GitHub repository.
If you'd like to contribute to the development of the extension, feel free to submit a pull request with your changes.
This extension is licensed under the MIT License.