The code in this repository collects data from forges (github.com, gitlab.com and self-hosted GitLab instances) about accounts (GitHub organizations or GitLab groups), their repositories and their libraries.
Starting from this list of account URLs and this CSV file of platforms, we collect the data needed for code.gouv.fr.
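To illustrate how an account URL maps to a forge and an account, here is a minimal sketch. The helper `classify_account_url` and the `DEFAULT_GITLAB_HOSTS` set are hypothetical and not part of fetch.py. It assumes a GitHub organization is the first path segment of a github.com URL, and that GitLab groups (which can be nested) keep their full path. In the real project, GitLab instances would more likely be read from platforms.csv.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical default: the real project may list GitLab instances in
# platforms.csv instead of hard-coding them.
DEFAULT_GITLAB_HOSTS = frozenset({"gitlab.com"})


def classify_account_url(url: str, gitlab_hosts=DEFAULT_GITLAB_HOSTS) -> tuple[str, str]:
    """Split an account URL into (forge kind, account path). Hypothetical helper."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # "/group/subgroup/" becomes "group/subgroup".
    account = parsed.path.strip("/")
    if not host or not account:
        raise ValueError(f"not an account URL: {url!r}")
    if host == "github.com":
        # A GitHub organization is the first path segment.
        return "github", account.split("/")[0]
    if host in gitlab_hosts:
        # GitLab groups can be nested, so keep the full path.
        return "gitlab", account
    return "unknown", account
```

For example, `classify_account_url("https://github.com/etalab")` returns `("github", "etalab")`, and a self-hosted instance is recognized by passing its host in `gitlab_hosts`.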
- Clone this repository:
git clone https://git.sr.ht/~codegouvfr/codegouvfr-fetch-data && cd codegouvfr-fetch-data
- Install Python dependencies:
pip install -r requirements.txt
- Create a GitHub personal access token.
- Create an account on libraries.io and create an API key on your account page.
- Set the GITHUB_TOKEN and LIBRARIESIO_API_KEY environment variables, for example:
export GITHUB_TOKEN="your github token" ; export LIBRARIESIO_API_KEY="your libraries.io api key"
- Create the folders that will receive the output data:
mkdir -p data/organizations/csv data/organizations/json data/repositories/csv data/repositories/json data/libraries/csv data/libraries/json
- Check the content of the platforms.csv file and update it if needed.
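Before launching the script, the environment variables and output folders from the steps above can also be checked from Python. This is a minimal sketch, not part of the repository: `missing_env_vars` and `create_output_dirs` are hypothetical helpers that mirror the `export` and `mkdir -p` commands, and `Path.mkdir(parents=True, exist_ok=True)` makes the folder creation safe to rerun.

```python
from __future__ import annotations

import os
from pathlib import Path

# Variables listed in the setup steps.
REQUIRED_ENV_VARS = ("GITHUB_TOKEN", "LIBRARIESIO_API_KEY")
# Same data/<kind>/<format> layout as the mkdir -p command.
OUTPUT_KINDS = ("organizations", "repositories", "libraries")
OUTPUT_FORMATS = ("csv", "json")


def missing_env_vars(env=None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def create_output_dirs(root: Path = Path("data")) -> list[Path]:
    """Create every output folder; safe to rerun, like mkdir -p."""
    paths = []
    for kind in OUTPUT_KINDS:
        for fmt in OUTPUT_FORMATS:
            path = root / kind / fmt
            path.mkdir(parents=True, exist_ok=True)
            paths.append(path)
    return paths
```

Calling `missing_env_vars()` first gives a clear error message instead of a failed API call halfway through a run.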
Launch the script with python fetch.py. The output files will be available in the subfolders of data.
We aim to collect data from more forges.
SourceHut is our priority because Etalab hosts some of its source code there.
If you are familiar with the SourceHut GraphQL APIs and can help, feel free to send a patch to ~codegouvfr/[email protected] or to contact us directly.
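As a possible starting point for SourceHut support, here is a sketch of paginated repository listing against the git.sr.ht GraphQL API. The endpoint URL, the `user(username:)` field, the `Cursor` type and the `results`/`cursor` pagination shape are assumptions to verify against the published git.sr.ht schema; the function names are hypothetical. `urlopen` is injectable so the pagination logic can be tested without network access.

```python
import json
import urllib.request

# Assumptions to check against the git.sr.ht GraphQL documentation:
# the endpoint URL, the `user(username:)` field and cursor-based pagination.
GIT_SRHT_GRAPHQL_URL = "https://git.sr.ht/query"

REPOSITORIES_QUERY = """
query Repositories($username: String!, $cursor: Cursor) {
  user(username: $username) {
    repositories(cursor: $cursor) {
      results { name description }
      cursor
    }
  }
}
"""


def build_repositories_request(username, token, cursor=None):
    """Build (without sending) the POST request for one page of repositories."""
    body = {"query": REPOSITORIES_QUERY,
            "variables": {"username": username, "cursor": cursor}}
    return urllib.request.Request(
        GIT_SRHT_GRAPHQL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def fetch_all_repositories(username, token, urlopen=urllib.request.urlopen):
    """Follow the cursor until the last page and return all repositories."""
    repos, cursor = [], None
    while True:
        with urlopen(build_repositories_request(username, token, cursor)) as response:
            payload = json.load(response)
        if payload.get("errors"):
            raise RuntimeError(payload["errors"])
        page = payload["data"]["user"]["repositories"]
        repos.extend(page["results"])
        cursor = page["cursor"]
        if cursor is None:  # the last page carries no cursor
            return repos
```

SourceHut usernames are used without the leading `~`, so the account above would be queried as `"codegouvfr"`; the token would be a SourceHut personal access token.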
We use Table Schema files to describe the collected data.
Please refer to the schema files in this directory.
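A Table Schema descriptor lists the expected columns in a `fields` array, each with a `name` and a `type`. The sketch below uses only the standard library to check that a CSV header matches such a descriptor. The helper names and the example field names (`repository_url`, `stars_count`) are hypothetical and not taken from the schema files in this directory. It only checks column names and order; a dedicated Table Schema validator would also check types and constraints.

```python
from __future__ import annotations

import csv
import io
import json


def schema_field_names(schema_json: str) -> list[str]:
    """Return the field names, in order, from a Table Schema descriptor."""
    return [field["name"] for field in json.loads(schema_json)["fields"]]


def check_csv_header(csv_text: str, schema_json: str) -> list[str]:
    """Return a list of problems; an empty list means the header matches."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    expected = schema_field_names(schema_json)
    problems = [f"missing column: {name}" for name in expected if name not in header]
    problems += [f"unexpected column: {name}" for name in header if name not in expected]
    if not problems and header != expected:
        problems.append("columns are not in schema order")
    return problems


# Hypothetical descriptor, for illustration only.
EXAMPLE_SCHEMA = json.dumps({
    "fields": [
        {"name": "repository_url", "type": "string", "format": "uri"},
        {"name": "stars_count", "type": "integer"},
    ]
})
```

With this descriptor, `check_csv_header("repository_url\n", EXAMPLE_SCHEMA)` reports `["missing column: stars_count"]`.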
The source code of this repository is published under the MIT license.
2018-2023 DINSIC, DINUM, Etalab, Antoine Augusti, Bastien Guerry.
2018-2023 Other contributors, as readable in the history of this repository.