German Government Domains

An incomplete listing of German government domains (and the code for the scraper used to build the list).

You can download the list as a .csv file or view it in GitHub's rendered table view.

Why?

There currently isn't a publicly available directory of all the domain names registered by the German government and its agencies. Such a directory would be useful for anyone who wants an aggregate view of government websites and how they are hosted. For example, Ben Balter has been doing some great work analyzing the official set of US .gov domains.

This is by no means an official or a complete list. It is intended to be a first step toward a better understanding of how the government is managing its official sites.

What can I do with it?

Plug the CSV into 18F/domain-scan to get more data about the domains (like HTTPS support)
Check the domains' IPv6 reachability
Test whether the sites are reachable without the www. subdomain
...?
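The IPv6 and www. checks can be approximated at the DNS level first. The following is a minimal sketch using only Ruby's standard library (`resolv`, `csv`); the function names are hypothetical, and it assumes data/domains.csv has a header row with a "Domain Name" column (check the real header and pass another `column:` if needed). It only looks up A/AAAA records and makes no HTTP requests, so "resolves" does not yet mean "the site answers".

```ruby
require "csv"
require "resolv"

# Hypothetical helper: the bare domain and its www. variant, so both
# forms can be checked separately.
def host_variants(domain)
  bare = domain.to_s.strip.downcase.sub(/\Awww\./, "")
  [bare, "www.#{bare}"]
end

# True if the resolver returns at least one record of the given type.
# Lookup errors and timeouts are treated as "no record".
def records?(host, type, resolver)
  !resolver.getresources(host, type).empty?
rescue Resolv::ResolvError, Resolv::ResolvTimeout
  false
end

# DNS-level check only: no HTTP request is made.
def check_domain(domain, resolver: Resolv::DNS.new)
  a = Resolv::DNS::Resource::IN::A
  aaaa = Resolv::DNS::Resource::IN::AAAA
  bare, www = host_variants(domain)
  {
    domain: bare,
    bare_resolves: records?(bare, a, resolver) || records?(bare, aaaa, resolver),
    www_resolves: records?(www, a, resolver) || records?(www, aaaa, resolver),
    ipv6: records?(bare, aaaa, resolver) || records?(www, aaaa, resolver)
  }
end

# Assumption: the CSV has a header row and a "Domain Name" column.
def check_csv(path, column: "Domain Name")
  resolver = Resolv::DNS.new
  CSV.foreach(path, headers: true).map { |row| check_domain(row[column], resolver: resolver) }
end
```

Calling `check_csv("data/domains.csv").each { |r| p r }` prints one hash per domain; it performs live DNS queries, so expect it to take a while on the full list. The resolver is injectable so the logic can be tested without network access.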

How to use

The list is built from scrapers and static source files, which are merged by a Makefile. To run the process yourself, check out this repository and run:

bundle install
make

Once the run has finished, the merged list is in data/domains.csv.
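The merge itself is done by the Makefile. As a rough illustration of what merging and de-duplicating several sources involves, here is a minimal Ruby sketch; it is not the repository's actual code. It assumes every source CSV has the columns "Domain Name" and "Agency", and the function names and normalization rules are hypothetical.

```ruby
require "csv"

# Hypothetical normalization: lower-case, trim whitespace, drop a leading
# "www." and a trailing dot so variants of one domain collapse together.
def normalize_domain(raw)
  raw.to_s.strip.downcase.sub(/\Awww\./, "").chomp(".")
end

# Merges rows from several CSV strings (assumed columns: "Domain Name",
# "Agency"). The first source that mentions a domain wins, so pass the
# more trustworthy sources first. Returns a domain => agency hash sorted
# by domain.
def merge_sources(csv_texts)
  merged = {}
  csv_texts.each do |text|
    CSV.parse(text, headers: true).each do |row|
      domain = normalize_domain(row["Domain Name"])
      next if domain.empty?
      merged[domain] ||= row["Agency"].to_s.strip
    end
  end
  merged.sort.to_h
end

# Serializes the merged hash back to CSV with a header row.
def to_csv(merged)
  CSV.generate do |csv|
    csv << ["Domain Name", "Agency"]
    merged.each { |domain, agency| csv << [domain, agency] }
  end
end
```

With `paths` being whatever list of source files you choose, `File.write("data/domains.csv", to_csv(merge_sources(paths.map { |p| File.read(p) })))` would write the result. Because the first source wins, the order of `paths` decides which agency name is kept for duplicates.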

Scrapers and Sources

bundde-behoerden-scraper.rb: crawls an official list of government agencies
data/source/bmf.csv: list from the BMF, manually extracted from its digital services page on bundesfinanzministerium.de
data/source/ifg-bmas.csv: list from the BMAS, acquired with a freedom of information request
data/source/ifg-bmvi.csv: list from the BMVI, acquired with a freedom of information request
data/source/ifg-bva.csv: list from the BVA, acquired with a freedom of information request
data/source/ifg-dwd.csv: list from the DWD, acquired with a freedom of information request
data/source/overrides.csv: manually curated list of domains for which the scraper returns the wrong agency name
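One way to use overrides.csv is as a final pass over the merged list, so the curated agency names always win. A minimal sketch of that step, assuming (hypothetically) that overrides.csv uses "Domain Name" and "Agency" columns and that the merged list is held as a domain => agency hash; the function names are made up for this example.

```ruby
require "csv"

# Hypothetical layout for data/source/overrides.csv: a header row with
# "Domain Name" and "Agency", one corrected agency name per domain.
def load_overrides(csv_text)
  overrides = {}
  CSV.parse(csv_text, headers: true).each do |row|
    domain = row["Domain Name"].to_s.strip.downcase
    overrides[domain] = row["Agency"].to_s.strip unless domain.empty?
  end
  overrides
end

# Returns a new domain => agency hash: domains listed in the overrides get
# the curated agency name, all other entries keep the scraped one.
def apply_overrides(entries, overrides)
  entries.to_h do |domain, agency|
    [domain, overrides.fetch(domain.downcase, agency)]
  end
end
```

Applying the overrides last means a domain listed there ends up with the curated name regardless of what the scraper or any other source reported.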

Contributing

I'd love to have some help with this! Please feel free to create an issue or submit a pull request if you notice something that could be better. Specifically, suggestions for additional pages we could scrape, and for domains that are either missing or have incorrect organization names associated with them, would be very helpful.

Ideas

scrape even more domains
- scrape news articles on already known government sites
- use the list of projects using the Government Site Builder
- use a wikidata query to get the website property of agencies listed in wikipedia category pages
manual collection
- try to get some domains with a freedom of information request
- look for domains in minor interpellations
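For the first idea, scraping news articles on already known government sites, a starting point is to collect every host linked from a page and keep the ones not yet in the list. This minimal sketch uses only Ruby's standard library and a regular expression for href attributes (an HTML parser would be more robust); the function names are hypothetical, and the ".de" filter is an assumed heuristic that would miss government domains under other TLDs.

```ruby
require "set"
require "uri"

# Simple pattern for href="..." and href='...' attributes; a sketch only.
HREF_PATTERN = /href\s*=\s*["']([^"']+)["']/i

# Returns the unique hosts of absolute http(s) links in the HTML,
# lower-cased and without a leading "www.".
def linked_hosts(html)
  hosts = html.scan(HREF_PATTERN).flatten.filter_map do |href|
    uri = URI.parse(href.strip)
    # URI::HTTPS is a subclass of URI::HTTP; relative links have no host.
    next unless uri.is_a?(URI::HTTP) && uri.host
    uri.host.downcase.sub(/\Awww\./, "")
  rescue URI::InvalidURIError
    nil
  end
  hosts.uniq
end

# Hypothetical heuristic: candidates are .de hosts not yet in the list.
def new_candidates(html, known_domains)
  known = known_domains.to_set
  linked_hosts(html).select { |host| host.end_with?(".de") && !known.include?(host) }
end
```

Not every linked .de host belongs to the government, so the candidates still need a manual review before they go into a source file.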

Thanks

Thanks to @esonderegger for the dotmil domains project that served as a template for this repo.


deknos/german-gov-domains